There have been a few interesting recent news stories concerning the benefits and dangers of Big Data, for businesses and individuals alike. One even points to a possible middle ground that could let governments and businesses keep using the vast amounts of data at their disposal while still protecting individual privacy.
The benefits of Big Data are not as well known as they should be. A recent study by Sean Young, assistant professor of family medicine at the David Geffen School of Medicine at UCLA and co-director of the Center for Digital Behavior at UCLA, showed one way that Big Data could be used to promote and protect public health. The researchers collected approximately 550 million Tweets, developed an algorithm (a set of instructions) that searched for words suggesting risky behavior or drug use, and located those words among the Tweets. Although they identified just under 10,000 such Tweets, they were able to match those Tweets to geographic areas with unusually high rates of HIV cases. The researchers propose using real-time analysis of social media data to understand, and perhaps even predict, where HIV and drug use will occur. That information could then be used for disease detection and prevention.
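The study’s actual algorithm and word list are not described here, so the Python below is only a hypothetical sketch of the general approach: flag any post that contains a term from a keyword list, then tally flagged posts per region so the tallies could be compared with HIV case data. The terms, region labels, and function names are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword list for illustration only; NOT the study's actual terms.
RISK_TERMS = {"needle", "hookup", "high"}

# Crude tokenizer: runs of lowercase letters and apostrophes.
WORD = re.compile(r"[a-z']+")


def is_flagged(text: str) -> bool:
    """Return True if any word in the post matches a risk term (case-insensitive)."""
    return any(word in RISK_TERMS for word in WORD.findall(text.lower()))


def flagged_by_region(posts):
    """posts: iterable of (region, text) pairs.

    Returns {region: (flagged_count, total_count)} so regions can be
    compared by rate rather than raw volume.
    """
    flagged, total = Counter(), Counter()
    for region, text in posts:
        total[region] += 1
        if is_flagged(text):
            flagged[region] += 1
    # Counter returns 0 for regions with no flagged posts.
    return {region: (flagged[region], total[region]) for region in total}


sample = [
    ("County A", "Getting high tonight"),
    ("County A", "Lovely weather"),
    ("County B", "Coffee time"),
]
print(flagged_by_region(sample))  # {'County A': (1, 2), 'County B': (0, 1)}
```

Returning the total alongside the flagged count matters because a busy region will produce more matches simply by posting more; it is the per-region rate that would be set against local HIV statistics.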
The downside to collecting vast amounts of data about large numbers of people is that it is hard to control who has access to it and how it is used. According to the British newspaper The Guardian, a management consulting firm recently uploaded the British National Health Service’s Hospital Episode Statistics to Google servers in order to analyze the information, answer specific questions, and even create interactive maps based on particular queries. It was a large amount of data: it took two weeks to upload and filled 27 DVDs. The problem, reportedly, was that the data contained personal information, including patient locations (the data was used to create maps), and that the Google servers were outside Britain, evidently making its dissemination harder to control. This has increased criticism of another NHS plan, the care.data scheme, which will link general practitioner and hospital records, including a patient’s date of birth, NHS number, postcode, ethnicity, and gender, and allow that information to be used by researchers, drug companies, and insurers. The concern, again, is how to safeguard that data, which will be partially, but not totally, scrubbed of personal information.
There is, however, a middle ground that allows data to be collected, saved, and analyzed while better protecting privacy. According to a recent article in the Wall Street Journal, the Defense Advanced Research Projects Agency, or Darpa, is working on “fully homomorphic encryption.” According to Dan Kaufman, the head of Darpa’s software-innovation group, fully homomorphic encryption allows data to be encrypted and then searched without ever being decrypted. In effect, it lets you determine whether a locked piece of data contains certain search terms, but it tells you only the number of matches, not who those matches are. Once the number of matches is sufficiently small, and with the consent of some independent authority, the data could be unlocked and the matches identified. Right now, Kaufman says, the process is much too slow, though Darpa is trying to speed it up.
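Fully homomorphic encryption does not fit in a short example, but the idea of counting matches without revealing which records matched can be sketched with a simpler, swapped-in technique: the Paillier cryptosystem, which is only additively homomorphic. Because Paillier cannot compare values under encryption on its own, this sketch leans on the key-holding authority to decrypt blinded, shuffled results and report only how many of them are zero; a fully homomorphic scheme would let even that count be computed under encryption. The toy primes, the numeric coding of record contents, and the function names (keygen, blinded_matches, count_matches) are hypothetical, and none of this reflects Darpa’s actual design.

```python
import math
import secrets

# Sketch only: Paillier is ADDITIVELY homomorphic, not fully homomorphic.
# Toy primes for illustration; real keys use primes of 1024+ bits.
P, Q = 104729, 1299709


def keygen(p=P, q=Q):
    """Return public modulus n and private key (lam, mu), using g = n + 1."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)  # math.lcm needs Python 3.9+
    mu = pow(lam, -1, n)          # modular inverse, valid because g = n + 1
    return n, (lam, mu)


def _random_unit(n):
    """Random integer in [1, n-1] that is coprime to n."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            return r


def encrypt(n, m):
    n2 = n * n
    # g^m with g = n + 1 equals 1 + m*n (mod n^2)
    return (1 + (m % n) * n) * pow(_random_unit(n), n, n2) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def blinded_matches(n, enc_records, term):
    """Analyst side (no private key): for each record compute Enc(k * (record - term))
    with a fresh random k, so a ciphertext decrypts to 0 only when record == term."""
    n2 = n * n
    enc_neg_term = encrypt(n, -term)
    out = []
    for c in enc_records:
        diff = c * enc_neg_term % n2                   # Enc(record - term)
        out.append(pow(diff, _random_unit(n), n2))     # Enc(k * (record - term))
    secrets.SystemRandom().shuffle(out)  # hide which record each result came from
    return out


def count_matches(n, priv, blinded):
    """Authority side: learns how many results are zero, not which records matched."""
    return sum(1 for c in blinded if decrypt(n, priv, c) == 0)


# Usage with a hypothetical numeric coding of record contents.
vocab = {"flu": 1, "hiv": 2, "asthma": 3}
records = ["flu", "hiv", "asthma", "hiv", "flu"]
n, priv = keygen()
enc = [encrypt(n, vocab[r]) for r in records]
print(count_matches(n, priv, blinded_matches(n, enc, vocab["hiv"])))  # 2
```

The random multiplier k makes non-matching results decrypt to meaningless values, and the shuffle breaks the link between results and records, so the authority ends up knowing roughly what the article describes: a count of matches, with identities still locked.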
Once fully homomorphic encryption can run fast enough, it should protect privacy better than the current system, in which data can be examined only after it has been unlocked. The fewer times the data is decrypted, the fewer opportunities there are for the exposed data to be misused or obtained by an unauthorized user. By keeping the identities of the people involved locked away, it might make it easier to unlock the benefits of Big Data for businesses, health officials, and individuals alike.
Thank you