There have been a few interesting recent news stories concerning the benefits and dangers of Big Data, for businesses and individuals alike. One even points out a possible middle ground, which can allow the continued use of the vast amounts of data at the disposal of government and businesses, while protecting individual privacy.
The benefits of Big Data are not as well-known as they should be. A recent study led by Sean Young, assistant professor of family medicine at the David Geffen School of Medicine at UCLA and co-director of the Center for Digital Behavior at UCLA, showed one way that Big Data could be used to promote and protect public health. The researchers collected approximately 550 million Tweets; developed an algorithm, or set of instructions, that searched for words suggesting risky behavior or drug use; and located those words among the Tweets. Though they identified just under 10,000 such Tweets, they were able to match those Tweets with geographic areas with unusually high incidences of HIV cases. The researchers propose using real-time analysis of social media data to understand and maybe even predict where HIV and drug use will occur. That information could be used for disease detection and prevention.
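To make the idea of a keyword-searching algorithm concrete, here is a minimal sketch in Python of the general approach: flag messages that contain any word from a list, then count the flagged messages by geographic area. It is not the researchers' actual method. The keyword list, the `text` and `region` field names, and the sample records are all hypothetical placeholders, since the study's real terms and data format are not described here.

```python
import re
from collections import Counter

# Hypothetical placeholder keywords; the study's actual word list is not given here.
RISK_KEYWORDS = {"keyword_a", "keyword_b"}

# Split text into lowercase word-like tokens (letters, digits, underscore, apostrophe).
TOKEN_RE = re.compile(r"[a-z0-9_']+")


def is_flagged(text, keywords=RISK_KEYWORDS):
    """Return True if any whole token in the text matches a keyword."""
    tokens = set(TOKEN_RE.findall(text.lower()))
    return not tokens.isdisjoint(keywords)


def flagged_counts_by_region(tweets, keywords=RISK_KEYWORDS):
    """Count flagged tweets per region, skipping tweets with no region."""
    counts = Counter()
    for tweet in tweets:
        region = tweet.get("region")
        if region is not None and is_flagged(tweet["text"], keywords):
            counts[region] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [
    {"text": "Trying keyword_a tonight", "region": "County X"},
    {"text": "Nothing to see here", "region": "County X"},
    {"text": "KEYWORD_B again!", "region": "County Y"},
    {"text": "keyword_a with no location", "region": None},
]

print(flagged_counts_by_region(sample))
```

The per-region counts could then be compared with public health figures for the same areas, which is the kind of matching the study describes. A real analysis would also need to handle slang, misspellings, and messages without location data, all of which this sketch ignores.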
The downside to collecting vast amounts of data about large numbers of people is that it is hard to control who has access to it and how it is used. According to the British newspaper The Guardian, a management consulting firm recently uploaded the British National Health Service’s Hospital Episode Statistics to Google servers so that it could query the data and even create interactive maps from the results. It was a large amount of data; it filled 27 DVDs and took two weeks to upload. The problem, reportedly, was that the data contained personal information, including information about patient locations, since it was used to create maps, and the Google servers were outside Britain, evidently making its dissemination harder to control. This has increased criticism of another NHS plan, the care.data scheme, which will link general practitioner and hospital records, including a patient’s date of birth, NHS number, postcode, ethnicity and gender, and allow that information to be used by researchers, drug companies, and insurers. The concern, reportedly, is how to safeguard that data, which will be partially, but not totally, scrubbed of personal information.