Twitter can help predict outbreak of disease

29 January 2017
Twitter, Graf believes, isn't just a platform to read news, engage with like-minded individuals, launch insults or give praise. It's also a far-reaching and revealing digital "petri dish" for studying human behavior that may help predict outbreaks of diseases such as HIV and inform public health efforts, as several studies and social media experts have shown over the last few years. Those 140-character tweets, and not just the ones about health, have been a treasure chest for epidemiologists, computer scientists, psychologists, and many others to dig through.

Twitter is a compendium of who we are, according to H. Andrew Schwartz, PhD, an assistant professor of Computer and Information Science at the University of Pennsylvania and Penn Medicine’s Social Media and Health Innovation Lab. “The language that we write is a representation of our daily lives. It captures aspects of people that are otherwise quite elusive to health researchers, such as the psychological side that’s difficult to get at.”

These aspects can be missed by traditional – and slow to publish – disease surveillance, studies, and surveys. Tapping into the big data of Twitter has already helped better forecast flu outbreaks and asthma attacks, research has shown.

Runny noses and HIV

Tweets about anything from runny noses to HIV infections also have the potential to spot overall health trends that could inform campaigns and interventions to help treat people with a slew of diseases, Penn researchers discussed in an October 2015 study that included Schwartz and senior author, Raina M. Merchant, MD, MSHP, director of the Social Media and Health Innovation Lab. The less-obvious tweets are giving Penn researchers better insight into health, too.

An analysis of over 150 million tweets revealed useful information about HIV in communities across the United States and the people who live in them. Action tweets – with words like “go,” “act,” “work,” and “engage” – were significantly more likely to come from people living in counties with lower HIV rates than from people living in counties with higher HIV rates, Schwartz and his colleagues reported in the journal AIDS and Behavior last year.

General tweets

These weren’t “I am HIV positive” or “I’m going to get tested” types of tweets, which past studies have tracked. They were more about general goals or representations: “I plan on…” or “I’ll beat you at…,” for example. Using a dictionary of 854 words for their analysis, the team set out to answer whether this type of language corresponded with HIV cases.
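For readers curious what a dictionary-based analysis of this kind can look like in practice, here is a minimal sketch in Python. It is not the study's actual method: the five-word list stands in for the 854-word dictionary, the counties, tweets, and HIV rates are made up for illustration, and the function names `action_rate` and `pearson` are hypothetical. It computes the share of words in each county's tweets that are action words, then checks how that share lines up with the county's HIV rate.

```python
import re
from math import sqrt

# Hypothetical stand-in for the study's 854-word dictionary (assumption).
ACTION_WORDS = {"go", "act", "work", "engage", "plan"}


def action_rate(tweets):
    """Return the share of word tokens in `tweets` found in ACTION_WORDS."""
    tokens = [
        token
        for tweet in tweets
        for token in re.findall(r"[a-z']+", tweet.lower())
    ]
    if not tokens:
        return 0.0
    return sum(token in ACTION_WORDS for token in tokens) / len(tokens)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up toy data: tweets grouped by county, and an invented HIV rate each.
county_tweets = {
    "A": ["I plan to go work out", "let's go engage"],
    "B": ["so tired today", "I plan nothing"],
    "C": ["nothing to see here", "rain again"],
}
hiv_rate = {"A": 50.0, "B": 200.0, "C": 400.0}

rates = {county: action_rate(tweets) for county, tweets in county_tweets.items()}
counties = sorted(rates)

# A negative value means more action language goes with lower HIV rates.
r = pearson([rates[c] for c in counties], [hiv_rate[c] for c in counties])

# Counties ordered from least to most action-oriented language.
ranking = sorted(counties, key=lambda c: rates[c])
```

In this toy data the correlation comes out negative, the same direction the researchers reported, though three invented counties prove nothing on their own. A real analysis would also need to geocode tweets to counties and account for differences between communities.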

In the health space, these action tweets have been linked to increased physical activity, but how they played out in the HIV space was unclear. Being in an active community environment, the authors said, may promote quicker diagnosis and treatment of HIV -- which could eventually help reduce the transmission and prevalence of the disease.

The tweets could have gone different ways: towards more positive health decisions or towards risky behavior that may increase a person’s chance of contracting the disease. The former was true in this case, and as the authors described, was critically informative, with “the potential to be highly transformative for prevention and surveillance practices.”

Estimate HIV risk

At the moment, results of large-scale surveys identifying individual and community risk factors for HIV and other infectious diseases can take years to show up, so the data often arrives too late to act upon. Having a real-time look at the conversations happening on Twitter could prove to be very advantageous. With Twitter and sophisticated geocoding techniques, public health researchers can potentially estimate the current HIV risk of specific locations and implement preventive measures before an actual outbreak occurs, the authors of the AIDS and Behavior study said.

“This type of data can inform public health interventions,” said Schwartz, also an assistant professor of Computer Science at Stony Brook University. “With this information, you can use a ranking of how action oriented a community is and where you may want to intervene, and start some programs about HIV awareness.”

Big data approach

This big data approach extends beyond HIV. Schwartz and his colleagues study a host of other diseases, such as diabetes, to see what the predictive capabilities and associations of Twitter can reveal. Interestingly, preliminary results show a link between the word “church” on social media and higher rates of diabetes, he said.

It’s all part of the ongoing work with “natural language processing” from the social media lab at Penn, as well as Stony Brook and the University of Illinois – each of which comprises a multidisciplinary team of researchers with backgrounds in medicine, law, computer science, demography, geographic information systems, biostatistics, health policy, communications, marketing, design, behavioral health, and operations management.

“Facebook and Twitter opens up the ability to look across open vocabulary, unanticipated language patterns posted ‘in the wild’,” Schwartz said. “We’re looking at many diseases and what the connections with language are – which ones can we predict well and which ones can we not.”