Twitter messages are potential key to tracking disease trends
Contact: Tonya Lowentritt
HAMMOND – Monitoring a social network such as Twitter has the potential to track disease trends such as influenza outbreaks far more quickly and cheaply than traditional methods of disease surveillance, according to a computer science expert at Southeastern Louisiana University.
A process called syndromic surveillance uses collected health-related data to alert health officials to the probability of an outbreak of disease, typically influenza or other contagious diseases. The technique involves collecting data from hospitals, clinics and other sources, a labor-intensive and time-consuming approach. By monitoring a social network such as Twitter, researchers can instead capture comments from people with the flu who are sending out status messages.
“A micro-blogging service such as Twitter is a promising new data source for Internet-based surveillance because of the volume of messages, their frequency and public availability,” said Aron Culotta, assistant professor of computer science. “This approach is much cheaper and faster than having thousands of hospitals and health care providers fill out forms each week.
“The Centers for Disease Control produces weekly estimates,” he added, “but those reports typically lag a week or two behind. This approach produces estimates daily.”
Culotta and two student assistants analyzed more than 500 million Twitter messages over the eight-month period of August 2009 to May 2010, collected using Twitter’s application programming interface (API). By using a small number of keywords to track rates of influenza-related messages on Twitter, the team was able to forecast future influenza rates.
“Once the program is running, it’s actually neither time-consuming nor expensive,” he said. “It’s entirely automated because we’re running software that samples each day’s messages, analyzes them and produces an estimate of the current proportion of people with the flu.”
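As a rough illustration of the kind of daily keyword tracking described above, the following Python sketch computes the fraction of each day's messages that mention a flu-related keyword. It is not the team's software: the keyword list, the function names and the sample messages are all hypothetical, and turning this raw fraction into an estimate of how many people actually have the flu would require a model fitted against health data, which this release does not describe.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical keyword list, for illustration only; the release does not
# say which keywords the team tracked.
FLU_KEYWORDS = ("flu", "cough", "sore throat", "fever")

# Match any keyword as a whole word or phrase, ignoring case.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in FLU_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def mentions_flu(text):
    """Return True if the message contains at least one keyword."""
    return _PATTERN.search(text) is not None


def daily_flu_fraction(messages):
    """Map each day to the fraction of its messages that mention a keyword.

    `messages` is an iterable of (day, text) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, text in messages:
        totals[day] += 1
        if mentions_flu(text):
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in totals}


if __name__ == "__main__":
    # Invented sample messages, not real Twitter data.
    sample = [
        (date(2010, 1, 4), "Stuck in bed with the flu"),
        (date(2010, 1, 4), "Great game last night"),
        (date(2010, 1, 5), "This cough will not go away"),
        (date(2010, 1, 5), "Lunch was good"),
    ]
    for day, fraction in sorted(daily_flu_fraction(sample).items()):
        print(day, fraction)
```

Simple keyword matching of this kind also counts messages such as "got my flu shot today," so the fraction is best read as a noisy signal rather than a direct count of sick users.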
Southeastern’s group obtained a 95 percent correlation with the national health statistics collected by the CDC. In addition, the results were comparable to figures collected by Google with its Flu Trends service, which tracks influenza rates by analyzing trends in query terms.
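The release does not say how the 95 percent figure was calculated. If it is read as a Pearson correlation coefficient of about 0.95 between the Twitter-based estimates and the CDC statistics, a comparison of that kind could be sketched in Python as follows. The function name and the weekly numbers are invented for illustration and are not the team's data.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented weekly values, for illustration only.
    twitter_fraction = [0.010, 0.012, 0.018, 0.025, 0.021]
    official_rate = [1.1, 1.3, 2.0, 2.6, 2.4]
    print(round(pearson(twitter_fraction, official_rate), 3))
```

A coefficient close to 1 means the two series rise and fall together; it does not by itself show that the Twitter figures match the official rates in absolute terms.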
Culotta said the statistics he collected were for the whole country. His future work will look at extracting information from messages that include more location-specific data, which would allow him to segment the results by region more easily. He is also planning a website, being developed in collaboration with graduate student Matthew Gill and computer science senior Ross Murray, that will display his results in real time.
Culotta said Twitter has an advantage over Google because its high posting frequency enables up-to-the-minute analysis of an outbreak. Twitter, he said, reports having more than 105 million users posting nearly 65 million messages a day. Approximately 300,000 new users are added daily.
“Despite the fact that Twitter appears targeted to a young demographic, it does in fact have quite a diverse set of users,” he said. “The majority of Twitter’s nearly 10 million unique visitors in February 2009 were over 35 years old, and a nearly equal percentage of users are between the ages of 55 and 64 as between 18 and 24.”
Culotta’s research was presented at the 2010 Workshop on Social Media Analytics at the Conference on Knowledge Discovery and Data Mining in Washington, D.C. The work was funded in part by the Louisiana Board of Regents.