
How Can Twitter Protect You From The Flu?

We all know what Twitter is used for: posting updates. Companies announce new products, celebrities reach out to fans, news organizations spread information, and regular people tell their friends what they’re up to. In every case, Twitter is a brilliant web service that keeps people in the loop. The study this article discusses focuses on the Twitter users who are considered “regular people”: individuals with individual accounts who use their tweets to update a network of friends who choose to follow them. This group makes up the vast majority of the Twittersphere and generates the majority of the tweets posted online. All of those tweets collect on Twitter’s servers and form a massive collection of very interesting data, which Adam Sadilek of the University of Rochester has figured out how to use to track and predict the spread of disease.


Many people with personal Twitter accounts like to update their networks with seemingly useless information about themselves, such as how their day is going or what they ate for lunch. Coming from one person, this data doesn’t seem all that interesting, but taken across hundreds or thousands of people, it can reveal some very useful and fascinating things. Adam Sadilek used linguistics software and machine learning to analyze thousands of tweets over a period of time. In each tweet, his software looked for indications that the user was sick. For instance, keywords like runny nose, sick, cough, and flu were probably tagged as relevant to his study. Based on the context of each tweet, Sadilek could attach a marker to each person and determine whether or not that person was sick. This kind of analysis might not work on a single person who tweets very rarely or who doesn’t use Twitter to talk about how he or she is feeling that day, but it is incredibly revealing when applied to massive data sets covering thousands of people.
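To make the keyword idea concrete, here is a minimal sketch in Python. It is only my guess at the simplest version of the approach, not Sadilek's actual software, which relied on machine learning and the context of each tweet. The keyword list is loosely based on the examples above, and the function names `looks_sick` and `sick_fraction_by_user` are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical keyword list, loosely based on the examples in the post.
SICK_TERMS = ["runny nose", "sick", "cough", "flu"]

def looks_sick(tweet: str) -> bool:
    """Return True if the tweet contains any sickness keyword as a whole word."""
    text = tweet.lower()
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text) for term in SICK_TERMS
    )

def sick_fraction_by_user(tweets):
    """tweets: iterable of (user, text) pairs.

    Returns {user: fraction of that user's tweets flagged as sick}.
    """
    total = defaultdict(int)
    flagged = defaultdict(int)
    for user, text in tweets:
        total[user] += 1
        if looks_sick(text):
            flagged[user] += 1
    return {user: flagged[user] / total[user] for user in total}
```

A plain keyword match like this would also flag a tweet such as "sick of this traffic", which is presumably why the real study looked at the context of each tweet rather than at keywords alone.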


The really cool part about how Sadilek was able to predict who was sick, or was going to be sick, is that it rests on concepts we have already discussed in class. His research identified Twitter users as sick or healthy, and then used their social networks to identify other Twitter users who were sick or at risk of getting sick. The article I read does not state specifically which characteristics of the relationships between Twitter users were examined, but based on course concepts, I would assume that Sadilek’s algorithms looked at whom people tweeted at, whom they re-tweeted, whom they had been in contact with, and so on. All of these factors would point toward labeling social network relationships as strong or weak ties. Of course, in the real world, the concept is not as black and white as we have made it out to be thus far, since people can have all sorts of levels of relationships. I am sure that Sadilek’s research labeled relationships with varying levels of strength in order to get a better idea of who had more of a chance of interacting with someone who was sick. Based on all this, the research generated a graph covering New York City, with each node representing a Twitter user and carrying a certain probability of being sick or getting sick. All of this, I assume, was based on tie strength and network density, meaning the amount of interconnectedness within a group of friends or co-workers, for instance.
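As a rough illustration of how tie strength could feed into a per-node probability, here is a small Python sketch. The article does not describe the actual model, so everything here is an assumption on my part: tie strengths are numbers between 0 and 1, each sick friend independently has a chance of passing the illness along in proportion to the tie strength (a "noisy-OR" style estimate), and the function name `risk_of_getting_sick` is hypothetical.

```python
def risk_of_getting_sick(user, ties, sick_prob):
    """Estimate the chance that `user` catches something from their network.

    ties:      {user: {friend: tie strength between 0 and 1}}  (assumed format)
    sick_prob: {user: probability that the user is currently sick}

    Assumption: each friend independently infects the user with probability
    (tie strength) * (probability the friend is sick).
    """
    p_stay_healthy = 1.0
    for friend, strength in ties.get(user, {}).items():
        p_stay_healthy *= 1.0 - strength * sick_prob.get(friend, 0.0)
    return 1.0 - p_stay_healthy


# Example: Ann has a strong tie to Bob (definitely sick)
# and a weak tie to Cat (maybe sick).
ties = {"ann": {"bob": 0.5, "cat": 0.2}}
sick_prob = {"bob": 1.0, "cat": 0.5}
ann_risk = risk_of_getting_sick("ann", ties, sick_prob)
```

With these made-up numbers, Ann's risk works out to 1 - (0.5 × 0.9) = 0.55. A stronger tie or a denser cluster of sick friends pushes the number up, which matches the intuition from class about strong ties and tightly knit groups.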


To take the research further, Sadilek used the tweets of users who had geolocation services turned on, so that he could identify not only the social networks in which people were getting sick, but also the places where they were getting sick. The website I have linked to includes a video that shows how the sickness predictions, based on tweets in New York City, progress over a period of time, using different colors to represent different probabilities of sickness.


This is all very cool stuff, and the potential of this kind of data mining technology appears to be vast. This article, and the research done by Adam Sadilek at the University of Rochester, is a testament to how much information and knowledge can be extracted from seemingly unrelated or useless data.



