"Angry tweeting 'could increase your risk of heart disease','' is the poorly reported headline in The Daily Telegraph. The study it reports on found there is a link between angry tweets and levels of heart disease deaths.
Researchers were interested in investigating how various forms of negative psychological stress are linked to heart disease. They looked at how angry tweets, at a community level, may be a reflection of this stress.
For example, people living in an area with a high crime rate and high unemployment may be more likely to vent their anger on Twitter than people living in luxury flats in Mayfair.
And stress and other negative psychological emotions could increase the risk of heart disease.
The study looked at 148 million tweets across US counties and linked them to information on heart disease deaths, as well as demographic risk factors such as age and ethnicity.
Inputting this information into a mathematical model allowed the researchers to broadly predict death rates from heart disease using only the language analysis of Twitter posts, such as looking for swear words.
From a research point of view, this is exciting as it is a new avenue for gathering health insights, which in turn could ultimately help us target health resources at areas that need them most. It would be interesting to see if a UK-based study yielded similar results.
The study was carried out by researchers from the University of Pennsylvania.
It was funded by The Robert Wood Johnson Foundation's Pioneer Portfolio through an Exploring Concepts of Positive Health Grant, and a grant from the Templeton Religion Trust.
The study was published in the peer-reviewed Psychological Science.
The Daily Telegraph's headline that, "Angry tweeting could increase your risk of heart disease" is not correct. The study was about how existing psychological stress is linked to heart disease, and angry tweets may be a reflection of this stress.
A more accurate (if a little lengthy) headline would be: "Stress and other negative psychological emotions increase risk of heart disease, and these people are more likely to send angry tweets".
Despite the misleading headline, the rest of the article was accurate. It ran useful quotes from experts explaining how language patterns can reflect negative emotions such as stress, and this in turn is linked to poorer health, particularly heart health.
"Psychological states have long been thought to have an effect on coronary heart disease. For example, hostility and depression have been linked with heart disease at the individual level through biological effects […].
"But negative emotions can also trigger behavioural and social responses; you are also more likely to drink, eat poorly and be isolated from other people, which can indirectly lead to heart disease."
This was a cross-sectional study looking at whether the language used on Twitter across a range of US counties was a good predictor of underlying psychological characteristics and death rates from heart disease.
Heart disease is the leading cause of death worldwide. Identifying and addressing key risk factors for heart disease, such as smoking, hypertension, obesity and physical inactivity, has significantly reduced this risk, the researchers state.
Psychological characteristics, such as depression and chronic stress, have also been shown to increase risk through physiological effects.
Like individuals, communities have characteristics, such as cultural norms (beliefs about how members of a community should behave), social connectedness, perceived safety and environmental stress, that contribute to health and disease.
One challenge of addressing community-level psychological characteristics is the difficulty of assessment. Traditional approaches using phone surveys and household visits are costly and have limited precision.
The study team thought Twitter might provide a more cost-effective assessment of community-level psychology, which is linked to death and disease.
Previous studies based on user-generated content, such as using Google searches to predict the likely spread of flu, have proved successful.
The researchers gathered 148 million tweets geographically linked to 1,347 counties in the US. It was reported more than 88% of the US population lives in the counties included.
The team then gathered country-level information on heart disease (coronary heart disease) and death, as well as a range of demographic and health risk factor information, such as average income and proportion of married residents.
In 2009 and 2010, Twitter made a 10% random sample of tweets (a data-mining initiative titled the "Garden Hose") available for researchers through direct access to its servers. This was how the researchers accessed the tweets.
The language analysis automatically calculated how often words and phrases were used on Twitter for each county, such as "hate" or "jealous", and categorised them according to theme.
They also searched for swear words we couldn't possibly repeat to a PG audience. Themes included anger, anxiety, positive and negative emotions, engagement, and disengagement.
Because words can have multiple senses, act as multiple parts of speech, and be used ironically, the researchers manually checked a sample of the automatically generated themes to ensure they were accurate.
All the information was fed into a statistical model to see if it was possible to predict heart disease death rates from the language used on Twitter alone.
Greater use of anger, negative relationship, negative emotion, and disengagement words on Twitter was significantly correlated with greater age-adjusted heart disease mortality. Protective factors included positive emotions and psychological engagement.
Most correlations remained significant after controlling for income and education.
The statistical model – based only on Twitter language – predicted heart disease deaths significantly better than a model that combined 10 common demographic, socioeconomic, and health risk factors, including smoking, diabetes, hypertension, and obesity.
The researchers reached a simple conclusion: "Capturing community psychological characteristics through social media is feasible, and these characteristics are strong markers of cardiovascular mortality at the community level."
This study shows it is possible to broadly predict death rates from heart disease at a US county level using language analysis of Twitter posts from those US counties.
From a research point of view, this study is exciting as it gives an extra way of gathering information that could ultimately help target health resources in areas that need it most.
The cost effectiveness of this type of psychological insight would be interesting to weigh against existing methods such as telephone interviews.
But this was just a single study, so we cannot be sure this technology is practical or useful in a wide range of applications. This would depend on how speech is related to other health risk factors.
Nonetheless, this is an interesting avenue for further investigation. The research community is always looking for new cost-effective methods of gathering data to improve people's health.
This study suggests language analysis of Twitter, in some circumstances, might be a useful activity. This could potentially be used to assess a wide range of issues, such as depression rates, the prevalence of eating disorders, and levels of alcohol or drug misuse in a given community.
It will be interesting to see where this avenue of research, based on user-generated content, takes us.