Back in "Where the Happy People Are," we talked about an analytics project that used two weeks' worth of Twitter feeds to measure happiness in New York City, organized by geotag. Last week, a BBC News article about big data and city planning brought a related project, Hedonometer, to our attention.
Hedonometer is a fascinating project, conducted jointly by the University of Vermont and Mitre Corp. Hedonometer maintains a repository of approximately 10,000 words, each ranked on a scale of 1 to 9 for its relative indication of happiness. Each day, it runs a random sampling of some 50 million tweets (about 100 gigabytes of data) through an analytics engine that looks for those words. (For you cloud enthusiasts out there, Hedonometer runs on Amazon's Elastic Compute Cloud and stores its data in the AWS Simple Storage Service. The daily computation takes three hours on 1,500 processors.)
The analytics engine generates a happiness score and a "word shift" analysis, which essentially tallies how often positive and negative words appear each day. The result is a happiness metric for each day, along with an explanation of which words contributed to the score. Hedonometer has been plotting these daily scores since September 2008.
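For readers who like to see the mechanics, here is a minimal Python sketch of how a score and a word-shift breakdown of this kind could be computed. It rests on assumptions of ours, not on anything Hedonometer has published here: the score is taken to be a frequency-weighted average of word ratings, each word's contribution is measured against the reference day's average, and the three-word list with its ratings is made up for illustration. The function names are hypothetical, too.

```python
import re
from collections import Counter

# Hypothetical ratings on the 1-to-9 scale; NOT taken from Hedonometer's word list.
WORD_SCORES = {"love": 8.0, "rain": 5.0, "sad": 2.0}


def scored_word_counts(texts, scores):
    """Count only the words that appear in the rated word list."""
    words = (w for text in texts for w in re.findall(r"[a-z']+", text.lower()))
    return Counter(w for w in words if w in scores)


def happiness(texts, scores=WORD_SCORES):
    """Frequency-weighted mean rating, or None if no rated word occurs."""
    counts = scored_word_counts(texts, scores)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[w] * n for w, n in counts.items()) / total


def word_contributions(reference, comparison, scores=WORD_SCORES):
    """Split the change in average score between two days into per-word parts.

    Assumed formula: (rating - reference average) * (change in relative frequency).
    The parts add up to happiness(comparison) - happiness(reference).
    """
    ref = scored_word_counts(reference, scores)
    comp = scored_word_counts(comparison, scores)
    ref_total, comp_total = sum(ref.values()), sum(comp.values())
    if ref_total == 0 or comp_total == 0:
        return {}
    ref_avg = sum(scores[w] * n for w, n in ref.items()) / ref_total
    return {
        w: (scores[w] - ref_avg) * (comp[w] / comp_total - ref[w] / ref_total)
        for w in set(ref) | set(comp)
    }


# Two made-up "days" of tweets.
monday = ["Love love this city", "sad about the rain"]
tuesday = ["So sad, so sad", "love the rain"]

print(happiness(monday))   # 5.75
print(happiness(tuesday))  # 4.25
print(word_contributions(monday, tuesday))
```

In this toy example, Tuesday scores lower than Monday for two reasons: "love" shows up less often and "sad" shows up more often. Both words get a negative contribution, which is the kind of explanation a word-shift chart is meant to surface.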
If you want to take a break from work, check out the chart on the homepage and drill down into specific years or dates. As you can see from this snapshot of 2013, Valentine's Day, Mother's Day, and Father's Day are high points, while April 15 -- the day of the Boston Marathon attack -- is decidedly low.
Interestingly, April 15 doesn't usually deviate much from the center. Does that mean most American Twitter users aren't filing IRS tax returns, or that they're not particularly stressed about it?
Questions like these come from a sense of skepticism about Twitter. Aren't most Twitter users teens? How trustworthy are tweets, anyway? But they're addressed in the site's remarkably candid FAQ:
"Tweets represent a non-uniform subsampling of all utterances made by a non-representative subpopulation of all people. However, there are hundreds of millions of people presently using the website to express their activities and interests, and as such it is an important social signal."
The FAQ explains that Hedonometer uses Twitter for four reasons:
- Its happiness results correlate with traditional surveys
- Its "garden hose" feature delivers an enormous amount of data that must be processed in real time
- Its metadata features can help analysts drill down into specific communities and geographic locations
- Its role as a "collective, global media voice" makes it an important indicator worthy of analysis
The study of happiness is an inexact science, at best (especially if you ask readers of All Analytics), but the Hedonometer seems to be scientifically rigorous and reasonably transparent, too. The list of keywords is freely available, and the team is working to expand its analytics to phrases, non-English languages, and other data sources.
Members, had you seen Hedonometer before reading this post? What insights jump out as you manipulate the data? Share your thoughts in the comments.