During a two-week period in April 2012, the team collected 603,954 tweets that were geo-tagged as originating in the Big Apple. They then applied a sentiment filter that looked for smiley or frowny emoticons to determine whether each tweet was positive or negative. The report is lyrically titled "Sentiment in New York City: A High Resolution Spatial and Temporal View."
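To make the emoticon idea concrete, here is a minimal sketch of what such a filter might look like. The emoticon patterns and the function name `label_tweet` are my own assumptions for illustration, not the research team's actual method or list.

```python
import re

# Hypothetical emoticon patterns (assumed for illustration; the report's
# actual emoticon list is not reproduced here). Each emoticon must stand
# alone, bounded by whitespace or the start/end of the text.
POSITIVE = re.compile(r"(?<!\S)[:;=]-?[)D](?!\S)")   # :)  :-)  ;)  =D
NEGATIVE = re.compile(r"(?<!\S):'?-?\((?!\S)")       # :(  :-(  :'(


def label_tweet(text):
    """Return 'positive', 'negative', or None when the tweet is unclear."""
    has_pos = bool(POSITIVE.search(text))
    has_neg = bool(NEGATIVE.search(text))
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    # No emoticon, or mixed signals: leave the tweet unlabeled.
    return None
```

A filter this crude ignores sarcasm and any tweet without an emoticon, which is part of why results like these deserve a skeptical read.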
The results aren't very surprising. People are happiest around public parks and most aggravated in heavy traffic. They're happier on weekends and sadder when plodding to work on Monday mornings. Those sentiments are not unique to New Yorkers, and, honestly, what else is new?
My first reaction is to lump this sort of report in with early sentiment analysis projects that aimed to parse social media posts and apply complicated language filters to determine whether they were positive or negative.
Several years ago, I was involved tangentially with a consulting firm that claimed to use sentiment analysis to track impressions of Windows Vista among IT bloggers (talk about a no-brainer). I lost faith in the accuracy of the analytical tool when its weekly report included results about contractors who install windows in Chula Vista, Calif.
The field has evolved significantly since then, though, and I think the correlation of NECSI's results to empirical reality makes some of the less-obvious findings worthy of consideration.
For instance, Maspeth Creek, a heavily polluted body of water in Brooklyn, is an area of singularly high negative sentiment. The report points out that city officials are on the record as saying they can't speculate about how, or whether, the malodorous sludge affects the local population. With this kind of spatial and temporal sentiment map, however, they would have some evidence to go on. I think there are plenty of potential applications for this kind of analytics in municipal management.
That said, there are some highlights of the study that show how Twitter and geolocation data may not be as scientific as we'd like. Hunter College High School shows up as the saddest place in New York City, but that's because the research team collected tweets immediately after spring break.
I have a couple of ideas, actually. Holy Cross is surrounded by numerous churches, a preparatory school, and Public School 181. It's possible that upbeat tweets from students and worshippers are showing up as originating in the Holy Cross area.
Similarly, the report shows Penn Station as a uniquely negative area, and that's probably because of frustrated travelers. But Madison Square Garden is directly above the station, and that's where the Knicks play, so I might blame them for the misery of that location.
It's possible that the research team accounted for my theories, but I still think these shifting variables underscore the unreliability of social media analysis as a discipline. More work needs to be done, in my opinion, to correct for the foibles of adolescent tweeters and inaccurate geo-tags.
Members, what do you think?