I remember when, several years back, hosting applications like OddCast on a Website and having computer-generated avatars greet visitors was a cool thing to do. The avatars even seemed to follow mouse movements with their fake eyeballs!
For a time, as reported in this Radio Television Digital News Association blog, it seemed revolutionary that avatars could read news stories aloud. But that was mainly presentation; the news stories were, and still are, curated by humans.
Well, the human part, curation, might come to an end sooner than we think.
Kris Hammond, one of the founders and CTO of Narrative Science, a company working on technologies to generate narratives from data, thinks that in 15 years, 90 percent of news stories will be computer generated, as he discusses here. There will still be room for human curation, but many of the stories will be almost entirely “automated”:
A computer can write highly localized crime reports, personalized stock portfolio reporting, high school and youth sports stories at scale to provide coverage that was previously impossible and could never be possible in a world of purely human generated content.
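To make the idea concrete, here is a minimal sketch in Python of the simplest form of data-to-text generation: filling a sentence template from a structured record. I have no details on how Narrative Science's system actually works, so this is only an illustration. The `GameResult` record, the `write_recap` function, and the margin thresholds that pick the verb are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GameResult:
    """Hypothetical structured record for one high school game."""
    home: str
    away: str
    home_score: int
    away_score: int


def write_recap(game: GameResult) -> str:
    """Turn one structured game record into a one-sentence recap."""
    if game.home_score == game.away_score:
        return (f"{game.home} and {game.away} played to a "
                f"{game.home_score}-{game.away_score} tie.")
    if game.home_score > game.away_score:
        winner, loser = game.home, game.away
        high, low = game.home_score, game.away_score
    else:
        winner, loser = game.away, game.home
        high, low = game.away_score, game.home_score
    margin = high - low
    # Choose a verb from the margin of victory (thresholds are arbitrary).
    verb = "edged" if margin <= 2 else "beat" if margin <= 9 else "routed"
    return f"{winner} {verb} {loser} {high}-{low}."


print(write_recap(GameResult("Lincoln High", "Central High", 21, 20)))
```

Run over a full season of box scores, a loop around a function like this would produce one short story per game, which is the "at scale" part of the argument.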
I doubt we are anywhere near generating narrative stories from unstructured data. Yet, news story automation already may be here. In early December 2011, a computer-generated news portal called The Wall launched, analyzing and displaying real-time local Twitter trends while automatically clustering the information into news topics.
On closer examination, I found each clustered news story ends up linking to one or more actual newspaper stories written by a human reporter. So, while perhaps human bias is not eliminated, the selection of stories appears to be automatic.
Fast Company writes about something similar to topic clustering being used on Shakespeare’s First Folio through DocuScope, word analysis software developed at Carnegie Mellon University.
The Fast Company piece points out a surprising discovery made using DocuScope. Othello -- despite being labeled a tragedy -- turns out, stylistically, to look like a comedy. Shakespeare apparently used comedic stylistic cues to intensify the play's tragic aspects. It turns out that Shakespeare’s vocabulary and syntax varied wildly between his comedies, historical plays, and tragedies. In fact, according to a DocuScope insight, the funniest thing Shakespeare wrote was a portion of The Merry Wives of Windsor, while a passage from Richard II was the most serious.
Likewise, in our age of “big data,” we can now visualize how our literary expressions differ and evolve over time. Take the Corpus of Contemporary American English, or COCA, which comprises 425 million words of text from the past two decades, divided evenly among fiction, popular magazines, newspapers, academic texts, and transcripts of spoken English. The New York Times recently wrote about how the COCA program detected patterns a human would never have found, such as which past-tense verbs show up more frequently in fiction than in academic prose.
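The kind of comparison described here, which words show up more often in one genre than another, can be sketched in a few lines of Python. This is an illustration only, not how the COCA interface is implemented. The function names and the two tiny text samples are made up, and a real corpus would also need part-of-speech tagging to isolate past-tense verbs.

```python
import re
from collections import Counter


def relative_frequencies(text: str) -> dict[str, float]:
    """Return each word's share of all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid dividing by zero on empty input
    return {word: count / total for word, count in Counter(words).items()}


def rank_by_genre_skew(genre_a: str, genre_b: str,
                       candidates: list[str]) -> list[tuple[str, float]]:
    """Rank candidate words by how much more frequent they are in genre_a."""
    freq_a = relative_frequencies(genre_a)
    freq_b = relative_frequencies(genre_b)
    scores = [(w, freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) for w in candidates]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up miniature "corpora", for illustration only.
fiction = "She grimaced and whispered, then scowled and whispered again."
academic = "The study demonstrated that the results correlated with the model."

for word, score in rank_by_genre_skew(fiction, academic,
                                      ["whispered", "demonstrated"]):
    print(f"{word}: {score:+.3f}")
```

A positive score means the word leans toward the first genre, a negative score toward the second. Using each word's share of the text rather than its raw count keeps the comparison fair when the two samples differ in size.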
The same kind of technology that analyzes unstructured data and turns it into computer-generated insights can also be used to predict what may happen in the future. That is the case with the Recorded Future platform, which is partially funded by the CIA and Google. I was recently a guest at the Recorded Future Webinar on the Future of the World Economy and Alternative Energy in 2012.
Recorded Future looks at 100,000 Web pages an hour, scanning across 50,000 sources -- from Securities and Exchange Commission filings to Twitter comments. As discussed in this New York Times blog, it looks for statements about the future, like notices of an annual meeting or predictions about when a product might be released, as well as past developments. It then creates a “temporal index” that places these statements on a timeline, surfacing momentum trends and unusually strong relationships between key players.
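As a rough illustration of what a "temporal index" could look like, here is a toy Python sketch that pulls sentences mentioning a future year out of a set of documents and files them under that year. Recorded Future's actual method is not public in this level of detail, so everything here is an assumption: the `build_temporal_index` function, the year-matching pattern, and the sample documents are all hypothetical.

```python
import re
from collections import defaultdict

# Matches four-digit years from 1900 to 2099.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def build_temporal_index(documents: list[str],
                         published_year: int) -> dict[int, list[str]]:
    """Map each future year to the sentences that mention it."""
    index = defaultdict(list)
    for doc in documents:
        # Naive sentence split on end punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            years = {int(m.group()) for m in YEAR.finditer(sentence)}
            for year in sorted(years):
                if year > published_year:
                    index[year].append(sentence.strip())
    return dict(index)


# Made-up sample documents, for illustration only.
docs = [
    "The company will hold its annual meeting in 2013. Sales rose in 2010.",
    "Analysts expect the product to ship by 2013.",
]
for year, statements in build_temporal_index(docs, 2011).items():
    print(year, statements)
```

Counting how many statements pile up under a given date, and which names appear together in them, is one simple way to get at the momentum and relationship signals the platform is said to surface.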
Recorded Future is not alone in generating insights. Companies such as Palantir Technologies attempt to visualize the world’s governmental and financial information as well. Read this blog, for example, detailing Palantir's analysis of the recent turmoil in Sudan. It performed relational, temporal, statistical, geospatial, and social network analysis on more than a dozen open sources of intelligence data to gain a deeper understanding of the conflict and how it might be resolved.
Yet another platform, Quid, aims to discover new opportunities through “white space” analysis. The software will let you find “standout companies” within a sector amid a sea of largely unstructured data, the company says.
The world is rapidly changing, that much is certain, and our ability to generate insights is about to take quantum leaps. Are you in?