Generate Your Insight


I remember when, several years back, hosting applications like OddCast on a Website and having computer-generated avatars greet visitors was a cool thing to do. The avatars seemed to follow mouse movements with their fake eyeballs!

For a time, as reported in this Radio Television Digital News Association blog, it seemed revolutionary that avatars could read news stories aloud. But that was mainly presentation; the news stories were, and still are, human curated.

Well, the human part, curation, might come to an end sooner than we think.

Kris Hammond, one of the founders and CTO of Narrative Science, a company working on technologies to generate narratives from data, thinks in 15 years, 90 percent of the news stories will be computer generated, as he discusses here. There will still be room for human curation, but many of the stories will be almost entirely “automated”:

A computer can write highly localized crime reports, personalized stock portfolio reporting, high school and youth sports stories at scale to provide coverage that was previously impossible and could never be possible in a world of purely human generated content.

I doubt we are anywhere near generating narrative stories from unstructured data. Yet, news story automation already may be here. In early December 2011, a computer-generated news portal called The Wall launched, analyzing and displaying real-time local Twitter trends while automatically clustering the information into news topics.

On closer examination, I found each clustered news story ends up linking to one or more actual newspaper stories written by a human reporter. So, while perhaps human bias is not eliminated, the selection of stories appears to be automatic.

Fast Company writes about something similar to topic clustering being used on Shakespeare’s First Folio through DocuScope, word analysis software developed at Carnegie Mellon University.

William Shakespeare

The Fast Company piece points out a surprising discovery made using DocuScope. Othello -- despite being labeled a tragedy -- turns out to be a comedy. Shakespeare apparently used comedic stylistic cues to intensify the play's tragic aspects. It turns out that Shakespeare’s vocabulary and syntax varied wildly among his comedies, historical plays, and tragedies. In fact, according to a DocuScope insight, the funniest thing Shakespeare wrote was a portion of The Merry Wives of Windsor, while a passage from Richard II was the most serious.

Then again, in our age of “big data,” we can now visualize how our literary expressions differ and evolve over time. Take the Corpus of Contemporary American English, or COCA, which comprises 425 million words of text from the past two decades, with equally large samples drawn from fiction, popular magazines, newspapers, academic texts, and transcripts of spoken English. The New York Times recently wrote about how COCA revealed patterns a human would never have found, such as which past-tense verbs show up more frequently in fiction than in academic prose.
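To make the fiction-versus-academic comparison concrete, here is a minimal Python sketch of one way such a pattern could be surfaced: normalize word counts to a per-million rate in each genre, then rank candidate words by the ratio between the two. This is not how COCA itself works. The two sample texts, the candidate verb list, and the function names `per_million` and `overrepresented` are all made up for illustration.

```python
import re
from collections import Counter


def per_million(text: str) -> dict[str, float]:
    """Return each word's frequency, normalized to occurrences per million words."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # guard against empty input
    return {w: c * 1_000_000 / total for w, c in Counter(words).items()}


def overrepresented(target: str, reference: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Rank candidates by how much more frequent they are in target than in reference."""
    t, r = per_million(target), per_million(reference)
    # Add-one smoothing keeps the ratio defined when a word is absent from one genre.
    ratios = [(w, (t.get(w, 0.0) + 1.0) / (r.get(w, 0.0) + 1.0)) for w in candidates]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for genre samples (hypothetical, not drawn from COCA).
FICTION = "She grimaced and scowled. He grunted, then whispered. She smiled."
ACADEMIC = "The study examined the data. The authors found and reported that results differed."
PAST_TENSE = ["grimaced", "scowled", "grunted", "examined", "reported"]

ranking = overrepresented(FICTION, ACADEMIC, PAST_TENSE)
for word, ratio in ranking:
    print(f"{word:10s} fiction/academic ratio: {ratio:.6f}")
```

With real genre samples in place of the toy strings, the top of the ranking would list the verbs that lean toward fiction and the bottom those that lean toward academic prose.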

And again, the same technology that analyzes unstructured data and turns it into computer-generated insights can also predict what may happen in the future, as in the case of the Recorded Future platform, which is partially funded by the CIA and Google. I was recently a guest on the Recorded Future Webinar on the Future of the World Economy and Alternative Energy in 2012.

Recorded Future view

Recorded Future looks at 100,000 Web pages an hour, scanning across 50,000 sources -- from Securities and Exchange Commission filings to Twitter comments. As discussed in this New York Times blog, it looks for statements about the future, like notices of an annual meeting or predictions about when a product might be released, as well as past developments. It then creates a “temporal index” that suggests momentum trends and unusually strong relationships between key players in a timeline in order to generate unusual insights.
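As a rough illustration of what a “temporal index” might look like at its very simplest, the Python sketch below files each statement under the years it mentions and then pulls out the ones that point past a given reference year. It is not Recorded Future's method, which is not described here in any detail. The `Mention` class, the sample statements, and the function names are hypothetical, and a real system would need far richer date parsing than a four-digit-year pattern.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Mention:
    source: str  # where the statement was found
    text: str    # the statement itself


# Matches four-digit years from 1900 to 2099 (a deliberately crude stand-in for date parsing).
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def build_temporal_index(mentions: list[Mention]) -> dict[int, list[Mention]]:
    """Map each year to the statements that mention it."""
    index: dict[int, list[Mention]] = defaultdict(list)
    for mention in mentions:
        # Use a set so a statement repeating a year is only filed once under it.
        for year in {int(m.group(0)) for m in YEAR.finditer(mention.text)}:
            index[year].append(mention)
    return dict(index)


def future_statements(index: dict[int, list[Mention]], current_year: int) -> dict[int, list[Mention]]:
    """Keep only entries that refer to years after current_year, in chronological order."""
    return {year: index[year] for year in sorted(index) if year > current_year}


# Hypothetical sample statements, for illustration only.
mentions = [
    Mention("filing", "The company will hold its annual meeting in May 2013."),
    Mention("tweet", "Analysts expect the new tablet to ship in 2012."),
    Mention("news", "The firm restated its 2010 earnings."),
]

index = build_temporal_index(mentions)
upcoming = future_statements(index, current_year=2011)
for year, items in upcoming.items():
    print(year, [m.source for m in items])
```

Counting how many sources pile up under a future date, and how that count changes over time, is one plausible way to read the “momentum” the platform describes.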

Recorded Future is not alone in generating insights. Companies such as Palantir Technologies attempt to visualize the world’s governmental and financial information as well. Read this blog, for example, detailing Palantir's analysis of the recent turmoil in Sudan. It performed relational, temporal, statistical, geospatial, and social network analysis on more than a dozen open sources of intelligence data to gain a deeper understanding of the conflict and how it might be resolved.

Yet another platform, Quid, aims to discover new opportunities through a “white space” analysis. The software will let you find “standout companies” within a sector and a sea of largely unstructured data, the company says.

The world is rapidly changing, that much is certain, and our ability to generate insights is about to take quantum leaps. Are you in?

Marshall Sponder, Web Analytics and SEO/SEM Specialist

Marshall Sponder is a Web analytics and SEO/SEM specialist with expertise in market research, social media, networking, and public relations. As both an in-house team leader and consultant, he has used sophisticated analysis to optimize the social media marketing efforts of companies and brands including IBM, Monster, Porter Novelli, WCG, Gillette, Pfizer, Warner Brothers, Laughing Cow, The New York Times, and Havana Central. Sponder is a board member emeritus at the Web Analytics Association, a member of the Search Engine Marketing Professionals Organization (SEMPO), and a member of the Chartered Institute of Public Relations (CIPR) Social Media Measurement Study Group.

Putting Analytics Fragmentation Into Perspective

When the data we don't know is as important as the data we do, our analytics platforms are all but guaranteed to fail us.

3 Musts You'd Find in an Ideal Analytics Platform

Segmentation, multichannel integration, and intelligent dashboard reporting are vital capabilities, yet many business analytics solutions fall short.


Journalists & Bloggers of the World Unite!
  • 1/13/2012 1:32:26 PM

Uh oh, Marshall! As a blogger and former newspaper journalist, I don't like the sound of this! :) Of course, there is a difference between curating content and creating it and then between creating standard content based on readily available information and specialized, idiosyncratic, opinionated content that makes up the best commentary. My suspicion is that the human desire for self-expression will keep humans writing but perhaps much of the drudge work involved in simple regurgitation of facts will be gone. The mechanization of yet another human function may also cause unsuspected problems. Those information outlets that have survived the collapse of the large-scale media--think of sources as disparate as Fox News and The Onion--do so specifically because their content is stylized, opinionated and, in many cases, biased. So could unbiased insight ruin a business model?

Re: Journalists & Bloggers of the World Unite!
  • 1/13/2012 3:45:59 PM

@Shawn, you ask, "Could unbiased insight ruin a business model?" I hope not. Isn't that what business analytics is all about -- delivering facts for corporate decision-making? There's no bias in a factual bit of data.

  

Alas! Me Literature Prof Turns in His Grave
  • 1/13/2012 3:49:52 PM

@Marshall, of all the many interesting things I've learned in the six months since launching AllAnalytics.com, the fact that the classic Shakespearian tragedy Othello is actually a comedy has got to be one of the best. Brit Lit teachers around the globe must be flummoxed.

 

Re: Alas! Me Literature Prof Turns in His Grave
  • 1/13/2012 4:03:18 PM

Have to admit - as much as I like England, and go there often (in fact, in 10 days) - I can't say I can read Shakespeare all that well. I'm glad someone figured out how to tell whether Othello was a comedy or a tragedy, as it would probably never have been me. Ha!

Re: Journalists & Bloggers of the World Unite!
  • 1/13/2012 4:06:28 PM

I suppose so, Shawn - I don't think machines can replace humans - but I do think they can gather a lot of the information for us and prepare it so we can process it, in only the way a human can, faster than otherwise.

Yes, of course, there are tradeoffs in that we might miss things we'd find otherwise, as the bots failed to pick them up, or just the reverse, have a bunch of junk to look at and summarize. And then, we're back to square one.

In fact, if machine intelligence can't make this job much easier (to gather and curate information) we're better off not using it at all.

Re: Journalists & Bloggers of the World Unite!
  • 1/13/2012 11:12:40 PM

Hi Beth,

To clarify, Marshall first described how:

Kris Hammond, one of the founders and CTO of Narrative Science, a company working on technologies to generate narratives from data, thinks in 15 years, 90 percent of the news stories will be computer generated,

Further, he explains that:

In early December 2011, a computer-generated news portal called The Wall launched, analyzing and displaying real-time local Twitter trends while automatically clustering the information into news topics.

Marshall says that:

On closer examination, I found each clustered news story ends up linking to one or more actual newspaper stories written by a human reporter. So, while perhaps human bias is not eliminated, the selection of stories appears to be automatic.

I would submit that, in some cases, the "bias" we are describing here is an individuality and point of view that cannot be replaced by technology and actually causes us to choose or accept one source of information over another or, more precisely, to choose one bit of information as more relevant to us personally than another.   

Re: Journalists & Bloggers of the World Unite!
  • 1/13/2012 11:24:29 PM

Hi Marshall,

Actually, I see some serious advantages in the automation and aggregation of news and information that has freed up the flow of data and the power that access to that data has given common people in their daily lives. This is especially true when that technology uses a means that allows those consuming that data to choose the information they find most relevant and helpful outside of the control of an outdated and hierarchical information system. I just think that the human element here is critical and cannot be ignored as a partner with the technology used to help manage that data more efficiently. 

Re: Alas! Me Literature Prof Turns in His Grave
  • 1/13/2012 11:54:14 PM

Hi Marshall,

Alas, there's the rub, as the Bard himself might say. Othello, of course, was not a comedy, but the playwright used words and phrases associated with comedy for effect. Would our algorithm have known the difference without human interpretation to guide and interpret its results?

Re: Journalists & Bloggers of the World Unite!
  • 1/14/2012 1:02:11 AM

I think we're saying the same thing - there are advantages and disadvantages to new technologies.

Re: Alas! Me Literature Prof Turns in His Grave
  • 1/14/2012 1:07:19 AM

I think the article I got that information from also stated that the common definitions of tragedy and comedy in Shakespeare's time were somewhat different from what we think of those genres today.

So there's a few parts to this theme -

    - we, in our time, have relabeled and redefined what Shakespeare meant - not always accurately.

    - also, the software went back and tried to uncover the real genre of Othello, and found it was a comedy - as defined in Shakespeare's time period and locale, not ours.

 

Think of it this way - you have an old silver jug, it's all rusty, and you think the rust is part of the design - but then you put some silver cleaner on it - and it looks all new and shiny - and all of a sudden, like a different piece than what you thought of it originally - that is what the software mentioned in my article did for Othello.
