As enterprises grapple with how best to handle their rising big data requirements, industry players of every ilk are vying for attention. Who will win -- if it's even possible that a single victor will emerge -- is anybody's guess. But one thing is certain -- we need a new paradigm.
"With technical triggers coming from absolute amounts of data being created by consumer data, location information, and sensor networks, the fact is we can't just rely on processor speeds and spinning disks to go faster anymore," explained Tony Jewitt, vice president of big data solutions at enterprise Web and search consulting firm Avalon Consulting, in a recent interview with AllAnalytics.com.
Enterprises will see five primary types of contenders girding up for the big data race, he said: traditional database, business intelligence, and data warehousing companies; second-generation database, BI, and data warehousing companies (the "accelerators" of the world); NoSQL, non-relational, and NewSQL database companies; the Hadoop ecosystem; and enterprise search engines. Watch, too, for a convergence of search and analytics -- a particularly important trend for companies with big Web operations, such as many of Avalon's clients.
"We think search is as good or better a place to start for big data," Jewitt said.
After all, BI software has been the top technology for analyzing large volumes of structured data, and search engines have played the same role for unstructured data. Adding BI capabilities, including aggregation functions, cross-tab reports, and data visualizations, to advanced search is a logical next step.
Converging BI and search would provide a "best of both worlds" scenario for many big data situations, Jewitt suggested. Companies would be able to keep their full-featured search and BI interfaces while interactively cross-analyzing unstructured and structured datasets. They wouldn't be hamstrung by rigid schemas and would be able to take advantage of metadata enrichment. And they'd be able to operate on raw data in real time.
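To make the idea concrete, here is a minimal sketch of what "BI on top of search" can mean: a keyword search over free text, followed by a cross-tab of the hits by two metadata fields. The sample records, field names, and helper functions are all hypothetical and chosen for illustration; this is not Avalon's implementation or any product's API.

```python
from collections import Counter

# Hypothetical sample records: free text plus structured metadata fields.
DOCS = [
    {"text": "Hearing on offshore oil drilling in the gulf", "topic": "energy", "year": 2010},
    {"text": "Testimony on greenhouse gas emissions limits", "topic": "climate", "year": 2010},
    {"text": "Oil spill response and drilling safety", "topic": "energy", "year": 2011},
    {"text": "Emissions reporting for oil refineries", "topic": "climate", "year": 2011},
]

def search(docs, term):
    """Return documents whose text contains the term (case-insensitive)."""
    needle = term.lower()
    return [d for d in docs if needle in d["text"].lower()]

def crosstab(hits, row_field, col_field):
    """Count hits for each (row, column) pair of metadata values."""
    return Counter((d[row_field], d[col_field]) for d in hits)

# Search the unstructured text, then aggregate the hits like a BI report.
hits = search(DOCS, "oil")
table = crosstab(hits, "topic", "year")
for (topic, year), count in sorted(table.items()):
    print(topic, year, count)
```

The search step works on the unstructured text, while the cross-tab step works on the structured fields attached to each hit, which is the pairing Jewitt describes.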
From a presentational standpoint, a convergence of analytics and search should ultimately produce search results that look a lot like a BI report -- with data presented logically in columns and, if desired, accompanied by visualizations. Avalon recently demonstrated how the merger of search and analytics might play out for a data-heavy Web entity like the Library of Congress. In particular, it developed an advanced search and analytics application for accessing the legislative information found in the publicly available Thomas database.
Those familiar with last year's McKinsey Global Institute big data report might recall that the Library was cited among those organizations with a daunting volume challenge. According to McKinsey, the Library had collected 235 terabytes of data by April 2011 -- and yet 15 out of 17 sectors in the US have more data stored per company than it does. Like many organizations, the Library wondered what a modern approach might do to improve the way the public uses the database, Jewitt said.
"This isn't just about improving search but about getting more out of the body of hearings, whether that's about drilling for oil in the gulf or greenhouse gas emissions -- so that requires convergence of search and BI," he said.
In the project, Avalon applied a variety of advanced technologies to a witness database: a next-generation, real-time search and analytics engine, the Hadoop framework, and open-source text processing software. Specifically, it used MarkLogic as the search and analytics engine and added metadata to the Library information in the MarkLogic database using the open-source General Architecture for Text Engineering (GATE) software running on Amazon Elastic MapReduce, Jewitt said. Without this technology, processing the witness information took three hours. Running the same process with the new technology on 20 two-processor servers in the Amazon cloud took 10 minutes, an 18-fold reduction, as was demonstrated for me during the interview. The cost, at $4 an hour, was negligible.
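The shape of that metadata-enrichment job can be sketched in a few lines. The example below is only an illustration of the map-and-reduce pattern: a toy keyword tagger stands in for GATE, a local thread pool stands in for the Elastic MapReduce cluster, and the records, tag rules, and function names are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical witness records; the project's real data and schema are not public here.
RECORDS = [
    {"id": 1, "text": "Testimony on drilling for oil in the gulf"},
    {"id": 2, "text": "Statement on greenhouse gas emissions"},
    {"id": 3, "text": "Remarks on oil leases and emissions rules"},
]

# Toy keyword-to-tag table standing in for a real text-processing pipeline.
TAG_RULES = {
    "oil": "energy",
    "drilling": "energy",
    "emissions": "climate",
    "greenhouse": "climate",
}

def enrich(record):
    """Map step: attach a sorted list of metadata tags to one record."""
    words = record["text"].lower().split()
    tags = sorted({TAG_RULES[w] for w in words if w in TAG_RULES})
    return {**record, "tags": tags}

def index_by_tag(enriched):
    """Reduce step: group record ids under each tag."""
    index = defaultdict(list)
    for rec in enriched:
        for tag in rec["tags"]:
            index[tag].append(rec["id"])
    return dict(index)

# Each record is independent, so the map step can be spread across workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    enriched = list(pool.map(enrich, RECORDS))

tag_index = index_by_tag(enriched)
print(tag_index)
```

Because each record is enriched independently, adding workers shortens the map step, which is the property that lets a job like this drop from hours to minutes when spread over many servers.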
The Library's Thomas.gov database demonstration shows the power of fusing next-generation search and analytics using the new database technologies in the third of Jewitt's five categories. "That's what we see as a credible path for how people will be dealing with big data," Jewitt said.
Are you mulling over your best approach to big data? Do you agree or disagree with Jewitt's assertion that search and analytics will fuse? Share on the message board below.