Comments
Germans Claim Breakthrough in Searching Big-Data
Re: Phone book or pre-sorted data?
  • 4/11/2013 4:57:48 PM

Hi Noreen. All this is beyond my level of technical expertise, but if what the researchers say about the speed boost is true and repeatable in real-world data environments, then I'd bet a lot of data folks would be interested in learning more about HAIL and demoing it.

Re: Phone book or pre-sorted data?
  • 4/11/2013 9:25:49 AM

@SaneIT,

It is more like an optimized data organization and access strategy that can be applied to a limited number of cases. 

Re: practically speaking
  • 4/11/2013 9:12:22 AM

The question is what kind of big data they are talking about. If the data is already structured by author, zip code, etc., it can easily be queried with a query language. But we usually want to know more about the business information "hidden" within the data than about its source. It is also clear that not every kind of big data can be structured this way.

Re: Phone book or pre-sorted data?
  • 4/11/2013 7:39:17 AM

It really sounds like they hit a best-of-both-worlds scenario with HAIL; the lookup time alone would make it worth trying. That the data uploads can be a bit quicker means you're not losing much even if it doesn't work. I was thinking that the data set uploads would get drawn out, but I'll take slightly better in this case.

Re: Phone book or pre-sorted data?
  • 4/10/2013 11:38:11 AM

This looks like a very interesting way to sort through data. I hope it becomes freely available to the public. With new data storage methods emerging, storing big data no longer seems that difficult, so keeping replicas shouldn't become a big issue. And the cost would not be what it was a few years back, so the timing is also right for this system. I think HAIL has a good chance of succeeding.

Re: Phone book or pre-sorted data?
  • 4/10/2013 9:51:49 AM

Take a look at the conclusion from the initial research:

We have presented HAIL (Hadoop Aggressive Indexing Library). HAIL improves the upload pipeline of HDFS to create different clustered indexes on each replica. As a consequence each HDFS block will be available in at least three different sort orders and with different indexes. Like that, in a basic HAIL setup we already get three indexes (almost) for free. In addition, HAIL also works for a larger number of replicas. A major advantage of HAIL is that the long upload and indexing times which had to be invested on previous systems are not required anymore. This was a major drawback of Hadoop++ [12], which created block-level indexes, however required expensive MapReduce jobs to create those indexes in the first place. In addition, Hadoop++ created indexes per logical HDFS block whereas HAIL creates different indexes for each physical replica. We have experimentally compared HAIL with Hadoop as well as Hadoop++ using different datasets and a number of different clusters. The results demonstrated the high efficiency of HAIL. We showed that HAIL typically creates a win-win situation: users can upload their datasets up to 1.6x faster than Hadoop and run jobs up to 68x faster than Hadoop.
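
To make the "different sort order on each replica" idea concrete, here is a toy Python sketch. It is not HAIL's code or API, just an illustration under my own assumptions: the record layout, `BLOCK`, `build_replicas`, and `lookup` are all made-up names, and a plain sorted list with binary search stands in for a clustered index.

```python
from bisect import bisect_left, bisect_right

# Toy records: (name, zip_code, year). Purely illustrative data.
BLOCK = [
    ("carol", "10001", 2011),
    ("alice", "94105", 2013),
    ("bob", "60601", 2012),
]


def build_replicas(block, key_columns):
    """Keep one copy of the block per key column, each sorted on that column.

    This mimics the idea of every replica holding the same data in a
    different sort order (hypothetical helper, not part of HAIL).
    """
    replicas = {}
    for col in key_columns:
        rows = sorted(block, key=lambda r: r[col])
        keys = [r[col] for r in rows]
        replicas[col] = (keys, rows)
    return replicas


def lookup(replicas, col, value):
    """Use the replica sorted on `col` if one exists, else scan any replica."""
    if col in replicas:
        keys, rows = replicas[col]
        lo = bisect_left(keys, value)
        hi = bisect_right(keys, value)
        return rows[lo:hi]  # binary search instead of a full scan
    any_rows = next(iter(replicas.values()))[1]
    return [r for r in any_rows if r[col] == value]


replicas = build_replicas(BLOCK, key_columns=[0, 1, 2])
by_zip = lookup(replicas, 1, "60601")
by_year = lookup(replicas, 2, 2013)
```

With three replicas you get three sort orders, so a query filtering on any of those three columns can go to the copy that is already sorted for it. A query on some other column falls back to a plain scan, which is why this only helps for a limited set of access patterns.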

Phone book or pre-sorted data?
  • 4/10/2013 7:35:33 AM

I can somewhat understand why they might compare it to a phone book, but really isn't it just presorted data?  It sounds like they have picked a few sorting methods that work better in specific circumstances and they use the fastest method for the data they are looking for.  This makes sense if you have the resources to pull it off.

Re: practically speaking
  • 4/9/2013 7:59:44 PM

Good question, Beth.

I'll have to look into that.

practically speaking
  • 4/9/2013 4:42:25 PM

Did the researchers mention whether they'd be making HAIL available to the "public"? Or will this remain an academic/research endeavor?

Re: Interested in hearing...
  • 4/9/2013 4:37:16 PM

In my experience, which involves talking with data professionals about Hadoop, so many are still trying to understand what that is and what it might mean for them. Not sure how they'd get their minds around this one, though it's an interesting premise.
