GPUs Deliver Supercharged Analytics


We noted last week that the market outlook for high-performance computing hardware dedicated to data analytics is quite rosy. As it turns out, traditional HPC clusters -- with their sky-high price tags -- aren't the only way companies can crunch massive amounts of data.

Sqream Technologies, for one, is developing a parallel-processing solution that spreads massive data analytics operations across thousands of graphics processing unit (GPU) cores, rather than four or eight traditional processors.

I caught up with Sqream CEO Ami Gal on Friday. He said he first attempted to leverage GPUs for general database analytics back in 1997, but the technology was not mature enough. Four years ago, he met Sqream co-founder Kostya Varakin, whose ideas about orchestrating algorithms to run across an array of GPUs seemed viable.

Ami Gal, CEO of Sqream Technologies.

"We're creating a product that gives huge data-crunching capability to customers through GPUs," Gal said.

Throughout 2013, companies piloted and beta tested Sqream's solution in its home market of Israel. The first foreign deployments began this year. General availability is expected next month.

Using commodity server hardware and Nvidia GPU cards, Gal said, Sqream brings "aggressive compression and then fast, efficient splitting of the work between cores." Its software converts structured data into a vector graph and processes queries in parallel across all available cores.
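That "split the work, then merge" pattern is the heart of any parallel query engine. Here is a minimal, CPU-side sketch of the idea at toy scale — this is purely illustrative and not Sqream's actual implementation; the worker count and chunking scheme are assumptions for the example:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(values, workers=8):
    """Split a dataset into chunks, reduce each chunk in parallel,
    then merge the partial results -- the same fan-out/fan-in shape
    a GPU query engine uses across thousands of cores."""
    chunk = max(1, len(values) // workers)
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum, chunks)  # each worker reduces one chunk
    return sum(partials)                  # merge step

records = list(range(1_000_000))
assert parallel_sum(records) == sum(records)
```

On a GPU the "workers" number in the thousands and the per-chunk reduction runs in hardware, but the decomposition is the same.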

"It enables us to run on huge amounts of data extremely quickly," he said. In one beta test, Sqream completed a query on 1.1 billion banking records in 22 seconds. The bank's database management solution took nearly four minutes.

According to Sqream product information, its solution runs around $250,000, compared to equivalent HPC arrays that cost millions. It also offers faster performance and 20-50% lower maintenance costs, Gal said.

Right now, Sqream supports structured and semi-structured database formats, but the product roadmap calls for Hadoop support as well. "We think that a lot of big data problems are suited for Hadoop, but it brings a lot of challenges," Gal said. Phasing in an unstructured data analytics solution requires significant resources -- both monetary and human. "A lot of big data projects are not started because of the hassle."

An inexpensive, small-footprint solution like Sqream's can lower the barrier to entry and get companies started on the path of big data analytics. "They can do the same project with a 1u or 2u server, starting immediately, with no hassle."

Powering the Internet of Things
Some of the immediate applications for a small box that can process massive datasets include genomic research and base stations along telecom routes, Gal said.

The databases at the National Human Genome Research Institute start at 800 TB and scale rapidly to 1 PB or more. "There's nothing on the market that can solve their problems," Gal said, so Sqream is working with several research labs to tackle petabyte-scale analytics.

"You can also put Linux and two Nvidia K1 [cards] and some flash storage in a shoebox, and you have a telco base station," he said. "It's a supercomputer in a shoebox." The low power draw and 15 TB capacity of such a basic Sqream solution makes it perfect for airplane and connected car systems, too. "There are plenty of new markets for the Internet of Things."

These technologies evolve, he said, when companies with large-scale database needs seek out viable solutions. "I don't know anything about healthcare and genomics, but we have a partner that's an expert in genomic research, and they need a scalable database, so they found us." Cybersecurity firms, financial institutions, and telcos are also working to apply Sqream's model to their challenges.

What do you think, members? Is this sort of product a smart way to launch or supplement a big data effort? Share your thoughts on GPU acceleration below.

— Michael Steinhart, Executive Editor, AllAnalytics.com



Re: CPU v GPU
  • 6/30/2014 11:42:18 PM

Thanks for the insight and welcome, Craigmeister. Gal did mention Teradata and Netezza several times in the interview, but I don't have apples-to-apples comparison numbers, so I didn't want to get too detailed on competing platforms.

Re: CPU v GPU
  • 6/30/2014 8:36:45 AM

My impression was that the Sqream product is sort of a commoditized version of IBM's Netezza, which is a proprietary array of blade servers, each with a programmable processing card and its own storage. Those are also very expensive.

Also, speaking of doing one thing well, video cards usually have hardware floating-point processors (FPUs) vs. the software FPU processing that exists in CPUs. This likely makes things like aggregation very fast.

Re: CPU v GPU
  • 6/30/2014 7:18:39 AM

That is the answer at the lowest level. If you're going to spend a lot of resources moving to a GPU-based farm, you have to weigh what you will actually be getting out. As with any data project, most of the time it is easier to optimize what you have than it is to re-tool.

Re: CPU v GPU
  • 6/28/2014 10:51:13 PM

So the question for Sqream is, how much configuration and customization does its software need in order to work? Is it possible, though, that because it's handling traditional SQL queries, the process isn't as complicated as it might be with a more versatile tool?

Re: CPU v GPU
  • 6/27/2014 7:04:05 AM

Yes, middleware can be used to do the conversion, but it's not as though you just buy a software license, insert an install disk, and the middleware installer wizard figures everything out. That middleware has some heavy lifting to do and requires some quality time spent on configuration and optimization.

Re: CPU v GPU
  • 6/26/2014 2:27:05 PM

Thanks for the insight - but isn't that what 'middleware' offerings like Sqream's are designed to do? Take an existing instruction set and parallelize it without any need for additional coding?

Re: CPU v GPU
  • 6/26/2014 8:10:31 AM

Aha! I knew there was a catch, and rewriting apps to use with a GPU cluster certainly qualifies as one of them. Thanks SaneIT.

Re: CPU v GPU
  • 6/26/2014 7:15:18 AM

I'm a techie first, so this falls right into my wheelhouse. I have not used GPUs for analytics; my experience was rendering images and video for cinematic use in the gaming industry. The concepts are the same: parallel data that needs to be blended into a singular data set. The downside is that if you have a system in place currently, it is not likely to migrate easily from a CPU-backed farm to a GPU-backed farm. Since GPUs are much simpler, your application will have to be written specifically for use with a GPU cluster. That can be a major undertaking.
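The rewrite this commenter describes is largely a change of shape: branch-heavy, element-at-a-time loops have to be re-expressed as one uniform operation applied to every element, which is what maps onto thousands of GPU threads. A toy Python sketch of that restructuring -- illustrative only, since real GPU code would be CUDA or OpenCL, and the `tag_*` functions here are invented for the example:

```python
def tag_serial(amounts, threshold):
    """CPU-style: walk the data one element at a time, branching as we go."""
    flags = []
    for a in amounts:
        if a > threshold:
            flags.append(1)
        else:
            flags.append(0)
    return flags

def tag_data_parallel(amounts, threshold):
    """Same logic as one uniform per-element operation -- no control-flow
    divergence, so every 'thread' does identical work. This is the shape
    a GPU kernel wants."""
    return [int(a > threshold) for a in amounts]

assert tag_serial([5, 50, 500], 40) == tag_data_parallel([5, 50, 500], 40)
```

The results are identical; only the second form decomposes naturally across many simple cores.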

Re: CPU v GPU
  • 6/25/2014 10:53:59 PM

I don't know enough about the technology here myself, but given what we've learned so far from this post and the comments, I would suppose GPU architectures will remain a specialty in the data analytics realm, if for no other reason than how entrenched the CPU is.

Re: CPU v GPU
  • 6/25/2014 10:21:58 PM

My admittedly limited understanding is that, as long as the instructions are fairly simple and can run in parallel, GPUs are a superior architecture. They're designed to render high-resolution images in games, so they need to work fast.
