We noted last week that the market outlook for high-performance computing hardware dedicated to data analytics is quite rosy. As it turns out, traditional HPC clusters -- with their sky-high price tags -- aren't the only way companies can crunch massive amounts of data.
Sqream Technologies, for example, is working on a parallel-processing solution that spreads massive data analytics operations over thousands of graphics processing unit (GPU) cores, rather than four or eight traditional processors.
I caught up with Sqream CEO Ami Gal on Friday. He said he first attempted to leverage GPUs for general database analytics back in 1997, but the technology was not mature enough. Four years ago, he met Sqream co-founder Kostya Varakin, whose ideas about orchestrating algorithms to run across an array of GPUs seemed viable.
"We're creating a product that gives huge data-crunching capability to customers through GPUs," Gal said.
Throughout 2013, companies piloted and beta-tested Sqream's solution in its home market of Israel. The first foreign deployments began this year. General availability is expected next month.
Using commodity server hardware and Nvidia GPU cards, Gal said, Sqream brings "aggressive compression and then fast, efficient splitting of the work between cores." Its software converts structured data into a vector graph and processes queries in parallel across all available cores.
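Sqream has not published the details of its compression or scheduling, so the following is only a minimal sketch of the general idea Gal describes: compress a column, split the compressed data into chunks, and let each worker answer its part of the query. It assumes run-length encoding as the compression scheme and a CPU thread pool as a stand-in for GPU cores; every function name here is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def rle_encode(values):
    """Run-length encode a column into [(value, run_length), ...]."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def split_runs(runs, n_parts):
    """Split the encoded runs into at most n_parts contiguous chunks."""
    size = max(1, -(-len(runs) // n_parts))  # ceiling division
    return [runs[i:i + size] for i in range(0, len(runs), size)]


def count_matches(chunk, predicate):
    """Count matching rows in one chunk without decompressing it."""
    return sum(length for value, length in chunk if predicate(value))


def parallel_count(column, predicate, n_workers=4):
    """Compress the column, fan the chunks out to workers, sum the partial counts."""
    chunks = split_runs(rle_encode(column), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(lambda chunk: count_matches(chunk, predicate), chunks)
        return sum(partials)


if __name__ == "__main__":
    # Hypothetical column of branch codes from a banking table.
    branch_codes = [101] * 5 + [102] * 3 + [101] * 2 + [205] * 4
    print(parallel_count(branch_codes, lambda code: code == 101))
```

The point of the sketch is that each worker operates directly on compressed runs, so less data has to move, and the partial results combine with a simple sum. A real GPU implementation would differ substantially in how it lays out data and schedules work.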
"It enables us to run on huge amounts of data extremely quickly," he said. In one beta test, Sqream completed a query on 1.1 billion banking records in 22 seconds. The bank's database management solution took nearly four minutes.
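To put those figures in perspective, here is a back-of-the-envelope calculation. It uses only the numbers quoted above, with one assumption: "nearly four minutes" is treated as an upper bound of 240 seconds, so the resulting speedup is a ceiling rather than a measured value.

```python
# Figures reported for the beta test described in the article.
RECORDS = 1_100_000_000      # 1.1 billion banking records
SQREAM_SECONDS = 22          # Sqream's reported query time

# Assumption: "nearly four minutes" is rounded up to a full 240 seconds.
BANK_DBMS_SECONDS_MAX = 4 * 60


def throughput(records, seconds):
    """Records processed per second."""
    return records / seconds


def speedup(baseline_seconds, new_seconds):
    """How many times faster the new run is than the baseline."""
    return baseline_seconds / new_seconds


if __name__ == "__main__":
    print(f"{throughput(RECORDS, SQREAM_SECONDS):,.0f} records/s")
    print(f"up to {speedup(BANK_DBMS_SECONDS_MAX, SQREAM_SECONDS):.1f}x faster")
```

Under that assumption, the reported run works out to 50 million records per second and a speedup of just under 11x over the bank's existing system.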
According to Sqream product information, its solution costs around $250,000, compared to equivalent HPC arrays that cost millions. The Sqream system also offers faster performance and 20-50% lower maintenance costs, Gal said.
Right now, Sqream supports structured and semi-structured database formats, but the product roadmap calls for Hadoop support as well. "We think that a lot of big data problems are suited for Hadoop, but it brings a lot of challenges," Gal said. Phasing in an unstructured data analytics solution requires significant resources -- both monetary and human. "A lot of big data projects are not started because of the hassle."
An inexpensive, small-footprint solution like Sqream's can lower the barrier to entry and get companies started on the path of big data analytics. "They can do the same project with a 1U or 2U server, starting immediately, with no hassle."
Powering the Internet of Things
Some of the immediate applications for a small box that can process massive datasets include genomic research and base stations along telecom routes, Gal said.
The databases at the National Human Genome Research Institute start at 800 TB and scale rapidly to 1 PB or more. "There's nothing on the market that can solve their problems," Gal said, so Sqream is working with several research labs to tackle petabyte-scale analytics.
"You can also put Linux and two Nvidia K1 [cards] and some flash storage in a shoebox, and you have a telco base station," he said. "It's a supercomputer in a shoebox." The low power draw and 15 TB capacity of such a basic Sqream solution make it perfect for airplane and connected car systems, too. "There are plenty of new markets for the Internet of Things."
These technologies evolve, he said, when companies with large-scale database needs seek out viable solutions. "I don't know anything about healthcare and genomics, but we have a partner that's an expert in genomic research, and they need a scalable database, so they found us." Cyber security firms, financial institutions, and telcos are also working to apply Sqream's model to their challenges.
What do you think, members? Is this sort of product a smart way to launch or supplement a big data effort? Share your thoughts on GPU acceleration below.
-- Michael Steinhart, Executive Editor, AllAnalytics.com