UnitedHealthcare Wants High-Performance Analytics


Ask analytics directors whether they need a high-performance analytics infrastructure, and many will hem and haw. But not Mark Pitts, director, data science, solutions & strategy, at UnitedHealthcare.

"I want one," Pitts told a crowd of colleagues at last month's SAS Analytics 2012 conference.

Mark Pitts
UnitedHealthcare

This was no idle chatter or wishful thinking. Pitts has put high-performance analytics (HPA) through its paces and has concluded it's highly desirable for UnitedHealthcare, the health benefits arm of global healthcare company UnitedHealth Group. "We do an excellent job with member services today, but we want to make it the best experience in the industry. This is about taking analytics to the next level," Pitts told me in an interview.

With HPA, UnitedHealthcare can change how it approaches analytics and the types of data it can analyze. Without HPA, Pitts's team is limited in the sample sizes it can use in its models and the number of iterations it can run, which results in "models that aren't as good as we know how to make them." As a result, many datasets and business problems have gone untouched.

Because HPA enables massively parallel processing (MPP), it can run vastly more cycles simultaneously than is possible on traditional analytics servers. Take this quick example: When Pitts's team launched its HPA proof-of-concept (PoC) testing, it intentionally loaded up the big-data environment with a computationally and I/O-intensive simulation on four million rows of data -- a process that took four hours and 15 minutes on traditional infrastructure. When it ran in the HPA environment, results came back almost immediately, in 10 seconds.
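For a sense of scale, the two run times quoted for this simulation work out to a speedup of roughly 1,530 times. Here is a minimal sketch of that arithmetic in Python, using only the figures reported in the article (the variable names are illustrative):

```python
# Run times reported for the four-million-row simulation in the PoC.
traditional_seconds = 4 * 3600 + 15 * 60   # four hours and 15 minutes
hpa_seconds = 10                            # reported run time on the HPA environment

# Ratio of the two run times; data load time is not part of either figure.
speedup = traditional_seconds / hpa_seconds

print(f"Traditional run: {traditional_seconds} seconds")
print(f"Speedup: about {speedup:.0f}x")
```

This compares run time only. It says nothing about how long it takes to get data into the environment in the first place.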

"It came back so fast, we at first thought there was a syntax error," Pitts said.

For the PoC, he put up an HPA infrastructure comprising SAS High-Performance Analytics software running on Greenplum's big-data analytics platform, the Data Computing Appliance (DCA). I won't go into all the technical nitty-gritty here, but suffice it to say the SAS HPA procedures take advantage of the DCA's MPP capabilities to execute in rapid-fire fashion. The Greenplum DCA architecture comprises two racks with 384 CPU cores, 1.536 TB of memory, and 992 TB of usable storage, resulting in a data load rate that hits 10 TB/hour. In one typical test, for example, Pitts's team loaded 26.8 million rows of data in five minutes and 17 seconds, yielding a 9.2 TB/hour measured load rate.
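The measured load rate also implies how much data that test moved. The back-of-the-envelope sketch below assumes decimal terabytes (1 TB = 10^12 bytes), which the article does not specify, and the variable names are illustrative. Under that assumption, the load comes to about 0.81 TB, or roughly 30 KB per row.

```python
# Figures reported for the load test.
rows_loaded = 26_800_000
load_seconds = 5 * 60 + 17          # five minutes and 17 seconds
measured_tb_per_hour = 9.2

# Assumption: decimal terabytes. The article does not say which unit was used.
BYTES_PER_TB = 10**12

# Volume implied by the reported rate and duration.
implied_tb = measured_tb_per_hour * load_seconds / 3600
implied_bytes_per_row = implied_tb * BYTES_PER_TB / rows_loaded
rows_per_second = rows_loaded / load_seconds

print(f"Implied volume: {implied_tb:.2f} TB")
print(f"Implied row size: {implied_bytes_per_row / 1000:.1f} KB")
print(f"Rows per second: {rows_per_second:,.0f}")
```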

"Before we were limited in the number of rows of unstructured data we could use to train the predictive models -- to maybe a few hundred thousand. With HPA, we hope to be able to use the entire database, which is hundreds of millions of rows. We're really excited to let this loose on our entire database," he added.

Even more so, Pitts said he's looking forward to what HPA can do with the unstructured data UnitedHealthcare collects in the form of medical records, case notes and email text, call center transcripts, machine-generated logs, and much more. Toward that end, a second phase of the PoC testing focused on the use of text analytics on the HPA appliance.

That testing went well, too, Pitts said. "Using HPA text miner we are able to parse millions of rows of text data in a few minutes. The SVD (singular value decomposition -- an important mathematical step in the text mining process) also calculates in MPP mode alongside the database."
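For readers unfamiliar with the SVD step, here is a toy, single-threaded sketch in Python. It is not how SAS implements it. It estimates only the largest singular value of a small document-by-term count matrix, using power iteration, and the matrix, term labels, and function name are all hypothetical.

```python
import math

def top_singular_triplet(matrix, iterations=200):
    """Estimate the largest singular value and its vectors by power iteration on A^T A."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols          # starting guess for the term-side vector
    for _ in range(iterations):
        # u = A v, then w = A^T u, so each pass applies A^T A to v.
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # The singular value is the length of A v; u is A v rescaled to unit length.
    u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma > 0.0:
        u = [x / sigma for x in u]
    return sigma, u, v

# Hypothetical counts: rows are documents, columns are the terms
# "claim", "denied", "refill", "pharmacy".
doc_term = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 1],
]

sigma, doc_vector, term_vector = top_singular_triplet(doc_term)
print("Largest singular value:", round(sigma, 4))
print("Term weights:", [round(x, 4) for x in term_vector])
```

In this toy matrix the leading term vector puts its weight on the first two terms, which is the kind of grouping text mining uses to reduce many terms to a few dimensions. A production system would compute many singular vectors and, as Pitts describes, spread the work across nodes.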

Ultimately, Pitts said he envisions a day when all of the traditional symmetric multiprocessing analytics servers disappear and HPA, with in-memory processing, becomes standard fare at UnitedHealthcare. But this big-data analytics stuff is expensive, so Pitts has a plan for getting the most bang for the company's buck, too. IT wants to use the HPA platform to provide data analytics as a service, he said.

"We think we can justify the expense by having it be a shared capability among data scientists across the enterprise. Data scientists could rapidly load large datasets, perform analytic processes quickly, and get out of the way so the next team could load their data and repeat. Sized appropriately, the platform can also support several teams working at once."

HPA makes sense for UnitedHealthcare, as its PoC testing has proven in one scenario after another. Do you see a fit for an ultra-fast big-data analytics environment at your company? Share your thoughts on HPA below.

Beth Schultz, Editor in Chief

Beth Schultz has more than two decades of experience as an IT writer and editor.  Most recently, she brought her expertise to bear writing thought-provoking editorial and marketing materials on a variety of technology topics for leading IT publications and industry players.  Previously, she oversaw multimedia content development, writing and editing for special feature packages at Network World. In particular, she focused on advanced IT technology and its impact on business users and in so doing became a thought leader on the revolutionary changes remaking the corporate datacenter and enterprise IT architecture. Beth has a keen ability to identify business and technology trends, developing expertise through in-depth analysis and early adopter case studies. Over the years, she has earned more than a dozen national and regional editorial excellence awards for special issues from American Business Media, American Society of Business Press Editors, Folio.net, and others.



What is the value of High-Performance Analytics
  • 12/19/2012 11:02:25 AM

Many organizations struggle with identifying the business value of accelerating the performance of analytics. Can Mark describe what the business value will be to UnitedHealthcare of accelerating the time of the process from 4 hours 15 minutes to 10 seconds, and how it will impact on the business?

Re: Update?
  • 11/29/2012 2:46:40 PM

Right -- because a run time of ten seconds is great, but you have to consider time to load from Greenplum to HPA when weighing in-memory analytics versus in-database analytics (which don't require the extra load).

Usually people love to talk about stuff that works well, so the load time must be an issue.

 

 

 

Re: Several Questions
  • 11/29/2012 1:56:15 PM

Whoops! Missed this earlier. This OptumInsight project looks fascinating, at a quick flip through the slides. I have to set aside some time to give it my full attention!

Re: Update?
  • 11/29/2012 1:54:54 PM

@thomaswdinsmore -- Mark declined going into further detail, but I would assume that, yes, data load time is a consideration!

Update?
  • 11/29/2012 9:30:27 AM

Any response to the last two questions?   It's curious that nobody wants to talk about the time to load data from Greenplum to HPA -- that strikes me as a material consideration when evaluating a co-located approach.

Re: Several Questions
  • 11/26/2012 5:31:57 PM

It's production work, not a test.  Here's a link.  OptumInsight is a unit of UHG    

http://www.ehcca.com/presentations/predmodel5/wickstrom_2.pdf

 

Re: Several Questions
  • 11/26/2012 4:46:12 PM

@thomaswdinsmore, while I'm checking in with Mark on these answers, maybe you could briefly share the results of the other UHG high-performance test?

Re: Several Questions
  • 11/26/2012 10:22:17 AM

Beth,

Thanks for the response.  Two follow-up questions:

(1) How much time did it take to load the data from Greenplum into HPA's memory?

(2) Did you test any larger problems?  There is a group inside UHG that successfully runs predictive analytics on billion-row datasets (using alternative technologies).

 

Re: Several Questions
  • 11/26/2012 9:07:51 AM

@thomaswdinsmore, I've checked with Mark Pitts, and here are his responses: 

(1) UHG has not yet purchased the product.

(2) The load rate referenced in the article -- is that the time needed to load raw data into EMC Greenplum, or the time needed to load data from Greenplum into HPA's memory?
    - The load rate in the article is the time to load raw data into the EMC Greenplum DCA.

(3) The analysis on four million rows that takes four hours in the current state environment -- what analysis did Mr. Pitts's team test, and what environment does this currently run on?
    - This was one of many tests we ran. This one was a simulation written in DS2 that ran an algorithm that was computationally intensive and I/O intensive; we wrote it intentionally to put the DCA through its paces. The four-hour run of the same algorithm was on a dedicated, 16-core SMP Unix server, with "dedicated" meaning that nothing else was running on the server during the test.

Several Questions
  • 11/25/2012 12:54:46 PM

Several questions about this story.

(1) Did Mr. Pitts' firm actually license the product for production?  The article simply refers to a POC.

(2) The load rate referenced in the article -- is that the time needed to load raw data into EMC Greenplum, or the time needed to load data from Greenplum into HPA's memory?

(3) The analysis on four million rows that takes four hours in the current state environment -- what analysis did Mr. Pitts's team test, and what environment does this currently run on?

 

 

 
