Analytics applications often have extremely demanding infrastructure requirements. CPU, RAM, disk, network, and other components can all become heavily stressed. However, rather than all components being stressed equally, analytics applications are typically constrained by a primary chokepoint. Add resources to alleviate the primary chokepoint, and another bottleneck -- caused by the next most constraining resource -- will quickly appear.
Perhaps that is a bit of an oversimplification. Rather than only adjusting physical infrastructure, you can also tune the applications themselves to meet the demands of different types of analysis and different types of data. Still, a well-architected infrastructure remains critical to ensure analytics applications run efficiently and economically.
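The chokepoint idea can be illustrated with a minimal Python sketch. It assumes a deliberately simplified model in which throughput is just the smallest of the per-resource capacities. Real workloads interact in messier ways, and the resource names and GB-per-hour figures here are hypothetical.

```python
def bottleneck(capacities: dict[str, float]) -> tuple[str, float]:
    """Return the most constraining resource and the throughput it allows.

    Toy model: throughput is capped by whichever resource saturates first.
    """
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


# Hypothetical capacities, in GB of data processed per hour.
capacities = {"cpu": 120.0, "ram": 200.0, "disk": 80.0, "network": 150.0}

name, rate = bottleneck(capacities)
print(f"bottleneck: {name} at {rate} GB/h")   # bottleneck: disk at 80.0 GB/h

# Double the disk capacity: throughput improves, but the chokepoint moves.
capacities["disk"] *= 2
name, rate = bottleneck(capacities)
print(f"bottleneck: {name} at {rate} GB/h")   # bottleneck: cpu at 120.0 GB/h
```

Doubling the disk capacity raises throughput from 80 to 120 GB/h, but the CPU immediately becomes the new limit. That is the "next most constraining resource" pattern in miniature.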
From an economic perspective, a number of variables need to be considered. Some relate to cost and others to value. What is the capital investment required to build the infrastructure? What is the cost of processing a given dataset? How quickly can a given dataset be processed? Can processing be completed fast enough that work on the next dataset can begin as soon as it is available? What is the average utilization of the infrastructure, both while processing data and over longer periods that include idle time?
As these questions suggest, coming up with the "best" architecture is not an easy exercise. In fact, when considering how infrastructure requirements change over time, you could argue that no single best architecture exists. After all, as long as infrastructure demands vary over time, any fixed infrastructure will at times be underutilized, unable to keep up with demand, or both.
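The economic questions above can be turned into simple metrics. The following is a hypothetical Python sketch. The `Run` fields, dollar figures, and once-a-day arrival schedule are illustrative assumptions, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One processing run (all figures are hypothetical)."""
    dataset_gb: float      # size of the dataset
    hours: float           # wall-clock time to process it
    cost_per_hour: float   # infrastructure cost while running


def cost_per_dataset(run: Run) -> float:
    return run.hours * run.cost_per_hour


def cost_per_gb(run: Run) -> float:
    return cost_per_dataset(run) / run.dataset_gb


def keeps_up(run: Run, arrival_interval_hours: float) -> bool:
    """True if a run finishes before the next dataset arrives."""
    return run.hours <= arrival_interval_hours


def average_utilization(busy_hours: float, period_hours: float) -> float:
    """Share of a period (idle time included) that the infrastructure is busy."""
    return busy_hours / period_hours


run = Run(dataset_gb=500, hours=6, cost_per_hour=40.0)
print(cost_per_dataset(run))                       # 240.0
print(keeps_up(run, arrival_interval_hours=24))    # True
# One run per day over a 30-day (720-hour) month:
print(average_utilization(busy_hours=6 * 30, period_hours=720))  # 0.25
```

In this example the infrastructure keeps up easily with daily data yet is busy only 25% of the month. With fixed infrastructure, the other 75% is paid-for idle time.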
This is where cloud computing has outstanding potential. Cloud computing is not about fixed infrastructure. It is about dynamic allocation of infrastructure resources to meet the needs of an application at the time it actually needs those resources. When a cloud-ready analytics application is not running, you pay no infrastructure costs. This makes the opportunity costs of underutilized infrastructure disappear.
When it comes to controlling costs, cloud computing has an additional advantage to consider. You can access massive cloud infrastructures with no capital investment. This means cloud users don't have to try to architect the "best"-sized infrastructure in advance, committing large sums of capital in the process. They simply deploy what they need, when it is needed, and pay for what they consume.
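The following Python sketch makes the cost argument concrete. It compares peak-sized fixed capacity with pay-per-use capacity for one hypothetical day of demand. For simplicity, it assumes the same cost per server-hour in both models. In practice the rates differ, so treat the numbers as illustrative only.

```python
# Hypothetical hourly demand (servers) over one day: idle for 20 hours,
# then a 4-hour burst for a nightly analytics run.
demand = [0] * 20 + [50, 50, 50, 50]

RATE = 0.50  # hypothetical cost per server-hour, same in both models


def fixed_cost(demand: list[int], rate: float) -> float:
    """Fixed capacity must be sized for the peak and is paid for every hour."""
    return max(demand) * len(demand) * rate


def pay_per_use_cost(demand: list[int], rate: float) -> float:
    """Elastic capacity is paid for only in the hours it is actually used."""
    return sum(demand) * rate


def utilization(demand: list[int]) -> float:
    """Average utilization of peak-sized fixed capacity."""
    return sum(demand) / (max(demand) * len(demand))


print(fixed_cost(demand, RATE))        # 600.0
print(pay_per_use_cost(demand, RATE))  # 100.0
print(utilization(demand))             # ~0.167
```

More generally, suppose fixed capacity costs `r_fixed` per server-hour and cloud capacity costs `r_cloud`. Then pay-per-use is cheaper whenever average utilization is below `r_fixed / r_cloud`. This is why bursty, low-utilization analytics workloads tend to favor the cloud.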
Cloud computing also addresses performance and scale issues. Cloud-ready analytics applications are designed to scale out. Adding more cloud servers or other cloud infrastructure means quicker processing for a given dataset. It also means the ability to process larger quantities of data at the same time.
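The following Python sketch gives a back-of-the-envelope way to reason about scale-out. With no serial work, runtime falls linearly with the number of servers. The optional `serial_fraction` parameter applies Amdahl's law to account for work that can't be split across servers. Amdahl's law is a standard model that the article itself does not mention. The 100-hour job is hypothetical.

```python
def runtime_hours(single_server_hours: float, servers: int,
                  serial_fraction: float = 0.0) -> float:
    """Estimated runtime when a job is spread across `servers` machines.

    serial_fraction=0.0 gives ideal linear scale-out; a nonzero value is the
    share of the work that cannot be parallelized (Amdahl's law).
    """
    s = serial_fraction
    return single_server_hours * (s + (1 - s) / servers)


for n in (1, 10, 100):
    ideal = runtime_hours(100, n)
    realistic = runtime_hours(100, n, serial_fraction=0.05)
    print(f"{n:>3} servers: ideal {ideal:.2f} h, 5% serial {realistic:.2f} h")
```

With 5% serial work, 100 servers bring the job down to about 6 hours rather than 1. Scale-out still helps enormously, but only if the application is designed so that most of the work can be split.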
Cloud computing isn't the perfect solution for every analytics problem. However, the flexibility, elasticity, scale, and cost attributes it delivers are certainly worth understanding and considering.
Paul, I agree that the dynamic and highly scalable nature of cloud infrastructure makes it a potentially good choice for handling increasingly intense analytics processing requirements. But I'm curious: from where you sit, is it more feasible, all things considered, to build a private cloud for analytics, go with an external private cloud provider, or use open public cloud infrastructure? I know the answer is likely to vary by company, but are there any general rules of thumb?
Yes, I think there are at least a few. Comparing the on-premise private cloud with the open public cloud:
- It would rarely be a good idea to build a private cloud solely for analytics (or solely for any single application or purpose). To ensure a high utilization rate on a private cloud, you typically need many workloads or applications with different priorities and different consumption patterns that change over time.
- However, if you have a private cloud, then it often makes sense to run everything you can on it until you are out of capacity (or running at your maximum target utilization). Of course, this means prioritizing different applications, allowing some to run at off-peak times, and so on. Ultimately, the private cloud should run at high utilization rates; otherwise it becomes more like traditional fixed infrastructure that suffers from low utilization.
- Assuming you have a private cloud, a big question is how well the application "fits" it. Does the private cloud have capacity? Is that capacity available when the analytics app needs to run? If so, then the private cloud is probably a good choice.
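The prioritization described above could be sketched as a simple greedy scheduler: run what you can on the private cloud up to a target utilization, and push the rest to off-peak windows. This is a hypothetical Python example. The job names, sizes, priorities, and the 85% target are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str       # hypothetical workload name
    servers: int    # capacity the job needs while running
    priority: int   # lower number = more important


def schedule(jobs: list[Job], capacity: int,
             target_utilization: float = 0.85) -> tuple[list[str], list[str], float]:
    """Greedily admit jobs in priority order without exceeding the target.

    Jobs that don't fit now are deferred, e.g. to an off-peak window.
    Returns (run_now, deferred, resulting utilization).
    """
    limit = capacity * target_utilization
    used = 0
    run_now: list[str] = []
    deferred: list[str] = []
    for job in sorted(jobs, key=lambda j: j.priority):
        if used + job.servers <= limit:
            run_now.append(job.name)
            used += job.servers
        else:
            deferred.append(job.name)
    return run_now, deferred, used / capacity


jobs = [
    Job("nightly-etl", 30, priority=1),
    Job("fraud-model", 40, priority=0),
    Job("ad-hoc-analytics", 25, priority=2),
    Job("report-refresh", 10, priority=3),
]
run_now, deferred, util = schedule(jobs, capacity=100)
print(run_now)   # ['fraud-model', 'nightly-etl', 'report-refresh']
print(deferred)  # ['ad-hoc-analytics']
print(util)      # 0.8
```

Because the scheduler is greedy, a small low-priority job can still fill leftover capacity after a larger, higher-priority job has been deferred. A production scheduler would weigh that trade-off more carefully.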
Whether you have a private cloud or not, if you need very large amounts of compute resources for short periods of time, you may need the capacity of a public cloud.
Other considerations are the location of your data and the cost and time required to move it to and from a public cloud. Data transfer costs can add up.
Also, if there are governance issues that restrict data from being stored in a public cloud, that would be an important consideration.
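The rules of thumb above can be roughed out as a decision helper. This Python sketch is hypothetical. The `Workload` fields, the decision order, the 1 Gbps link, and the $0.09/GB transfer price are assumptions for illustration, not any provider's actual pricing.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    peak_servers: int        # compute needed at peak
    must_stay_onsite: bool   # governance rule: data may not go to a public cloud


def transfer_estimate(data_gb: float, bandwidth_gbps: float,
                      cost_per_gb: float) -> tuple[float, float]:
    """Hours and cost to move data to a public cloud (ignores protocol overhead)."""
    hours = data_gb * 8 / bandwidth_gbps / 3600   # GB -> gigabits -> seconds -> hours
    return hours, data_gb * cost_per_gb


def place(w: Workload, private_free_servers: int) -> str:
    """Rule-of-thumb placement following the points above."""
    if w.must_stay_onsite:
        return "private"   # governance overrides everything else
    if w.peak_servers <= private_free_servers:
        return "private"   # use private capacity while it is available
    return "public"        # burst to public cloud for large, short-lived peaks


hours, cost = transfer_estimate(10_000, bandwidth_gbps=1.0, cost_per_gb=0.09)
print(round(hours, 1), round(cost, 2))   # 22.2 900.0
print(place(Workload(peak_servers=500, must_stay_onsite=False), 100))  # public
```

Moving 10 TB over a dedicated 1 Gbps link takes most of a day even before overhead. The transfer estimate therefore belongs alongside the placement decision rather than after it.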
Oops. Sorry, Paul. I see you've already answered my question about private clouds. Thanks! I think the cloud is a good option for companies without the resources to develop their own analytics infrastructure, but, as you say, these questions should always be decided on a case-by-case basis.
Paul, great rules of thumb. I especially think the cost and time of transferring data into and out of the cloud (plus potential integration and, of course, privacy challenges) are factors that IT execs must pay particular attention to when weighing whether to move analytics processing to the cloud. Best to start working toward an understanding of that now, before data volumes get so overwhelming that the public cloud becomes the only reasonable option!
Hi Beth. I'm curious. Isn't cloud computing usually a shared resource where you pay for what you use, as needed, thus avoiding infrastructure costs? What would be the advantage of building a private cloud for analytics (I'm assuming you're talking about one created for the exclusive use of one company or organization), and what exactly would it look like?
Hi Shawn - I saw your other comment as well, thanks. Still, I'll go ahead and add a couple more thoughts...
One of the great cloud computing debates has been public versus private clouds. Some people strongly say that private, on-premise clouds are not really clouds or not really cloud computing. There are some good points there... private clouds involve capital expenditures, they are not pay-per-use driven, they don't match the massive scale of public clouds, and so on.
However, other people feel just as strongly that private clouds offer a legitimate cloud computing model. I'm in this camp. My thoughts are that private clouds *do* still offer many of the core characteristics of cloud computing: elasticity, multi-tenancy (even across business units, partners, etc.), higher efficiency, greater automation, APIs, self-service interfaces, and so on. Also, there are public and private clouds based on the same software, so they can have the same functionality! To me, the most important thing is to recognize the differences and unique benefits of each. The terminology becomes more of a religious debate.
Enterprise IT organizations in particular seem to like private clouds for:
- improved efficiency over their *traditional* IT
- direct control over the entire infrastructure
- governance issues, where data must be kept locally
Since, IMO, private clouds produce business value, they ought to be acknowledged as legitimate.
Great conversation! As a bit of a follow-up: in a recent Gartner survey that we'll be blogging about soon, cloud and SaaS are used interchangeably, though obviously they aren't the same. From your perspective, what are the practical differences between cloud and SaaS in the field of analytics and BI?
Let's start with "cloud." As I'm sure you've seen, cloud and cloud computing are often over-used. In the extreme over-use case, people use "cloud" to refer to just about anything connected to the Internet. The term "cloud washing" is used to describe situations where something that is not cloud is called cloud (sometimes in marketing situations).
To narrow the definition of cloud a bit, three primary service models are most commonly used to describe cloud computing:
1. Infrastructure as a service (IaaS)
2. Platform as a service (PaaS)
3. Software as a service (SaaS)
The "as a service" pattern is easy to spot, so people often run with that and start coming up with *anything* as a service and call that cloud computing.
While it is true that there is a huge trend to turn all kinds of things into services, simply delivering something as a service does not make it cloud.
To further clarify, cloud services should generally have a distinct set of characteristics that may include:
- resource pooling (gathering resources so they can be aggregated and shared)
- elasticity (scale both up and down)
- multi-tenancy (enabling multiple distinct users - even different paying customers - to transparently share the same underlying resources)
- etc.
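Two of these characteristics, elasticity and resource pooling across tenants, can be illustrated with a small Python sketch. It is a hypothetical model in which the tenant names, load units, per-instance capacity, and instance limits are all assumptions. It does not correspond to any cloud provider's autoscaling API.

```python
import math


def desired_instances(load: float, per_instance: float,
                      min_instances: int = 1, max_instances: int = 50) -> int:
    """Elasticity: size capacity to the current load, scaling up *and* down."""
    needed = math.ceil(load / per_instance)
    return max(min_instances, min(max_instances, needed))


# Resource pooling / multi-tenancy: aggregate several tenants' load onto one
# shared pool instead of each tenant provisioning its own instances.
tenant_load = {"tenant-a": 120.0, "tenant-b": 35.0, "tenant-c": 0.0}

pooled = desired_instances(sum(tenant_load.values()), per_instance=40.0)
separate = sum(desired_instances(load, per_instance=40.0)
               for load in tenant_load.values())
print(pooled, separate)   # 4 5
```

Pooling saves an instance here because partly used and idle capacity is shared across tenants rather than rounded up separately for each one.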
Of course, it is still a gray area, because people argue about which characteristics "must" be part of a solution for it to be called cloud, and so on. Still, that gives most people a pretty good sense of what cloud computing is (and is not).
That said... analytics can be delivered through SaaS. That is essentially a multi-tenant application that is delivered by a service provider on a pay-per-use basis. Users don't have to own/install/configure hardware and software. They just use the analytics application -- supplying their own data.
For IaaS, people can use some form of an IaaS cloud (public/private, on/off premise...). In the case of a public IaaS, the customer doesn't own or directly manage the shared, elastic, multi-tenant infrastructure. They just use what they need and pay for that. They must add their own analytics application: buy it, install it, configure it, etc. The analytics app must (generally) be architected to scale out as more compute or other resources are needed. IaaS clouds are great at scaling out... just keep adding more inexpensive servers to get faster analytics (as opposed to buying larger, more powerful servers to scale up).
Whew... lots of background simply to call out a few differences between IaaS and SaaS based analytics.
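Architecting an analytics app to scale out on IaaS usually means splitting the data into partitions that workers process independently, then combining their partial results. This Python sketch shows that pattern. It uses a local thread pool as a stand-in for separate servers, so it illustrates the structure rather than a real speedup. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows: list[float], n: int) -> list[list[float]]:
    """Split rows into n roughly equal chunks, one per worker/server."""
    return [rows[i::n] for i in range(n)]


def partial_stats(chunk: list[float]) -> tuple[int, float]:
    """Work done independently on each worker: a count and a sum."""
    return len(chunk), sum(chunk)


def scaled_out_mean(rows: list[float], workers: int) -> float:
    """Compute a mean by combining per-partition partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, partition(rows, workers)))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return total / count


rows = [float(x) for x in range(1, 1001)]
print(scaled_out_mean(rows, workers=4))   # 500.5
```

Each worker returns only a count and a sum, so adding workers (servers) just means more, smaller partitions while the combine step stays the same.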
The issue of control is important. If you're leveraging the cloud to use when you need it, you don't want to be told you're in line because a bigger, higher-paying customer cut in front of you. I suspect that's why many opt for private clouds, at least initially.
Also, we used to deliver analytics as a service in the form of credit scores calculated in real time and delivered to loan origination systems on demand. That's different from getting a score from the bureaus, whose scores are usually calculated in batch and populated to the files. To be clear, these were custom or proprietary scores, not standard FICO scores.
Hi Cordell - Thanks for the comments. Regarding higher-paying customers cutting in front, I wonder if that concept comes from Amazon Web Services (AWS) spot instances. Those cloud servers go to the highest bidder. The idea (at least for many users) is to take advantage of lower prices at off-peak hours when cloud servers are more available. AWS also offers reserved instances that can't be taken away by higher bidders; this is the usual scenario. Most providers operate strictly on a reservation basis like this.
The on-demand credit scores sound interesting. I like the real-time nature of what you did. I could imagine the demand for that fluctuating quite a bit.
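A toy allocation model shows the difference between reserved capacity and capacity that goes to the highest bidder. This Python sketch is a simplification written for illustration and assumes a fixed pool and per-request bids. It is not how AWS actually prices or allocates spot instances, and the names and numbers are hypothetical.

```python
def allocate(capacity: int,
             reserved: dict[str, int],
             spot_bids: dict[str, tuple[int, float]]) -> dict[str, int]:
    """Toy allocation of a fixed server pool.

    Reserved requests are honored first and are never reclaimed; whatever
    capacity remains goes to spot requests in descending order of bid.
    spot_bids maps name -> (servers wanted, bid per server-hour).
    """
    alloc = dict(reserved)
    free = max(capacity - sum(reserved.values()), 0)
    by_bid = sorted(spot_bids.items(), key=lambda item: item[1][1], reverse=True)
    for name, (wanted, _bid) in by_bid:
        granted = min(wanted, free)
        alloc[name] = granted
        free -= granted
    return alloc


print(allocate(
    capacity=100,
    reserved={"steady-bi": 60},
    spot_bids={"batch-a": (30, 0.10), "batch-b": (30, 0.25)},
))
# {'steady-bi': 60, 'batch-b': 30, 'batch-a': 10}
```

The reserved workload keeps its 60 servers regardless of bids. Only the spot request with the lower bid gets squeezed.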
Do you know of any specific example where the incident you're describing (a subscriber having to wait in line for service behind a higher paying customer) actually happened? I've never heard this complaint from cloud users but maybe you've heard of a situation I have not. I suspect the real reason for private clouds has to do with concern over data security...and perhaps the idea that companies should have direct control and ownership over their data management system.
@Shawn. No specific incidents, but when I worked at a large credit card processor, the big clients always got priority. It wasn't exactly cloud, though; it was more like outsourcing. While many services were shared, larger clients had dedicated teams, hardware, systems, etc. But every organization has limited resources. If they get slammed, they'll have to decide who gets priority. For efficiency, a cloud service provider probably isn't going to build a lot of extra capacity until it's demanded.
I'd also add that cloud providers, which offer everything from software to hosting and just about any other tool or solution imaginable (including BI and analytics), have focused their marketing on small and medium-sized companies that can't afford much infrastructure, at least at the outset. It seems to me it would not be in their own best interest to make the little guys wait. In the beginning, at least with other services, that has been their core market.
Shawn, that is an excellent point. The cloud provider perhaps at least needs a dedicated support team for SMBs and a separate support team for enterprise-size clients. Or is that asking too much of these cloud providers?
I guess my point is that cloud providers' whole business model is built around offering reliable computing, including analytics, for a fee. I'm thinking of cloud services in the broadest possible sense here. In businesses I have been involved with in the past, we provided a variety of services completely reliant on cloud support of a sort: providers who maintained our hosting and Web templates, constantly upgraded our publishing software, and even supplied Web analytics, monetization, and marketing tools. There was no such thing as having the services suddenly unavailable one day because a bigger client needed them. (And there were much larger clients than us!) Customers simply would have left en masse. Just because you don't develop and own your own infrastructure or have your own IT department does not make these services any less reliable. Cloud services must be reliable for all their customers, or else the providers of those services simply won't have any.
Though I think a lot of us have used and thought of cloud computing primarily this way, there are private cloud computing applications built around, as Paul says, the need for "many workloads or applications with different priorities and different consumption patterns that change over time." See Paul's further explanation above. Of course, Paul says a debate exists between those who argue private clouds aren't really clouds at all and those who say they are. I suspect the truth is that in some circles, cloud is simply synonymous with outsourced computing, whereas perhaps a better definition has to do with how that service is built and delivered.
To elaborate on Paul's answer, Shawn: enterprises often start with private clouds, run internally in their data centers or at an externally hosted site, to gain the benefits of that type of infrastructure (highly dynamic, scalable, available to users on demand) while maintaining the control over data they need to meet privacy mandates and the like, which they can't do in the public cloud.
Our conversation thread is moving fast and furious today, so I missed your last two comments before my last response to Paul was posted. Thanks for the clarification on private clouds, and thanks for the additional specifics and perspective.