From the Apache Web servers to the Linux systems and Android mobile devices they use, businesspeople know open-source, whether they know it or not. But does that mean they're willing to entrust their business intelligence strategies to the vagaries of open-source development? After all, these are the undertakings aimed at sharpening an organization's decision-making prowess.
That's the question we pose in today's Point/Counterpoint debate, at right, in which two experts stand off against one another on the reasonableness of the open-source analytics movement. And, yes, I have used the word "movement" deliberately.
The idea of open-source analytics isn't new, especially in the Web world -- Open Web Analytics and Piwik come to mind, for example. But a growing number of companies have launched in recent years to take on the BI and analytics mainstays with commercialized software rooted in open-source. The companies include:
Infobright and Ingres, which provide open-source analytics databases.
Jaspersoft and Pentaho, which provide open-source BI platforms.
Rapid-I and Revolution Analytics, which provide open-source data mining and predictive analytics.
And that's just on the commercial front. Loads of open-source analytics software products are available for the taking. Among the most popular is R, which provides a variety of graphical and statistical techniques, including clustering and linear and nonlinear modeling, along with time-series analysis. PSPP, used for statistical analysis of sampled data, is another example of an open-source analytics project.
In his Point piece, Stephen Samild, co-founder of Analyst First, makes the case that spending on commercial analytics software takes away from investing in analytics talent. That's the main thrust of Analyst First, which encourages organizations to place more emphasis on the people who perform, manage, request, and envision analytics than on the tools enabling the work.
Ajay Ohri, an analytics watcher for Decisionstats.com, takes a more cautionary stance on the viability of open-source analytics. It's not that the technology is lacking, he says, because oftentimes it's not. But he asks organizations to weigh technology capability against such must-haves as customer service, and he suggests that commercial vendors have the upper hand on the oh-so-important extras.
From where he sits -- at the CFO desk at Volunteers of America Chesapeake -- Shyam Desigan says open-source analytics is a great option. As Desigan says in his blog, VOA Chesapeake Heads Toward Predictive Analytics, the Open-Source Way, the use of R and Pentaho's open-source BI platform is enabling the nonprofit to pursue a predictive analytics strategy. Open-source analytics software, particularly when combined with low-cost, easy-access BI-related cloud options such as KnowledgeTree's data management software as a service, can help bring any smaller organization into the analytics fold, he says.
But what of larger enterprises? Can a larger, more traditional company benefit from the use of open-source analytics? Read the debate and Desigan's case study, and weigh in with your opinions on the boards below.
Thanks for sharing your insights on R/Rattle. Companies like Revolution Analytics are taking the same approach to providing development support around R that Red Hat took with Linux a decade earlier. As we all recognize, Linux has a huge presence in high-performance computing today. Given the trajectory of open-source tools like Hadoop, and the approach Lexis Nexis has recently taken in open-sourcing its big-data analysis tools, it is not difficult to imagine a time when open-source languages like R, running on Eclipse, lead the analytics space. The more interesting area is the confluence of big data/high-performance computing and open-source analytics, and how each could leverage the other to build a truly robust ecosystem, a la the LAMP stack when it took enterprise architects by storm.
Robert, thanks for sharing. I think you make an interesting point about old school vs. new school requirements and the birth of a new generation of R-savvy statisticians. Of course we see this sort of new knowledge base all the time regarding social media, and so, I suppose, this similar sort of young worker/new technology in the analytics discipline should come as no surprise. Worth exploring ...
Having hands-on experience with SAS for more than 25 years (several SUGI presentations/publications), including using the unmatched capabilities of Enterprise Miner, I can attest to what it means to have access to a really solid analytics tool to count on.
When open-source analytical tools became available, I took a look at them and was not impressed with the capabilities, correctness, or stability of these applications. However, within the past couple of years this has changed dramatically. Tools like R have developed an extensive following, and many of the packages for R are fairly cutting edge when it comes to algorithms, etc. Support is, of course, only available in an ad hoc fashion.
Most recently, tools such as Revolution R have been made available that leverage the R platform and add capability and support (for a fee).
I have successfully used R/Rattle, running in Eclipse (an open-source IDE that is now used by more developers than any commercial IDE). This kind of setup is becoming the de facto environment for many “Data Scientists” who need access to a standard toolset but also need to interface to a range of data sources, write simple integration code in languages such as Python, and create deployable (at no additional cost) tools.
The most obvious benefit to using the open source tools is the low initial cost. I have consulted with a number of start-ups where high-end analytics is part of their product or service, or required in the development phase (for example automated cyber-crime detection using data mining/neural net techniques). Ideally such firms would be able to use SAS Enterprise Miner from the very start, but they simply cannot afford the cost. While they may understand the shortcomings of using an open source platform, and that they may “pay for it later” when the tools they are using hit their maximum capacity, simple survival dictates minimal investment now, and if they get to the point where they have a viable product/service, etc, they will deal with it then.
Students are becoming increasingly exposed to tools such as R in the classroom and then continue to use it once employed. Back in the 80s when I got my Ph.D., SAS was the only real tool available to students. It was provided to the University at a much reduced cost and this created a generation of SAS literate statisticians. We are now seeing a new generation of R (and other open source tool) savvy statisticians.
Taking a look at job descriptions for open positions, one sees an interesting dichotomy. Start-ups and smaller high-tech firms require experience in R/Rattle, etc, as well as SQL and often a language or two such as Ruby/Python. “Old School” firms are more likely to require SAS experience.
The currently available open-source tools do work, and one can build robust tools with them. I recently created a tool for a client that calculates real-time expected success rates for an online cybercrime detection system. It accesses hundreds of thousands of records (stored on a Google cloud server). The heavy-lifting analysis work is done in R, and the integration with the web portal and data is in Python and SQL, with local storage in MySQL. The most interesting part was that I needed to do some Markov chain modeling, in which I am no expert, and was able to find a plug-in (PEPA) for Eclipse. I must admit that I spent a good deal of time just getting the environment up and running. Version compatibility in open-source tools is a critical problem, especially when mixing environments. In the end I had a stable tool that can be used (by a knowledgeable person) to generate estimates. This could all have been done, possibly more easily, in SAS, but it would have required a number of add-on interfaces for the DB and the web portal, and in the end the tool could not be deployed without additional licensing or additional seats of SAS.
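For readers unfamiliar with the idea, here is a minimal sketch of how a Markov chain can yield an expected success rate. It is not the tool described above and does not use PEPA: the state names, the transition probabilities, and the helper functions `step` and `absorption_distribution` are all hypothetical, invented purely for illustration. It uses only the Python standard library.

```python
# Hypothetical states of a detection pipeline (illustrative only).
STATES = ["screening", "review", "detected", "missed"]

# Row-stochastic transition matrix: P[i][j] is the probability of moving
# from state i to state j. All numbers here are made up for the example.
P = [
    [0.0, 0.6, 0.3, 0.1],  # screening
    [0.0, 0.0, 0.8, 0.2],  # review
    [0.0, 0.0, 1.0, 0.0],  # detected (absorbing)
    [0.0, 0.0, 0.0, 1.0],  # missed (absorbing)
]


def step(dist, matrix):
    """Advance a probability distribution over states by one transition."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def absorption_distribution(start, matrix, tol=1e-12, max_steps=10_000):
    """Repeat transitions until the distribution stops changing."""
    dist = start
    for _ in range(max_steps):
        nxt = step(dist, matrix)
        if max(abs(a - b) for a, b in zip(dist, nxt)) < tol:
            return nxt
        dist = nxt
    return dist


# Every case starts in the "screening" state.
start = [1.0, 0.0, 0.0, 0.0]
final = absorption_distribution(start, P)

# Long-run probability that a case ends in "detected".
success_rate = final[STATES.index("detected")]
print(f"Expected success rate: {success_rate:.2f}")
```

In practice the transition probabilities would be estimated from historical records rather than typed in by hand, and a chain with many states would more likely be solved with a linear algebra library than by repeated multiplication.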
A great debate with distinctive points on both sides. The argument for commercial analytics with strong customer support and specialized features is the one most people make about commercial solutions. The argument for open source as more customized, however, is not one you see every day. I think what's missing here is discussion of the benefits of a customized commercial solution or a customized cloud solution (if such a thing exists), and of what happens when analytics is not a core function and you do not have the expertise to customize solutions yourself.