Recent news stories about big data have pumped up the idea that gargantuan datasets guarantee value. In fact, they don't.
Prospective software buyers are often concerned about dealing with large quantities of data. I've heard a lot of questions along the lines of "We have a 12-oodlebyte database; can the software handle that?" No matter how much data you have, there is software to handle it, but if this is the kind of question you're asking, you're doing it wrong.
Very large datasets always demand more resources than small ones. If you collect lots and lots of data, you will certainly need more hardware, pricier software licenses, and additional labor to manage it, let alone analyze it. Is the extra expense worth it? Often, it is not. The problem is that it's easy to get so wound up over the size of the data that you lose perspective.
One area where I have seen this come up time and time again is social media analytics. Sentiment analysis is relatively new and very popular; everyone would like to know what people are saying about brands online. So many service providers are collecting tweets and other social media mentions, assessing them for sentiment (positive, neutral, or negative), and providing a summary for the client. Many use automated software to analyze every single mention they can get their virtual hands on -- sometimes millions of mentions to produce a single summary.
What's wrong with that? Think about the goal first, and then take a step back. The end goal is, say, a pie chart showing how many mentions are positive, how many are neutral, and how many are negative. Sentiment analysis, at its best, is not very precise, so the numbers cannot reasonably be expected to offer anything beyond a rough approximation. Another weakness of this process: the automated tools that select the relevant mentions are less than perfect. You could get your pie chart from a sample of a few hundred cases, and the results would not be significantly different from what you'd get by analyzing millions of mentions. If you insist on using much more data than necessary to meet the goal, you're making the job harder than it needs to be, and you're pouring money down the drain.
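The sampling claim above can be checked with basic statistics: the margin of error on an estimated proportion shrinks with the square root of the sample size, so a few hundred hand-checked mentions already pin each pie-chart slice down to within a few percentage points. Here is a minimal sketch (the function name is illustrative, not from any particular library):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of size n (worst case at p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

# A sample of a few hundred mentions is already precise enough
# for a rough sentiment pie chart.
print(round(margin_of_error(400) * 100, 1))   # about +/- 4.9 points
print(round(margin_of_error(1000) * 100, 1))  # about +/- 3.1 points
```

Given that sentiment scoring itself is only a rough approximation, a sampling error of a few points is not the bottleneck, which is why millions of mentions buy you very little over a well-drawn sample.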
Sometimes you really need big data, and sometimes you really don't. When you need to understand the behavior of many people as individuals -- perhaps as a direct marketer preparing personalized product recommendations, or a campaign manager using a similar approach in politics -- you need details about each person in a group of millions, and that adds up. If you are interested in summarizing the behavior of groups of people or things, a small dataset or sample is enough. Small data has advantages. Since the quantity isn't overwhelming, you can do a much better job of finding and correcting any data quality problems early. And it is easier and cheaper to analyze small datasets.
Using small data when it makes sense will help you get the job done and still have some money left for the next project. Do you agree?
@ SaneIT, I'm sure it is mostly cat videos. I have a friend whose cat has his own Facebook page. And that is not uncommon.
So I guess the question is, what is the source of most of this data? Is it financial institutions or Yelp? What is really relevant? Are all these cat videos driving up storage costs? (And yes, I watch cat videos.)
That's a sobering thought, especially when you realize someone has to manage that data. As the data sets grow, it's only going to become more important that data quality rises; otherwise you'll never be able to make sense of the data you do have.
I agree that knowing what you're looking for is more important than how much data you have. I picture a big database with good people pulling out good data as a smooth-running warehouse: they know where each part is, they know how to get to it in the most efficient way, and they know how the pieces fit together, so they can tell right away if something looks off. A big database run without that kind of thought is like an episode of the show Hoarders: there are big piles of stuff everywhere, and even though the owners aren't quite sure what they have or where it is, they can't get rid of anything because they might want it later.
If the goal is to summarize the sentiment expressed across a large group of messages, then I agree that using a sample and manually assessing sentiment in each is often a good way to go. There are limitations to that approach, though.
If you have many such groups of messages to evaluate, then time is a limiting factor. Sentiment categorization often uses a complex set of rules, and many organizations have had bad experiences using junior staff or outside vendors to perform categorization, due to lack of experience and high turnover of these analysts. To be fair, lack of training is also an issue. Not every organization is willing to invest in training analysts, either. So, for speed and consistency, it may be more effective to use automated techniques.
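The speed-and-consistency argument for automation can be seen even in a toy rule-based scorer: it applies the same rules to every mention, instantly, which no rotating cast of junior analysts can match. This is only a sketch; the word lists are illustrative stand-ins, not a real sentiment lexicon:

```python
# Minimal lexicon-based sentiment scorer -- a toy version of the kind
# of automated categorization discussed above. The word sets are
# hypothetical examples, not a production lexicon.
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "awful", "terrible", "broken"}

def classify(mention: str) -> str:
    """Label a mention positive, negative, or neutral by counting
    lexicon hits; ties and misses fall through to neutral."""
    words = mention.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("I love this brand"))       # positive
print(classify("their support is awful"))  # negative
```

A rule set this crude also illustrates the earlier point about precision: automated scoring is fast and consistent, but easily fooled by sarcasm or context, so its output is a rough approximation either way.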
In sentiment analysis, I think you've chosen an especially good example of the weakness of large data sets. Sometimes, maybe even often, reading the full Tweet leads a human to a different sentiment conclusion than the software finds by just parsing words.
Maybe it would be better to randomly select a couple hundred Tweets and just read them all? Or select a thousand and have a (paid) intern read them all?
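Drawing that kind of sample is straightforward with a seeded simple random sample, so the review set is reproducible if someone questions the numbers later. A minimal sketch (function name and sizes are illustrative):

```python
import random

def sample_mentions(mentions, k=300, seed=42):
    """Draw a reproducible simple random sample of up to k mentions
    for manual sentiment review."""
    rng = random.Random(seed)
    return rng.sample(mentions, min(k, len(mentions)))

# e.g. pull 300 tweets from a large pool for an intern to read
pool = [f"tweet_{i}" for i in range(10_000)]
reviewed = sample_mentions(pool)
print(len(reviewed))  # 300
```

Fixing the seed means the same pool always yields the same review set, which makes the manual-reading approach auditable as well as cheap.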