Tons of medical possibilities. Setting privacy aside for a moment, all of the sensor readings, tests, etc. that are tracked these days can yield a treasure trove of insight into disease patterns, treatment effectiveness, and more. Medical applications could be some of the most beneficial uses of big data over time
In the context of medical data, we are so used to acquiring, storing, retrieving, and analyzing "coded data". Big Data does not fit too well with such a model. Do you see any analytics applications in medical data?
Example from my field (urban mass transit) ... Our transit authority (which I helped create) bought video cams for all its buses. This is a powerful data-gathering tool. However, because of cost, it has so far activated only a fraction of these.
I might point out that a lot of passengers think they're being recorded and their conversations monitored, whereas only a minority of buses actually have active cameras. There must be some kind of moral in there somewhere...
riteshpatel - ETL gets to the heart of the challenge with big data. The MapReduce framework is perhaps at its best in doing the complex ETL required to extract valuable, structured information from big data. It helps scale that process so you can extract the pieces you need to flow into traditional reports and analysis
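To make that concrete, here's a minimal sketch of the map/reduce pattern applied to ETL. The log format, field layout, and names here are hypothetical, just to illustrate extracting structured records from raw big data:

```python
# Minimal map/reduce-style ETL sketch (log format is hypothetical):
# extract a structured (customer_id, pages_viewed) summary from raw log lines.
from collections import defaultdict

def map_phase(raw_lines):
    """Parse each raw log line into a (key, value) pair."""
    for line in raw_lines:
        # Assumed format: timestamp|customer_id|url
        _, customer_id, url = line.strip().split("|")
        yield customer_id, url

def reduce_phase(pairs):
    """Aggregate mapped pairs into structured records."""
    pages = defaultdict(set)
    for customer_id, url in pairs:
        pages[customer_id].add(url)
    # Structured output, ready to flow into a traditional report or table.
    return {cid: len(urls) for cid, urls in pages.items()}

raw = [
    "2012-04-01T10:00|c1|/home",
    "2012-04-01T10:01|c1|/product/42",
    "2012-04-01T10:02|c2|/home",
]
print(reduce_phase(map_phase(raw)))  # {'c1': 2, 'c2': 1}
```

In a real MapReduce framework these two functions would run distributed across many nodes, which is what lets the ETL step scale.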
Lyndon - you hit on a big one... privacy. It is a huge issue with big data. There have been news stories almost every week about some new privacy issue that has arisen. This will be a very important topic as big data evolves. Security of data will only become more important
Speaking of the sampling issue and Big Data ... The technological capability seems to be there to gather vast amounts of useful data, but implementing this technology can get expensive. What seems to drive a lot of the investment is marketing needs (private sector) and security (public and private). These provide a rationale for allocating heavy investment into the resources needed to collect, digest, and analyze the data.
This all then triggers the privacy and related issues that have a lot of the public spooked...
The reason for my retail examples is much less meaningful than it might seem. I have a soft spot for retail and have done more work in that industry over time than in others. So, it's just top of mind. In fact, I would classify retail as somewhat behind the curve with big data in most cases.
Yes, I have seen innovation centers in practice. In many ways, the "data scientists" at many of the technology businesses are effectively running innovation centers, though they don't call it that. They are tasked with finding new and interesting ways to use data to help the business.
I have also seen "old school" businesses adopt it. One grocery store chain set up a team to try all sorts of new customer, assortment, and forecasting analytics.
riteshpatel - I think you've hit on a key point. Most of the visualization tools only scale so far as of today. You can't necessarily visualize all 5TB of your data. We're probably back to our other discussion thread... you may have to do some sampling to use some of the tools today. For many questions, this will be just fine.
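As a rough illustration of that workaround, here's a sketch of downsampling a large file before handing it to a visualization tool. The file name, chunk size, and sampling fraction are all invented for the example:

```python
# Sketch: sample a big file in chunks so it never sits in memory all at once,
# then feed the much smaller sample to the visualization tool.
import pandas as pd

sample_frames = []
for chunk in pd.read_csv("big_sessions.csv", chunksize=1_000_000):
    # Keep roughly 0.1% of each chunk (fraction is illustrative).
    sample_frames.append(chunk.sample(frac=0.001, random_state=42))

sample = pd.concat(sample_frames, ignore_index=True)
sample.to_csv("sessions_sample.csv", index=False)  # input for the viz tool
```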
Can you give an example of such a visualization tool? As a team we have tried OBIEE, Tableau, MicroStrategy, and an in-house built application, but everything fails after a certain point. The reason is exactly what you mentioned - the inflow is so high that the application becomes obsolete before we realize value from the existing solution.
Beth - an innovation center is a mix of people and tools with some expedited processes. The way it usually works is that some system resources are dedicated to the innovative tasks. And, if not a few full-time people, some set percentage of some people's time. The key is to put some formal priority on experimentation and innovation in analytics. Many companies have R&D departments for their products; why not also for analytics?
Bill, to tag-team on Lyndon's question, does big-data analytics require a different sort of analytics professional than we've known to date, in terms of technical skills and, perhaps, business knowledge?
riteshpatel - There are numerous visualization tools out there, and they are getting better every day. Many have started allowing in-memory analytics as well. I think business people don't much care about cloud vs. server vs. PC tools. They just want something that meets their needs. There are tools out there doing some neat things with visualization
>>Another approach is what I call an "innovation center". This is where you have people tasked with exploring data proactively and experimenting to see what analytics it can drive. The idea is to let the experimentation drive the requirements.<<
Any examples of how this is implemented in actual practice? Where is such an innovation center in operation in an actual organization? Personnel dedicated to it, or just an adjunct to other tasks?
I am really wondering if there is any tool in the industry that allows powerful visualization on any kind of raw data. Be it sales data, web traffic, or spending. Different business groups want to view it differently.
The best approach would be a visualization tool that can run securely in the cloud while offering the capabilities of a desktop tool.
RKA - good question on sampling. Sampling can still be relevant in the world of big data. As always, it depends on the scope of your analysis. If you just want to know what percent of customers visit a certain part of a website, a random sample of customer sessions can work just fine.
In fact, I think there are times when not sampling just causes a lot of extra processing for no extra benefit.
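A quick sketch of that idea: estimating a visit rate from a random sample of sessions rather than the full data. The numbers here are invented stand-ins for real session logs:

```python
# Estimate the share of sessions hitting a page from a random sample.
import math
import random

random.seed(7)
# Stand-in for a huge session log: 1 = visited the page, 0 = didn't.
population = [1 if random.random() < 0.23 else 0 for _ in range(1_000_000)]

sample = random.sample(population, 10_000)
p = sum(sample) / len(sample)
# Normal-approximation 95% margin of error for a proportion.
moe = 1.96 * math.sqrt(p * (1 - p) / len(sample))
print(f"Estimated visit rate: {p:.3f} +/- {moe:.3f}")
```

With 10,000 sampled sessions the estimate is already within about a percentage point, which is why processing the full dataset often buys no extra benefit for questions like this.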
With computers getting faster and packing huge storage and memory, a lot of what used to be considered a large dataset can be handled very easily internally. Excel used to be limited to 256 columns and about 64k rows, but not anymore - current versions allow 16,384 columns and over a million rows.
So, probably one has to clearly cross the terabyte (at least) level before claiming to be "Big".
The other day I heard someone refer to zettabytes or some such. That must be a billion terabytes.
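For the record, the decimal (SI) scale works out like this; a tiny worked example:

```python
# Decimal (SI) byte units: each step up is a factor of 1,000.
units = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB"]
for i, u in enumerate(units, start=1):
    print(f"1 {u} = 10^{3*i} bytes")
# 1 ZB = 10^21 bytes, i.e. a billion (10^9) terabytes.
```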
Beth, yes it is possible for projects to be small in scope, yet require a lot of data.
I know of a retailer who simply identified people who browsed products but didn't buy. That required processing through a lot of data, but the actual analytics and mechanics were simple. They got a huge ROI.
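A sketch of how simple that logic can be once the clickstream has been boiled down to events. The event names and structure here are hypothetical, not from the retailer's actual system:

```python
# Identify customers who viewed products but never purchased.
# Event tuples: (customer_id, event_type) -- structure is hypothetical.
events = [
    ("c1", "view_product"), ("c1", "purchase"),
    ("c2", "view_product"), ("c2", "view_product"),
    ("c3", "view_product"),
]

viewers = {c for c, e in events if e == "view_product"}
buyers = {c for c, e in events if e == "purchase"}
browse_no_buy = viewers - buyers  # targets for a follow-up offer
print(sorted(browse_no_buy))  # ['c2', 'c3']
```

The heavy lifting is in crunching billions of raw events down to those tuples; the analytic itself is a set difference.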
Bill, you responded to bkbeverly by saying that big-data might require analysts to think differently. But does big-data necessarily mean big project? I fear sometimes that that's the inference when it doesn't need to be. So I guess another way of asking this is, Can big-data projects be small in scope?
The definition of big data is constantly debated. The most widely accepted definitions hold that big data is something bigger than your current/traditional tools can handle well today. So, you either have to upgrade to more of the same and/or add new tools to the mix. That means what counts as big data in one industry or for one company may not be big for another.
Certainly some of these sources are much more granular than the past. Take that sensor data I discussed. We are talking data at millisecond or less levels. Far different from weekly break/fix reports. So, I do think some of this data requires thinking differently.
I think the key to resolving conflicts is to think through how the various data sources can be connected effectively. Perhaps they can't be in all cases
Pardon me if I am being too elementary. The sheer size of the data alone doesn't qualify it as "Big Data" - would it? We've been collecting EEG data with 64 channels, then 128, some with 256, at 10 kHz, and after a few hours it becomes huge, but still quite homogeneous. So it won't be considered "Big Data" - right?
Bill, back to my infrastructure question, you say it may be beneficial to get a MapReduce environment -- what are some ways a company can recognize it's at the stage where it needs to start thinking about this?
From the standpoint of temporal consistency, what assumptions do you make about big data? Does it all represent the same time period? The context of the question is that I think with big data, we are trading time for space. We acquire a lot of rich substance, but I wonder if there are major variations in the time periods that big data represent. If that is the case then you are trying to match synchronic dynamics with diachronic dynamics - basically trying to find relationships between data elements that are temporally out of sync. Thoughts?
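To make the question concrete: if one source arrives at fine granularity and another coarsely, you typically have to resample both onto a common time grid before relating them. A hedged sketch with pandas follows; the series, frequencies, and values are invented:

```python
# Align two series recorded at different granularities onto a common grid.
import pandas as pd

# Hypothetical data: minute-level sensor readings vs. daily sales figures.
sensor = pd.Series(
    range(120),
    index=pd.date_range("2012-04-01", periods=120, freq="min"),
)
sales = pd.Series(
    [100, 120],
    index=pd.date_range("2012-04-01", periods=2, freq="D"),
)

# Resample the fine-grained series down to the coarser period, then join.
aligned = pd.DataFrame({
    "sensor_mean": sensor.resample("D").mean(),
    "sales": sales,
})
print(aligned)
```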
Back to my earlier point that being big or being unstructured doesn't inherently say anything about value. Similarly, internal or external doesn't inherently matter. If there is a valuable external data source you can get your hands on and it improves your analytics, have at it!
Another approach is what I call an "innovation center". This is where you have people tasked with exploring data proactively and experimenting to see what analytics it can drive. The idea is to let the experimentation drive the requirements. You don't have it figured out up front, so you experiment as a starting point.
So it's one thing to talk about this in theory, but what about actual implementation? If you've got all this great new information/data at your disposal, does that mean you need to rethink what you have in place for analyzing it? In other words, won't big-data analytics tax most existing infrastructures?
In the past, they would only have seen when things break and then tried to see if they could figure out why. The sensor data allows them to see much more cleanly. And, to identify early warning indicators to be more proactive. That can change everything
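For instance, an early-warning indicator can be as simple as flagging readings that drift away from a rolling baseline. A toy sketch, with made-up readings and an illustrative threshold:

```python
# Toy early-warning check: flag readings far outside a rolling baseline.
from statistics import mean, stdev

readings = [70.1, 70.3, 69.9, 70.2, 70.0, 70.4, 74.8, 75.2]  # e.g., temperature
WINDOW, SIGMAS = 5, 3  # baseline window and alert threshold (illustrative)

for i in range(WINDOW, len(readings)):
    baseline = readings[i - WINDOW:i]
    mu, sigma = mean(baseline), stdev(baseline)
    if abs(readings[i] - mu) > SIGMAS * sigma:
        print(f"Early warning at index {i}: {readings[i]} vs baseline {mu:.1f}")
```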
Back to the point of having a new set of information that you didn't have in the past. That can only improve your forecasting and planning processes. Just imagine how much better manufacturers of machinery and engines can assess the lifecycles of their engines with the masses of sensor data on pressure, temperature, etc. throughout the lifecycle. In the past...
Bill, I wanted to be sure you'd seen the question from WaltDitto: Bill, In what ways are you seeing big data expand the frontiers of what organizations are capable of doing? In particular, how is big data pushing back the horizons of planning and forecasting best practices?
What is really important is what you will do with big data to drive value. That's true of any data. The fact that it is big or unstructured really doesn't matter when it comes to deciding whether you need to use the data and what value it will drive. It only matters to the extent that it impacts what tools and techniques you may have to use. But the important decision is whether the data has value or not
Yes. Big data isn't just unstructured data. Much of it fits that definition, but not all of it. And not all unstructured data is big data. In fact, I think too much emphasis is being put on the definition of big data lately. Let me explain...
Looking at browsing history, for example, takes us beyond just what a customer bought and what offers they replied to. We can now see how they shop and what they are thinking of buying. That makes analytics much more powerful and predictive. Similarly, a lot of the sensor data is new information.
Bill, In what ways are you seeing big data expand the frontiers of what organizations are capable of doing? In particular, how is big data pushing back the horizons of planning and forecasting best practices?
As an analytical professional, I have always wanted to get all the data I could in order to address a given problem. I now have to add big data to the mix. It may require some extra work in some cases, but the goal is still to extract meaningful insights from it.
Well, certainly big data does provide some challenges. Some new tools and approaches are required to handle it. But, many of the same underlying analytics principles still apply fully to big data. For example...
The last decade or so, I've been focused on very large companies and how they do analytics. Big data is another wave of challenges for companies. So, I wanted to tie the big data trend into some of the general analytics trends we've had the past decade or two. There really are some commonalities