With so much data in the world, it’s time to figure out how much of it is unstructured -- that which a human needs to look at in order to understand it best -- and what to do about it.
IT research firm IDC has estimated that there will be 7.9 zettabytes of digital data in the world by 2015, and I think the biggest chunk of it will come from social media generated on mobile platforms and driven by email. Intel estimates that at least 2.5 billion people will be online by 2015, generating more and more data and requiring more resources to store and process it, as reported in InformationWeek. Such outlooks have led evangelist analysts to gush over unstructured data's potential; Google's Avinash Kaushik, for example, publicly claimed to have “orgasms over big, unstructured data.”
How to structure unstructured data
Many of us are only getting started with unstructured data and are still trying to figure out how best to handle it all. Actually, we need to ask ourselves whether we should even bother working with it, as many previous attempts to add structure automatically have been disappointing, to say the least, and have failed much of the time. After all, dealing with and automating processes around structured data is tough enough!
Here I've put together a few things you can do with unstructured data:
- Distribute the data in the cloud -- just store more of it and hope you can see useful patterns in the data with advanced big-data analytics and predictive analytics platforms.
- Develop more powerful analytics engines to analyze the data, most of which will be in the cloud, in real time.
- Transform dark data/dark social and ultraviolet data into usable, structured information from which you can gain insights, as I discussed in my post Putting Analytics Fragmentation Into Perspective.
- Merge as much data as you can into large data files, a lesson Team Obama learned in preparing for the 2012 election; merging several different databases and cleaning the data made developing predictions and gleaning insights easier.
- Clean the data -- this assumes unstructured data is dirty, or not useful for analysis in its current state. You can purge duplicate information, ensure consistency in the naming of entities, and remove empty and sparse datasets, for example. Consider checking out Salesforce Data.com's Social Key, which ties customer data records to social media accounts and online content by those accounts. Perhaps Salesforce is on to something here; the cost of cleaning data might be shared, as the shareable part of the data a company cleans for its own use also goes back to the overall Data.com repository in Salesforce’s cloud.
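To make the cleaning steps in the last bullet a little more concrete, here is a minimal Python sketch, not a production pipeline. It assumes records arrive as dictionaries of string fields with a "name" field identifying the entity; the function names, the "name" key, and the min_fields threshold are all hypothetical choices for illustration.

```python
def normalize_name(name: str) -> str:
    """Collapse whitespace and case so 'ACME  Corp' and 'acme corp' compare equal."""
    return " ".join(name.split()).casefold()


def clean_records(records, min_fields=2):
    """Return records with sparse entries and duplicate entities removed.

    Assumptions (illustrative only): every value is a string, and the
    'name' field identifies the entity a record describes.
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Keep only fields that actually contain something.
        filled = {k: v.strip() for k, v in rec.items() if v and v.strip()}
        # Drop empty or sparse records, and records with no entity name.
        if len(filled) < min_fields or "name" not in filled:
            continue
        # Name entities consistently, then purge duplicates by that name.
        key = normalize_name(filled["name"])
        if key in seen:
            continue
        seen.add(key)
        filled["name"] = key
        cleaned.append(filled)
    return cleaned


if __name__ == "__main__":
    raw = [
        {"name": "ACME  Corp", "city": "Boston"},
        {"name": "acme corp", "city": "Boston"},  # duplicate after normalizing
        {"name": "", "city": ""},                 # empty record
        {"name": "Globex", "city": ""},           # sparse record
    ]
    print(clean_records(raw))
```

Real entity matching is usually fuzzier than an exact match on a normalized name, so treat this as a starting point for thinking about the problem rather than a recipe.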
Working with unstructured data won't be easy -- but it will be necessary.
What advice do you have for working with unstructured data? Share below.