- 10/26/2012 8:24:01 AM
@Beth, I don't doubt that many companies have desktops and laptops loaded with data-driven projects that didn't come through the IT channels. Coming from the IT side of things, I can't tell you how many times I've run across someone who says "check this out" and shows me an Access database on their laptop where they've converted years of their sales information, even though it's readily available from company-provided sources. Data like this can add up quickly, especially when you have multiple people doing the same thing.
- 10/26/2012 4:28:22 AM
Ah, but as the recipients of government grants/funding to preserve arts and literature for the cultural benefit of society, are not libraries "guardians of books"?
Similarly, in an organization, are not the data and analytics teams guardians of big data?
Just another way of looking at it.
- by mnorth, Blogger
- 10/25/2012 2:53:57 PM
@Joe: I like that analogy. Libraries aren't usually book hoarders. I wouldn't say they were unless they were using their collections to somehow gain an advantage over others or to make themselves feel more important. I've seen at least one instance where a library refused ever to get rid of holdings because of one librarian's strange belief that it was their role to be guardians of the books. Participating in this conversation has really helped me process my thinking about data collection, storage and the concept of hoarding, which I now don't think has much at all to do with the size of an organization's data.
- by BethSchultz, Blogger
- 10/25/2012 11:56:33 AM
@SaneIT, you've got me wondering whether companies might be finding themselves with tons of unaccounted-for unstructured data in their systems (if not on the corporate servers, then on company-owned desktops). Social media analytics is still so new that I would imagine a lot of one-off, under-the-radar projects are taking place...
- 10/25/2012 9:31:36 AM
@mnorth, yes, management is key. Having a lot of data isn't necessarily bad, but if you're keeping stale data that is easy to find again, it makes no sense to spend as much on a place to hold it as you spend on the most important data your company is using.
- 10/25/2012 9:24:52 AM
@Beth, like any other data, you'd set a policy for unstructured data. Hopefully the analysts are paying attention to when the data was imported and what its relevant time frame should be. Then they can build a framework that covers how the data is likely to be used and how it fits into the HSM solution. Sure, some historical data is going to be good forever, but how often will you be accessing 10-year-old numbers? Often enough that it should be running on your fastest storage?
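To make the kind of policy described above a little more concrete, here is a minimal sketch of a tiering rule that uses exactly the inputs mentioned: when the data was imported, its relevant time frame, and how often it is actually read. The tier names, the `HOT_ACCESS_THRESHOLD` value, the 90-day read window, and the `DatasetRecord`/`choose_tier` names are all hypothetical; a real HSM product applies its own migration rules.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical tiers, ordered fastest/most expensive to slowest/cheapest.
HOT, WARM, COLD, ARCHIVE = "hot", "warm", "cold", "archive"

# Hypothetical threshold: reads in the last 90 days that justify fast storage.
HOT_ACCESS_THRESHOLD = 20


@dataclass
class DatasetRecord:
    name: str
    imported_on: date        # when the data was brought in
    relevant_days: int       # analyst-assigned relevance window, in days
    reads_last_90_days: int  # how often the data was actually read recently


def choose_tier(rec: DatasetRecord, today: date) -> str:
    """Map a dataset to a storage tier from its relevance window and recent use."""
    # Frequently read data goes on the fastest tier, however old it is.
    if rec.reads_last_90_days >= HOT_ACCESS_THRESHOLD:
        return HOT
    age_days = (today - rec.imported_on).days
    # Still inside its relevant time frame, but not heavily read.
    if age_days <= rec.relevant_days:
        return WARM
    # Past its time frame, but still touched occasionally.
    if rec.reads_last_90_days > 0:
        return COLD
    # Stale and unread: cheapest storage.
    return ARCHIVE


if __name__ == "__main__":
    today = date(2012, 10, 25)
    datasets = [
        DatasetRecord("social_media_feed", date(2012, 10, 1), 90, 45),
        DatasetRecord("q3_sales", date(2012, 10, 1), 365, 3),
        DatasetRecord("sales_2002", date(2002, 12, 31), 365, 1),
        DatasetRecord("web_logs_2005", date(2005, 6, 30), 365, 0),
    ]
    for ds in datasets:
        print(f"{ds.name}: {choose_tier(ds, today)}")
```

With these made-up inputs, the 10-year-old sales numbers land on cold storage rather than the fastest tier, because they are past their relevance window and rarely read. Checking access frequency first is a design choice: old data that people genuinely use every day still earns fast storage.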
- 10/25/2012 7:41:42 AM
Context is also very important. Academia comes to mind. The Yale Center for Genomic Analysis, for instance, has a massive archive, going back something like twenty years, I think. Lots of unnecessary data? Perhaps. But it's academia. To call it data hoarding in that context would be like calling a library a book hoarder.
- 10/25/2012 7:34:56 AM
If you're an organization that deals with large amounts of data, the problem (IMHO) is less about the storage itself and more about the data management and searching.
But then, if you have a budget and resources for skunkworks analytics projects, what's a few long queries to you?
How bad is your organization's "hoarding"? It's all a matter of scale. If the ROI of the data/analytics purposes you're keeping the data for is unclear, then simply ask yourself whether you can afford the luxury.
- by mnorth, Blogger
- 10/24/2012 3:00:22 PM
@Beth: lkippen has done a nice job of differentiating the social/psychological matters of actual hoarding from the technical/implementation components of gathering, storing and accessing data (large or small). For the latter, I do not believe we replace the EDW, but perhaps we do supplement it, and I like the earlier suggestion of HSM as a way to intentionally manage the transition from transactional system to analytical system to storage system. We probably should be having such discussions about storage architecture. Someone should write a blog post about that.... ;-)
- by BethSchultz, Blogger
- 10/24/2012 2:00:06 PM
@mnorth, we talk a lot about the changes big data will bring to the enterprise data architecture, i.e., the enterprise data warehouse and what becomes of it when big-data analytics becomes a corporate staple. Do we replace or supplement the EDW, for example? I wonder if we should be having the same sorts of discussions around the storage architecture.