Even with SSDs, a terabyte can take a while to reload, on the order of 80 seconds or so. That's a long time in real-time analytics. ReRAM and other persistent memory technologies will be very attractive to the analytics space.
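For a rough sense of what that 80-second figure implies, here is a back-of-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes) and a steady sustained read rate, and the function name is made up for illustration:

```python
# Back-of-envelope reload time: size divided by sustained throughput.
# Assumes decimal units and a constant read rate (both simplifications).
TB = 10**12
GB = 10**9

def reload_seconds(size_bytes, throughput_bytes_per_s):
    """Seconds needed to reload size_bytes at a steady throughput."""
    return size_bytes / throughput_bytes_per_s

# Aggregate throughput implied by reloading 1 TB in about 80 seconds.
implied_gb_per_s = (1 * TB / 80) / GB

print(f"Implied throughput: {implied_gb_per_s:.1f} GB/s")
print(f"1 TB at that rate: {reload_seconds(1 * TB, implied_gb_per_s * GB):.0f} s")
```

In other words, 80 seconds per terabyte corresponds to roughly 12.5 GB/s of aggregate read bandwidth under these assumptions.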
One crucial hardware issue is the ability to change jobstreams, whether in-house or in the cloud. I remember a National Lab that did 30-minute runs on a supercomputer, then took 45 minutes to remove one stream and load the next.
That's a killer in shared/multi-tenant systems.
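To put a number on the National Lab example, a quick sketch of useful utilization per cycle. It assumes every cycle is exactly one 30-minute run followed by one 45-minute changeover, and the function name is illustrative only:

```python
# Fraction of wall-clock time spent on useful work when every run
# is followed by a fixed changeover (unload one stream, load the next).
def utilization(run_minutes, changeover_minutes):
    return run_minutes / (run_minutes + changeover_minutes)

lab = utilization(30, 45)
print(f"Useful compute share: {lab:.0%}")
```

Under those assumptions the machine does useful work only 40% of the time, which is why changeover speed matters so much on shared systems.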
Also, reboot after a crash is an issue. Fast SSD plus fast networks obviate these problems.
Cognito... I mentioned Target, and Sony has just given us another example.
It may depend on the data stream. For instance, a stream reporting where you are in a store or what you are looking at (both ideas currently in use in retail) is only valuable in the context of the store and useful for just a couple of minutes. Maybe this isn't encrypted.
Generally, I agree with you, and encrypting everything is a good idea.
There are maybe five enterprise-class SSD vendors: Intel, SanDisk, Micron, Samsung and HGST. They are all into NVMe/PCIe drives. These are by far the fastest drives around, with as many as 3 million IOPS.
@Jim. Failure to safeguard data at rest and data in transmission is inexcusable in my environment. There are external critics who are inclined to believe the worst if there is any exigent issue – can't feed that beast.
Vertigo is an occupational hazard in analytics. It reminds me of a bunch of spin doctors all chanting their spells!
Figuring out a direction isn't trivial. My suggestion is to do mainstream work like Hadoop in-house and job out the esoteric stuff. Then, over time, focus on those things that work best for you and build in-house capability.
Many of the ideas in the market are small scale and academic at this time, and sorting winners isn't easy.
@Jim re: Cost of People. That is a key point. There are some very high-end techniques being used in these systems, and companies will have to be prepared to pay for this skillset, though I get the feeling they won't regardless.
Vendors like Cloudera, Hadoop, Oracle (Endeca, Exalytics, Visual Analyzer) and Tableau have all told me they have the solution. I have a number of legacy mainframe applications plus Oracle (and Business Objects), Citrix, plus standalone Foxbase, SQL and Access databases. My head spins when I try to integrate these coherently from a business intelligence/analytics POV.
@David, in the new frontier of computing, it is as Mr. O'Reilly mentions: it works best with how we can optimize computing, but that is not to say it is the only way or that we have yet to really realize the optimization of data movement and processing. I would think so, but I know better than that.
@Louis, the two big expenses are environmental controls (GPUs throw off literally tons of heat) and the scheduling software that knits them into a total system. There are some open-source frameworks but it's still an expensive proposition to do the development that makes the scheduler work really well.
I'm looking at establishing analytical capability. I've been instructed to build an enterprise data warehouse with data marts. I don't need real-time analysis (hourly at most, daily more likely), but I do want to be able to construct new analyses and models using historical data. Is there a best architecture for the system out of these five: (1) independent data marts, (2) data mart bus architecture with linked dimensional data marts, (3) hub and spoke, (4) centralized data warehouse (no dependent data marts) and (5) federated? More importantly, is there a cloud solution that allows existing legacy systems to "feed" a web-based dashboard as well as a query tool?
We'd love to have your voice in the discussion here. To take part, just type your comment into the "Your Post" box and then click on the "Post" button below the box. Feel free to introduce yourself before the show starts -- I think you'll find that we're a very friendly community here!
Hey, everyone, we're glad you could join us! When the show is scheduled to start, an audio player should appear above the "Your Post" window. If it doesn't appear, you might need to refresh your browser until it does. If it appears but doesn't start playing, then you may need to click on the "play" button on the far left of the player.