Comments
An Infrastructure for Analytics

Are SSDs a valid alternative to in-memory databases?

Prospector

Thanks, Louis. It's been a pleasure and a lot of fun. Have a great holiday and a greater new year, everyone!

Prospector

Even with SSD, a terabyte can take a while to reload, on the order of 80 seconds or so. That's a long time in real-time analytics. ReRAM and other persistent memory technologies will be very attractive to the analytics space.
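
As a rough check on the figure above, here is a minimal sketch of the arithmetic. It assumes a decimal terabyte and a perfectly sustained read rate; the function name and the comparison rates are illustrative assumptions, not measurements.

```python
def reload_seconds(dataset_bytes: float, throughput_bytes_per_s: float) -> float:
    """Time to stream a dataset back into memory at a sustained read rate."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return dataset_bytes / throughput_bytes_per_s


TB = 10**12  # decimal terabyte (assumption)
GB = 10**9

# Sustained rate implied by "1 TB in about 80 seconds".
implied_rate = TB / 80  # bytes per second
print(f"implied rate: {implied_rate / GB:.1f} GB/s")

# Hypothetical aggregate read rates, for comparison only.
for rate_gb in (1, 3, 12.5):
    secs = reload_seconds(TB, rate_gb * GB)
    print(f"{rate_gb:>5} GB/s -> {secs:,.0f} s to reload 1 TB")
```

In other words, an 80-second reload already implies roughly 12.5 GB/s of aggregate read bandwidth, which is why the reload time is hard to shrink further with drives alone.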

Prospector

Thanks, Jim, for your time - lots of good info. Hope you can come back sometime. I have to be going - have a great day, everyone!

Blogger

Yeah, those reboots are a killer. Thank goodness for SSDs!

Blogger

One crucial hardware issue is the ability to change jobstreams, whether in-house or in the cloud. I remember a National Lab that did 30-minute runs on a supercomputer, then took 45 minutes to remove one stream and load the next.

That's a killer in shared/multi-tenant systems.

Also, reboot after a crash is an issue. Fast SSD plus fast networks obviate these problems.

Prospector

Cognito...I mentioned Target, and Sony has just given us another example.

 

It may depend on the data stream. For instance, a stream reporting where you are in a store or what you are looking at (both ideas currently in use in retail) is only valuable in the context of the store and useful for just a couple of minutes. Maybe this isn't encrypted.

Generally, I agree with you, and encrypting everything is a good idea.

Prospector

Paul R., on what brand of flash:

There are maybe five enterprise-class SSD vendors: Intel, SanDisk, Micron, Samsung and HGST. They are all into NVMe/PCIe drives. These are by far the fastest drives around, with rates as high as 3 million IOPS.

Prospector

That's right, Jim, thanks for the reminder. I sometimes forget the ultimate objective of analytics - improve the bottom line! Prove you can do that, and most will write a check or two.

Blogger

@Jim. Failure to safeguard data at rest and data in transmission is inexcusable in my environment. There are external critics who are inclined to believe the worst if there is any exigent issue – can't feed that beast.

Prospector

Hardware isn't cheap, but the objective of any analytics setup is revenue generation. If real-time analytics increases Target's online sales by 15 percent, that would pay for a lot of gear!

Prospector

Really smart analytics guys are rare and worth every penny. I suspect we'll see AaaS companies built around one or two of them.

Prospector

@Cognito  It really does present a real challenge, just that much more to safeguard.

Blogger

Security with big data is a big challenge. Real-time data flows make encryption difficult, and that's also true of data on those local SSDs. Networked storage should still be encrypted.

Security may get left behind in the struggle for raw performance, but the crown jewels of a company may flow through these systems nightly.

Prospector

@Jim Wow! Those minors are surprising to hear, as is the recommendation for a PhD in the area.

Blogger

@Louis Security and Analytics are two horns of a dilemma for me.

Prospector

Louis, if I were starting over, I'd look for a school that offered a degree in CS with psych, marketing and communications minors! Then I'd get a PhD there.

Experts in this area command huge salaries.

Prospector

@Jim How does security impact this process? Is there more to it than just slowing down the process a bit?

Blogger

Cognito,

Vertigo is an occupational hazard in Analytics. It reminds me of a bunch of spin doctors all chanting their spells!

Figuring out a direction isn't trivial. My suggestion is to do mainstream work like Hadoop in-house and job out the esoteric stuff. Then, over time, focus on those things that work best for you and build in-house capability.

Many of the ideas in the market are small scale and academic at this time, and sorting winners isn't easy.

Prospector

@Jim re: Cost of People. That is a key point. There are some very high-end techniques being used in these systems, and companies will have to be prepared to pay for this skillset, though I get the feeling they won't regardless.

Blogger

Hardware and people are not cheap. One solution for smaller companies is to use an Analytics-as-a-Service provider and get expert help on a pay-as-you-go basis.

Prospector

David,

The affordability question is twofold. One part is the cost of hardware and the other is the cost of people to program and use it.

Prospector

Sorry,

I had a bit of trouble logging in!

 

Prospector

So one thing I am curious about is how many companies can actually afford analytics infrastructure on the budgets they actually have. Are we spending enough? Not enough? Too much?

Editor

@David Sure was. Lots of great information and food for thought.

Blogger

I'll have to assume the end was brilliant and catch it on the archive.

Editor

Sorry, folks. Lost audio at the end. My connection got wonky.

Editor

Jim, thanks for being a great guest! Lots of super information to chew on...

Prospector

Great question, Curt!

Blogger

Thanks, Jim and Curtis!

Blogger

@Cognito     I feel your pain.

Blogger

Vendors like Cloudera, Hadoop, Oracle (Endeca, Exalytics, Visual Analyzer) and Tableau have all told me they have the solution. I have a number of legacy mainframe applications plus Oracle (and Business Objects), Citrix, plus standalone Foxbase and SQL and Access databases. My head spins when I try to integrate these coherently from a business intelligence/analytics POV.

Prospector

@David In the new frontier of computing, it is as Mr. O'Reilly mentions - it works best with how we can optimize computing, but that is not to say it is the only way, or that we have yet really realized the optimization of data movement and processing. I would think so, but I know better than that.

Blogger

@Curt Ah yes, those open source schedulers do take a lot of time to customize.

Blogger

Lots of people seem to be making it the solution.

Editor

Everyone seems to be sold on Hadoop. The interesting question: is it really the solution? IBM probably thinks otherwise.

Blogger

@Cognito, that's a great question! Hold that thought: Jim will be joining the online discussion in about 15 minutes and I think he'll be able to provide some detail in his response.

Prospector

@Louis, the two big expenses are environmental controls (GPUs throw off literally tons of heat) and the scheduling software that knits them into a total system. There are some open-source frameworks but it's still an expensive proposition to do the development that makes the scheduler work really well.

Prospector

I'm looking at establishing analytical capability. I've been instructed to build an enterprise data warehouse with data marts. I don't need real-time analysis (hourly at most, daily more likely) but I do want to be able to construct new analyses and models using historical data. Is there a better architecture for the system out of the five: (1) independent data marts, (2) data mart bus architecture with linked dimensional data marts, (3) hub and spoke, (4) centralized data warehouse (no dependent data marts) and (5) federated? More importantly, is there a cloud solution that allows existing legacy systems to "feed" a web-based dashboard as well as a query tool?

Prospector

Nvidia has always been a leader in GPUs from what I remember, though I'm no longer actively engaged with them.

Blogger

@David Very true. Some companies are starting to think about that power bill - using some mix of solar with traditional energy.

Blogger

I like the GPU methodology, but it is still really expensive from what I remember.

Blogger

Big data, big power bills.

Editor

I've been reading about potential digestible computers to monitor body temp, Curt. Gotta say that's a little too far for me. I want mine inserted in my brain or something that feels more permanent.

Editor

I'm cold-natured: I'd be thrilled with thermostats that could take MY core temperature into account.

Prospector

What brand of flash storage is being used the most?

Prospector

Might as well call it the internet of thermostats right now.

Editor

Thanks for being frank about the IoT. I feel the same way about the IoT as I do about my jet car. Where is it?

Editor

I would think the Cloud would be best for the budget.

Blogger

The first question that comes to mind is whether I should create my own infrastructure or put it in the cloud.

Editor

Nice to hear Curt on A2. Also, nice to hear Jim again. He's done great shows on other sites I've been with.

Editor

Audio is about to start. If you have troubles, try refreshing your browser. Some people need to use certain browsers to get the audio to work depending on their combination of security and browser.

Editor

Jim's in the studio -- I think we're going to have a great show!

Prospector

Hi all, about 5 minutes to show time.

Editor

We'd love to have your voice in the discussion here. To take part, just type your comment into the "Your Post" box and then click on the "Post" button below the box. Feel free to introduce yourself before the show starts -- I think you'll find that we're a very friendly community here! 

Prospector

Hey, everyone, we're glad you could join us! When the show is scheduled to start, an audio player should appear above the "Your Post" window. If it doesn't appear, you might need to refresh your browser until it does. If it appears but doesn't start playing, then you may need to click on the "play" button on the far left of the player. 

Prospector

