Noreen Seebacher

The Daunting Task of Defining Big-Data

Ariella
User Rank
Blogger
lotta data
Ariella   11/15/2012 10:32:35 AM
I rather like the expression, "we got a whole lotta data." It almost sounds like a song.

Noreen Seebacher
User Rank
Blogger
Re: lotta data
Noreen Seebacher   11/15/2012 10:42:22 AM
LOL Ariella!

Alexis
User Rank
Data Doctor
New Age
Alexis   11/15/2012 1:52:20 PM
Florissi predicted the era of big-data will trigger changes similar to the ones that occurred during the Industrial Revolution.

It's really amazing to think of the changes big-data could usher in -- comparing it to the Industrial Revolution is mind-boggling!

mnorth
User Rank
Blogger
It's all relative
mnorth   11/15/2012 3:47:18 PM
@Noreen: Excellent topic. Defining big data is something I pondered while penning my recent musings on the definitions of other industry terms. I like the statement about 'big data' being relative. It's kind of like calling something 'new'. Well, it's new now, but for how long? And when it's not anymore, will we take the 'new' off? In my first analytics job, I worked on a SAS server that had 800 GB of disk space. We thought that was a ton. Now you can buy multiple-TB drives off the shelf at Staples. Is 800 GB still 'big'? Nope. Exa-, zetta- and yotta- are big, for now, but for how long?

kicheko
User Rank
Blogger
Re: lotta data
kicheko   11/15/2012 4:35:55 PM
Ariella, so true, though, what he says. Big-data is just lots of data in different terms. Funny, though, how as soon as someone found the shortest phrase to refer to lots of data, it became complicated. I believe that most of the difficulty in explaining big-data is really difficulty in understanding the tools for handling big-data and their dynamics.

Ariella
User Rank
Blogger
Re: lotta data
Ariella   11/15/2012 4:41:58 PM
@kicheko, true.  A short description packs a lot of complications here.

Noreen Seebacher
User Rank
Blogger
Re: lotta data
Noreen Seebacher   11/16/2012 8:35:41 AM
Let's hear some of your own definitions. If you had to make a dictionary entry, what would you write about big-data?

philsimon
User Rank
Data Doctor
the anti-definition
philsimon   11/16/2012 12:02:14 PM
I like this one from The Register:

Big Data is any data that doesn't fit well into tables and that generally responds poorly to manipulation by Structured Query Language (SQL).

[T]he most important feature of Big Data is its structure, with different classes of Big Data having very different structures.

With that definition, we can start to look at examples. A Twitter feed is Big Data; the census isn't. Images, graphical traces, Call Detail Records  (CDRs) from telecoms companies, web logs, social data, RFID output can all be Big Data. Lists of your employees, customers, products are not.




Alexis
User Rank
Data Doctor
Re: the anti-definition
Alexis   11/16/2012 1:07:54 PM
Nice one Phil! It makes the distinction very clear.

scorellis
User Rank
Prospector
Re: lotta data
scorellis   11/18/2012 2:24:30 AM
We have spent the past year trying to define "big data." In the end, it comes down to a couple of things.

A) People like to think they have big data, even when they don't. Here is why I say this. If you have a 32 GB, 8-core SQL server and a database table with 250 GB of data in it, yeah, you might think you have big data. If you have 900 GB of data and 512 GB of RAM on your database server, you might think you have "big data." No, you don't.

B) If you have a 6 TB table of data in your DB, and you have 2 TB of RAM, and you think you have big data, YES. YOU DO. Well, at least according to my definition... My definition, constrained by current technological limitations, goes like this: If you have to make a call to disk for data, and the amount of time it takes to return that data to your users exceeds the amount of time that users are willing to accept, AND the ONLY WAY to alleviate this issue is to increase your bandwidth to disk because you have ALREADY MAXED OUT the technologically available amount of RAM within which the database could reside, then you have "BIG DATA."

I hope this clarifies things. My team deals with this on a daily basis from some rather gigantic companies with rather large amounts of corpuscular data. By "corpuscular," I mean single, extremely large chunks of data that cannot be fragmented across multiple systems and MUST be treated symptomatically. That is, the data is interrelated, and the users seek to examine it as a whole, and not as discrete entities. I really could go on and on about this....
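
The rule in the comment above can be sketched as a small Python check. This is only an illustration under stated assumptions: the `Workload` class, its field names, the 2 TB RAM ceiling, and the latency figures are hypothetical and were not posted by scorellis. The three conditions (queries must go to disk, users wait longer than they will accept, and RAM is already at the platform maximum) are taken from the comment.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    data_gb: float              # size of the data users query as a whole
    ram_gb: float               # RAM installed on the database server
    max_ram_gb: float           # most RAM the platform can hold (assumed figure)
    disk_latency_s: float       # observed time to answer a query that hits disk
    tolerated_latency_s: float  # wait time users are willing to accept


def is_big_data(w: Workload) -> bool:
    """Apply the three conditions from the comment; all must hold."""
    must_hit_disk = w.data_gb > w.ram_gb               # data cannot live in RAM
    too_slow = w.disk_latency_s > w.tolerated_latency_s
    ram_maxed = w.ram_gb >= w.max_ram_gb               # no more RAM can be added
    return must_hit_disk and too_slow and ram_maxed


# Hypothetical numbers: a 2 TB (2048 GB) RAM ceiling, 5 s queries, 1 s tolerance.
small = Workload(data_gb=250, ram_gb=32, max_ram_gb=2048,
                 disk_latency_s=5.0, tolerated_latency_s=1.0)
medium = Workload(data_gb=900, ram_gb=512, max_ram_gb=2048,
                  disk_latency_s=5.0, tolerated_latency_s=1.0)
large = Workload(data_gb=6144, ram_gb=2048, max_ram_gb=2048,
                 disk_latency_s=5.0, tolerated_latency_s=1.0)

for name, w in [("250 GB / 32 GB", small),
                ("900 GB / 512 GB", medium),
                ("6 TB / 2 TB", large)]:
    print(name, "->", is_big_data(w))
```

Under these assumed figures, the first two servers fail the test because more RAM could still be added, while the 6 TB table on a maxed-out 2 TB server passes. Changing `max_ram_gb` shows how the answer moves as hardware limits move, which is the "relative" point made earlier in the thread.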
