Daniel D. Gutierrez

Data Prep and Data Quality

Re: Stop munging my data
Zimana   9/15/2016 1:06:41 PM
I've seen that from time to time, too, Terry. It should fall in the data cleansing category, but I'm not sure whether there is a specific difference in meaning. One challenge in data science is that a lot of terms are used interchangeably and can lose meaning in some ways.

Re: Stop munging my data
T Sweeney   9/13/2016 10:27:48 PM
Another term I've heard used is "data scrubbing"... because dirty, filthy data needs a long soak in a vat of lye and then a good going-over with some steel wool.

Is this related, separate, one and the same as munging, or something else?

Re: Stop munging my data
Lyndon_Henry   9/13/2016 4:01:12 PM

Daniel writes

 I always called the process "data transformation." I had never heard of either "munging" or "wrangling." But when I was acting as a Community TA for the data science certificate program on Coursera, the Johns Hopkins professors teaching the course used "munging," so I kind of adopted the term. I even have a chapter on "Data Munging" in my new machine learning book! Call it what you will, the process is the important part, not the term.

I agree the process is more important than what you call it. In any case, it's an issue involving data quality, including the quality of data input to analytical processes.

"Data munging" seems suspiciously similar to "data cooking", which is a term more commonly used in my own professional circles. The problem of data being "munged" or "cooked" or whatever was an issue in a big, politically tinged Austin-area urban transit controversy I was involved in over 2 years ago. This is described in my A2 blog article at the time: Analytics Fuel Transit Duel in Austin. Incidentally, my side ultimately was the victor ...

Re: Stop munging my data
Zimana   9/13/2016 12:25:46 PM
Lyndon - I love that you provided the definition and some background on a phrase whose origin I really did not understand. Thank you!

Re: Stop munging my data
Zimana   9/13/2016 12:24:22 PM
True. I have a problem wrapping my head around the term munging as well. Terms should describe the action, but the value of an action can change. Event tracking is a technical term from JavaScript development, but when web analytics became popular, its meaning seemed to drift away from its original purpose: tagging media on a web page.

Re: Stop munging my data
Daniel Gutierrez   9/12/2016 10:05:54 PM
Yes, "data munging" is an odd bird. In the past, I always called the process "data transformation." I had never heard of either "munging" or "wrangling." But when I was acting as a Community TA for the data science certificate program on Coursera, the Johns Hopkins professors teaching the course used "munging," so I kind of adopted the term. I even have a chapter on "Data Munging" in my new machine learning book! Call it what you will, the process is the important part, not the term.

Stop munging my data
Lyndon_Henry   9/12/2016 5:48:19 PM

Daniel writes

After data acquisition, the early stages of the "data science process" include data preparation (also known as data transformation, data wrangling, and data munging) and data quality (also known as data cleansing). 

I'd never before encountered the term "data munging", so I Googled it. For those who may be similarly benighted, here's a definition of the word munge from WhatIs.com:

According to The New Hacker's Dictionary, munge (pronounced MUHNJ) is (1) a verb, used in a derogatory sense, meaning to imperfectly transform information, or (2) a noun meaning a comprehensive rewrite of a routine, data structure, or the whole program.

Hopefully now, one of these days I'll have an opportunity to say something like "Just munge your own data, and leave mine alone ..."

Re: Data Prep is the new SEO
Zimana   9/2/2016 7:59:21 AM
The one that I really like at the moment is an open source graph database called Neo4j. It treats each data element as a node and draws a graph showing how the nodes relate, based on metadata on each element. Very useful.
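
For anyone who has not worked with a graph model, here is a minimal sketch of the idea in plain Python. It is not Neo4j's API, just an illustration: the element names, the `owner` metadata key, and the `build_graph` helper are all made up for this example. Each data element becomes a node, and two nodes are linked when they share a metadata value.

```python
from itertools import combinations

# Hypothetical data elements with made-up metadata (illustration only).
elements = {
    "orders.csv": {"owner": "sales"},
    "customers.csv": {"owner": "sales"},
    "sensors.json": {"owner": "ops"},
}

def build_graph(elements, key):
    """Return an adjacency map linking nodes that share a value for `key`."""
    # Every element is a node, even if it ends up with no relationships.
    graph = {name: set() for name in elements}
    for a, b in combinations(elements, 2):
        value_a = elements[a].get(key)
        value_b = elements[b].get(key)
        if value_a is not None and value_a == value_b:
            graph[a].add(b)
            graph[b].add(a)
    return graph

graph = build_graph(elements, "owner")
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

A real graph database also stores typed, directed relationships with their own properties, which this sketch leaves out.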

Daniel Gutierrez
User Rank
Blogger
Re: Data Prep is the new SEO
Daniel Gutierrez   8/31/2016 11:38:50 PM
NO RATINGS
Yes, I agree. Furthermore, I see a continued flow of new data prep, data integration and data quality tools arriving in the marketplace all the time. This seems to say that these tasks are being taken seriously.

Zimana
User Rank
Blogger
Re: Data Prep is the new SEO
Zimana   8/31/2016 3:57:38 PM
NO RATINGS
The analytic climate now has made companies curious, even if they have not quite the knack for identifying what they need from the data.  That ability will be refined as new data prep tools come into the market.
