Established 2009

Data Cleanup and Discovery

Discover the Latent Power in Your Data

Customer and order history data is powerful. Unfortunately, for the vast majority of companies, it is also messy and hard to pin down. Because data is so valuable to an organization, having it in usable form is essential.

But don’t let the poor state of your data hold you back from mining its riches. Cleaning up your data is a high-ROI proposition. The investment is modest compared to the wealth that you will be poised to unlock from your data.

Clean Up Messy Data

Almost every database contains at least some “messy” data. A messy data set is one in which some percentage of the records contain missing, inaccurate, or misplaced data. Organizations differ in their tolerance for messy data, but the closer to 0% messy data, the better.
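As a rough illustration of how the "percentage of messy records" might be measured, here is a minimal Python sketch. The field names (`email`, `zip`), the validity rules, and the sample records are all hypothetical; a real data set would need its own rules.

```python
# Hypothetical validity rules: each maps a field name to a check on its value.
RULES = {
    "email": lambda v: "@" in v and "." in v.split("@")[-1],
    "zip": lambda v: len(v) == 5 and v.isdigit(),
}


def is_messy(record):
    """A record is messy if any ruled field is missing, blank, or invalid."""
    for field, is_valid in RULES.items():
        value = record.get(field)
        if value is None or not value.strip():
            return True  # missing data
        if not is_valid(value.strip()):
            return True  # inaccurate or misplaced data
    return False


def messy_share(records):
    """Return the fraction of records (0.0 to 1.0) flagged as messy."""
    records = list(records)
    if not records:
        return 0.0
    return sum(is_messy(r) for r in records) / len(records)


# Made-up sample records for illustration only.
customers = [
    {"email": "ann@example.com", "zip": "78701"},  # clean
    {"email": "", "zip": "78702"},                 # missing email
    {"email": "bob@example", "zip": "78703"},      # invalid email
    {"email": "cy@example.com", "zip": "7870"},    # invalid ZIP
]

print(f"{messy_share(customers):.0%} of records are messy")  # prints "75% of records are messy"
```

Tracking this number before and after a cleanup pass is one simple way to see whether the data set is moving toward 0% messy data.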

Data cleanup (also known as data cleansing or data scrubbing) is a multi-step process that results in a more complete, useful, and accurate data set. The goals of the data cleanup process include improved:

  • Validity
  • Completeness
  • Consistency
  • Uniformity


Integrate and Prepare Your Data

Most organizations store data in many different places and formats, and ownership is often spread across multiple departments and administrators. Valuable data may be stored locally on laptop computers, for example, and not even be recognized as valid data at the organizational level. But it, too, can have tremendous value.

Once your individual data sets are cleaned up, it can be useful to standardize and integrate them. To accomplish this, one or more of the following steps may be called for:

  • Data audit
  • Workflow specification
  • Workflow execution
  • Post-processing
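To make the first of these steps more concrete, a data audit can start with a simple per-field profile. The sketch below is one possible approach, not a prescribed method; the `audit` function, the field names, and the sample records are hypothetical.

```python
from collections import Counter


def audit(records):
    """Profile each field: fill rate, distinct values, and most common value."""
    fields = sorted({field for record in records for field in record})
    total = len(records)
    report = {}
    for field in fields:
        values = [record.get(field) for record in records]
        # Treat None and empty strings as missing.
        present = [v for v in values if v not in (None, "")]
        report[field] = {
            "fill_rate": len(present) / total if total else 0.0,
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1),
        }
    return report


# Made-up order records for illustration only.
orders = [
    {"customer": "A", "state": "TX"},
    {"customer": "B", "state": ""},
    {"customer": "A"},
]

for field, stats in audit(orders).items():
    print(field, stats)
```

Fields with a low fill rate or an unexpectedly high number of distinct values are natural candidates for the cleanup workflow that follows the audit.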


Further steps can be taken to prepare individual data sets for analysis, including:

  • Parsing (i.e., breaking a field out into smaller, meaningful components)
  • Data transformation (e.g., binning ranges of data or changing numerical variables to categorical variables)
  • Duplicate elimination
  • Interpolation (i.e., estimating missing values using statistical methods)
  • Removing outliers
  • Data munging (i.e., mapping data from one raw form to another)
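Several of the preparation steps listed above can be sketched in a few lines of Python. These are simplified illustrations under stated assumptions: the function names, the "Last, First" name format, the bin thresholds, the choice of linear interpolation, and the 1.5 × IQR outlier rule are all illustrative choices, not a description of any particular production workflow.

```python
import statistics


def parse_name(full_name):
    """Parsing: split an assumed 'Last, First' string into components."""
    last, _, first = full_name.partition(",")
    return {"first": first.strip(), "last": last.strip()}


def bin_order_total(total):
    """Transformation: turn a numeric total into a category (thresholds are arbitrary)."""
    if total < 50:
        return "small"
    if total < 200:
        return "medium"
    return "large"


def drop_duplicates(records, key):
    """Duplicate elimination: keep the first record for each case-insensitive key."""
    seen, unique = set(), []
    for record in records:
        normalized = record[key].strip().lower()
        if normalized not in seen:
            seen.add(normalized)
            unique.append(record)
    return unique


def interpolate_linear(values):
    """Interpolation: fill interior None gaps on a straight line between known
    neighbors. Leading and trailing gaps are left as None."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def remove_outliers_iqr(values, k=1.5):
    """Outlier removal: drop values more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


print(parse_name("Doe, Jane"))
print(bin_order_total(120))
print(interpolate_linear([10.0, None, None, 40.0]))
print(remove_outliers_iqr([10, 12, 11, 13, 12, 500]))
```

Each function handles one step in isolation, so steps can be applied in whatever order a given data set calls for. Removing outliers before interpolating, for instance, keeps extreme values from distorting the estimates.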


Data Discovery

The discovery process involves building a statistical model of your data. It can incorporate elements of the data audit (mentioned above) but can also encompass a deeper statistical analysis. MindEcology’s data discovery process considers each client’s marketing objectives in the context of the available data in order to build a list of possible data-usage scenarios. These scenarios can also take into account externally available data sets that could be integrated with existing data.

Contact MindEcology today and take charge of one of your most valuable assets – your data.


data audit, data cleanup, data discovery, data munging