Is Data a Mirage?
Here are some thoughts in response to Gerardo Dada’s thoughtful December 25, 2014 LinkedIn post, “The Mirage of Data.” Each numbered statement is quoted directly from Dada’s original post, with our response just below that:
“1. Data only looks at the past”
MindEcology response: True, but that fact alone does not render data useless. Past trends can often be extrapolated to likely future events with a high level of confidence, though admittedly they can never be relied on to predict the future with 100% accuracy. At the end of the day, predicting possible futures based on past data is much more likely to get you closer to the truth than completely ignoring the past.
“2. Data cannot answer ‘why’ and fails to capture emotion”
MindEcology response: True: we data geeks are quite used to repeating the mantra “correlation is not causation.” But again, this fact doesn’t keep us from leveraging data to make sound decisions. Data doesn’t need to tell us the “why” in order to be an effective predictor of possible future scenarios. Rather, it just needs to show (or not show) statistically significant correlations in order to point towards the best path of effective action. We can leave the “why” to philosophers.
“3. Data is always biased”
MindEcology response: No statement is always true, including this one. Example: if my server log counts 100 visits to my Home page on a given day, we can safely take that to mean 100 visits; there is not a lot of room for bias (provided that my server log is working properly). However, bias creeps in when we start drawing conclusions about what we “should” do with the data we have collected. For example, it is a well-known bias that we collectively tend to look at the data we want to see, the data that supports our preconceived notions, rather than looking at the entirety of the data. But it’s important to remember that it’s not usually the data that is biased. It is we, the interpreters of that data, who need to check our biases at the door as best we can.
“4. We often look at the data we can collect, not the data we need”
MindEcology response: Agreed. A corollary to this proposition: we often look at the wrong data, or worse, we look at data just for data’s sake. A savvy analyst will home in on the small handful of data points that matter to her business decisions and purposefully filter out the noise.
“5. Data makes it easy to confuse correlation with causation”
MindEcology response: Agreed. See our response to point 2 above. But again, that doesn’t always matter. We don’t have to know the flow of causation in order to draw conclusions based on correlations. For example: if the data shows that my locations have historically performed better when located within two city blocks of an office park, this fact doesn’t mean for certain that people from the office park are directly bringing me business. The business could very well be coming from somewhere else. That said, if the data shows such a trend across multiple samples with a high level of statistical significance, I had darn well better not ignore the finding when making future site selection decisions.
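For the data geeks among us, here is a rough sketch of what checking the office-park pattern for statistical significance might look like. It uses a simple permutation test, which is only one of several reasonable approaches. The sales figures, the variable names, and the `permutation_p_value` helper are all made up for illustration; none of them come from a real data set.

```python
import random
from statistics import mean


def permutation_p_value(near, far, n_permutations=10_000, seed=0):
    """One-sided permutation test on the gap in mean sales.

    Estimates how often randomly relabeling the locations produces a gap
    at least as large as the one actually observed.
    """
    rng = random.Random(seed)
    observed_gap = mean(near) - mean(far)
    pooled = list(near) + list(far)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        gap = mean(pooled[:len(near)]) - mean(pooled[len(near):])
        if gap >= observed_gap:
            hits += 1
    # Add 1 to both counts so the estimated p-value is never exactly zero.
    return observed_gap, (hits + 1) / (n_permutations + 1)


# Hypothetical weekly sales (in $1,000s), invented for this example only.
near_office_park = [52, 61, 58, 66, 59, 63, 57, 64]
far_from_office_park = [48, 50, 45, 53, 47, 51, 49, 46]

gap, p_value = permutation_p_value(near_office_park, far_from_office_park)
print(f"Observed gap in mean weekly sales: {gap:.1f}")
print(f"Permutation p-value: {p_value:.4f}")
```

A small p-value here says only that the gap is unlikely to be a fluke of which locations happened to land in which group. It says nothing about why the gap exists, which is exactly the point: the finding can still inform site selection.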
“6. The delusion of a single explanation”
MindEcology response: Agreed. Reality is complex and multi-dimensional. Relying on a handful of data points to explain complex realities is tantamount to extreme prejudice or, worse, total insanity. However, an explanation is often not what is required to make a business decision. If it’s raining outside, I’m going to put on a raincoat before going out in order to stay dry. This would be my likely course of action, regardless of whether the fact that it is raining tells me anything meaningful about why the world is the way it is or how the world works (again, leave that part to the philosophers).
“7. More data is not better data”
MindEcology response: Completely agreed and well said. In fact, almost always, less is more.