California Management Review
California Management Review is a premier academic management journal published at UC Berkeley
by Michael Matthews and Thomas Kelemen
Headlines often refer to “data-driven leaders,” “data-driven organizations,” or “data-driven customers.” By prioritizing data, decision-makers avoid irrationality and ensure that their assumptions are sound. Yet, we must remember that data can also be misleading. Consider the following finding that is statistically valid but completely nonsensical: from 1999 to 2009, the number of people who drowned by falling into a pool correlates with the number of films that featured the actor Nicolas Cage.1
Recognizing the ludicrousness of this relationship is straightforward (and entertaining). On a day-to-day basis, the fallacy is not so easily spotted. Statistical analysis involves subjective decisions, compromise, and a relentless dedication to data management, all of which can cause problems if we rely solely upon numbers. The lesson here is that data alone is not enough. We need frameworks that explain why a relationship exists. In science, we call these mental models “theory.” Many of us have heard of Einstein’s Theory of Special Relativity or Darwin’s Theory of Evolution. Similarly, Social Learning Theory predicts that employees learn acceptable behavior by watching their boss, and Expectancy Theory explains the mechanics of employee motivation. These theories provide a window into how organizational life operates.
Below, we capture how theory can supplement the shortcomings of data. Of course, the main takeaway is not to avoid data, but rather, to recognize that data is only one ingredient. Stated differently, managers must be “data-driven” and “theory-driven” to succeed in today’s economy.
Twenty-nine research teams were given the exact same data and asked to find a potential connection between the players’ race and red cards in soccer matches.2 Twenty of the teams found a meaningful relationship. Yet, nine teams found no significant effect. This divergence among competent and well-trained scientists frequently occurs. For example, in a different context, 70 teams analyzed a single neuroimaging dataset, but, surprisingly, no two teams chose the same workflow for analyzing the data.3 Clearly, statistics is full of tradeoffs—it is an art and a science. The danger is that data analysts will consciously (or even subconsciously) make tradeoffs to please upper management at the expense of accuracy.
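To make the point about analytic tradeoffs concrete, here is a minimal sketch in Python. It is not a reconstruction of either study cited above: the eight observations are invented for illustration, and the three "teams" and their function names are hypothetical. Each team makes a defensible choice (keep every record, drop an apparent outlier, or switch to a rank-based measure), and each reaches a different answer from the same data.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical, made-up observations (not taken from any cited study).
# The last record (x=8, y=-20) could be a data-entry error or a real extreme case.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, -20]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sqrt(sum((a - mx) ** 2 for a in xs))
    sy = sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def analyze_all_data(xs, ys):
    """Team A: keep every record."""
    return pearson(xs, ys)


def analyze_without_outliers(xs, ys, z_cutoff=2.0):
    """Team B: drop records whose y lies more than z_cutoff SDs from the mean."""
    centre, spread = mean(ys), stdev(ys)
    kept = [(a, b) for a, b in zip(xs, ys) if abs(b - centre) <= z_cutoff * spread]
    kept_x, kept_y = zip(*kept)
    return pearson(kept_x, kept_y)


def analyze_ranks(xs, ys):
    """Team C: use a rank-based (Spearman-style) correlation instead."""
    return pearson(ranks(xs), ranks(ys))


results = {
    "all_data": analyze_all_data(x, y),
    "drop_outliers": analyze_without_outliers(x, y),
    "rank_based": analyze_ranks(x, y),
}

for name, r in results.items():
    print(f"{name:>14}: r = {r:+.2f}")
```

With these invented numbers, the first approach yields a negative correlation (about -0.37), the second a strongly positive one (about +0.90), and the third a weakly positive one (about +0.26). None of the three choices is obviously wrong, which is why the choice should be made before seeing which answer it produces.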
Theory can help us solve this conundrum. Theory provides an abstract understanding of how the world should operate. As we commit to a particular framework, we create guardrails for our data analysis. In the workplace, by writing down predictions in advance (e.g., our new wellness initiative will improve employee satisfaction), managers will be less tempted by “shiny objects” that may be statistically valid but are ultimately irrelevant to the question at hand (e.g., our new wellness initiative increases inventory throughput).
Rarely does data perfectly map onto what we are trying to measure. For example, a manager may have data on employee satisfaction, but forget that only 25% of employees completed the form or that the questions were poorly worded. Customer data, as another example, can be especially deceiving because the happiest and angriest individuals are the most likely to provide feedback, while those in the middle rarely respond to surveys. Even if we do have a good sample, there are myriad biases in data collection, such as social desirability or extreme response bias. We should never forget a data set’s shortcomings when interpreting a statistical model.
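The effect of uneven response rates can be worked through with simple arithmetic. The sketch below uses Python and entirely hypothetical numbers: an assumed distribution of true satisfaction scores (1 to 5) and assumed response rates that are highest at the two extremes. The rates were chosen so that the overall response rate comes out to 25%, echoing the example above; they are not estimates from any real survey.

```python
# Assumed share of ALL customers at each true satisfaction score (1 = angry, 5 = delighted).
population_share = {1: 0.10, 2: 0.20, 3: 0.40, 4: 0.20, 5: 0.10}

# Assumed probability that a customer with a given score answers the survey.
# Extremes respond far more often than the middle (hypothetical values).
response_rate = {1: 0.60, 2: 0.20, 3: 0.125, 4: 0.20, 5: 0.60}

# Fraction of all customers who respond.
overall_response_rate = sum(
    population_share[score] * response_rate[score] for score in population_share
)

# Share of each score among the people who actually responded.
observed_share = {
    score: population_share[score] * response_rate[score] / overall_response_rate
    for score in population_share
}


def extreme_share(shares):
    """Combined share of the lowest (1) and highest (5) scores."""
    return shares[1] + shares[5]


print(f"Overall response rate:        {overall_response_rate:.0%}")
print(f"Extreme scores, all customers: {extreme_share(population_share):.0%}")
print(f"Extreme scores, respondents:   {extreme_share(observed_share):.0%}")
```

Under these assumptions, only 20% of customers hold an extreme view, yet extreme views make up 48% of the survey responses. The average score is unchanged here only because the made-up numbers are symmetric; if angry customers responded more often than delighted ones, the average would shift as well.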
Again, theory can help us. Once we have our data, we can perform “sanity checks” to ensure that it behaves the way it should in a normal context. For example, a correlation table can quickly reveal whether our variable of interest (e.g., organizational commitment) moves with the variables that theory says it should (e.g., advancement opportunities). If both theory and data point to a relationship between two variables, we can be more assured of our conclusions even if the data is less than ideal.
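A sanity check of this kind can be sketched in a few lines of Python. Everything in the example is an assumption made for illustration: the eight survey responses are invented, the variable names are placeholders, and the expected signs stand in for whatever the relevant theory predicts. The idea is simply to write the predictions down first, build the correlation table, and flag any pair whose observed sign contradicts the prediction.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# Hypothetical survey responses on a 1-5 scale (invented for illustration).
survey = {
    "organizational_commitment": [3, 4, 2, 5, 4, 1, 3, 5],
    "advancement_opportunities": [2, 4, 1, 5, 3, 1, 3, 4],
    "turnover_intention": [3, 2, 4, 2, 2, 5, 4, 1],
}

# Predictions written down BEFORE looking at the data: +1 means theory expects
# a positive relationship, -1 a negative one (assumed for this example).
expected_sign = {
    ("organizational_commitment", "advancement_opportunities"): +1,
    ("organizational_commitment", "turnover_intention"): -1,
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sqrt(sum((a - mx) ** 2 for a in xs))
    sy = sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


def correlation_table(data):
    """Correlation for every pair of variables, keyed by (name_a, name_b)."""
    return {
        (a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)
    }


def contradictions(table, expectations):
    """Pairs whose observed correlation has the opposite sign to the prediction."""
    return [
        pair for pair, sign in expectations.items() if table[pair] * sign <= 0
    ]


table = correlation_table(survey)
violations = contradictions(table, expected_sign)

for (a, b), r in table.items():
    print(f"{a} vs {b}: r = {r:+.2f}")
print("Pairs contradicting theory:", violations or "none")
```

In this made-up data set, both predicted signs hold, so the list of contradictions is empty. A non-empty list would not prove the theory wrong; it would be a prompt to inspect how the variables were measured and coded before trusting any downstream model.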
Data can be messy, very messy. Unfortunately, bad data is everywhere. One estimate suggests that 88% of all spreadsheets contain errors.4 For example, errors in Excel partly contributed to the “London Whale Debacle,” which cost JPMorgan $6 billion in trading losses.5 Many other companies have lived similar nightmares. For many managers, proper data management can feel impossible.
Without a doubt, managers should be data-driven. But the reality is that data is sometimes messy, missing, or unreliable. In these situations, leaders can turn to the human mind. Thought experiments are mental exercises where people consider how a situation would unfold based on existing or emerging theories. Indeed, this approach was famously used by intellectual giants such as Newton, Galileo, Einstein, Locke, and others. Thought experiments encourage us to consider the facts, reflect on theory, and derive sensible conclusions. If managers know and understand theory, problems can be solved even if data is temporarily unavailable.
An article published by Harvard Professor David J. Deming found that math-intensive but less social jobs are shrinking in the labor market.6 This finding should give all managers pause. Perhaps being data-driven is insufficient. Maybe managers need skillsets beyond number crunching. Reinvent yourself to become both a theory-driven and data-driven manager.
1. Vigen, T. (n.d.). Spurious correlations. tylervigen.com. Retrieved March 23, 2023, from https://www.tylervigen.com/spurious-correlations
2. Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., … & Nosek, B. A. (2018). Many analysts, one data set: Making transparent how variations in analytic choices affect results. Advances in Methods and Practices in Psychological Science, 1(3), 337-356. https://doi.org/10.1177/2515245917747646
3. Botvinik-Nezer, R., Holzmeister, F., Camerer, C. F., Dreber, A., Huber, J., Johannesson, M., … & Rieck, J. R. (2020). Variability in the analysis of a single neuroimaging dataset by many teams. Nature, 582(7810), 84-88. https://doi.org/10.1038/s41586-020-2314-9
4. Bishop, K. (2013, July 30). Spreadsheet blunders costing business billions. CNBC. https://www.cnbc.com/id/100923538
5. Lopez, L. (2013, February 12). How the London Whale debacle is partly the result of an error using Excel. Business Insider. https://www.businessinsider.com/excel-partly-to-blame-for-trading-loss-2013-2
6. Deming, D. J. (2017). The growing importance of social skills in the labor market. The Quarterly Journal of Economics, 132(4), 1593-1640. https://doi.org/10.1093/qje/qjx022