Recently, in a discussion with a company about becoming data-driven, I ran into the same challenge I have encountered many times before: the company claims to gather enormous amounts of data, yet generates very little value from it. That raises the question of what underlies this pattern of collecting vast amounts of data while using so little of it to create something of value. In my experience, at least three factors are at play: a sense of ownership, local optimization and the cost of ‘productizing’ data.

A typical pattern in many organizations is that teams generating and collecting data for their own purposes feel strong ownership of that data and don’t want others to prod “their data” with their big, fat fingers. It’s theirs, and if anyone else needs similar data, they can go and collect it themselves rather than get it from the team for free.

This leads to many small islands of data that are entirely disconnected and don’t aggregate into something more valuable than the sum of the parts. Teams may brag about all their data, but nobody else can use it.

Any team that decides it needs data to improve the quality of its decisions will focus on its own challenge and collect only what it needs, at the level of detail, frequency and aggregation it needs. In addition, it can decide at a moment’s notice to fundamentally change how data is collected, as well as what data is collected.

The consequence is that the data is typically hard to use outside the immediate context for which it was generated. Because there is no coordination, different teams end up collecting very similar data. And because few people think about broader use, teams that realize they need data are unable to reuse any existing data, as it’s too specific to the use case for which it was collected.

If a team decided to make its data available to others, it would need to document the semantics of the data, set up a system for finding and downloading data sets, carefully communicate any changes (in how the data is collected, in its semantics and so on) to stakeholders and, of course, respond to requests from these stakeholders, changing its data collection processes not for its own benefit but to help others in the organization. And, last but not least, the team may easily be held accountable for privacy, GDPR, security and other concerns that companies have around stored data.

As a result, unless a counterforce is present, teams will actively avoid sharing data because of the effort and cost involved. This again leads to lots of data being recorded, stored and used for specific, narrow use cases, but without synergies and without an end-to-end understanding of how systems behave in the field and how customers use them.

The solution to these challenges is to adopt a hierarchical value modeling approach in which you connect top-down business KPIs to lower-level metrics that can be collected directly from the field. By building this hierarchy as a directed acyclic graph and quantitatively establishing the relationships between higher- and lower-level factors, we can finally start to generate business value from all the data we collect.
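The post doesn’t prescribe a concrete representation, so what follows is only a minimal sketch of one possible reading. It assumes each edge from a higher-level factor to a lower-level one carries a linear sensitivity that has already been estimated elsewhere (for instance from experiments or regression). Under that assumption, the influence of a field metric on a KPI is the product of sensitivities along each path, summed over all paths (the chain rule). The class `ValueModel`, its methods, the node names and all the numbers are hypothetical and purely illustrative; cycle checking uses Python’s standard `graphlib` module (3.9+).

```python
from collections import defaultdict
from graphlib import CycleError, TopologicalSorter


class ValueModel:
    """Hypothetical hierarchical value model: a directed acyclic graph whose
    edges point from a higher-level factor to a lower-level factor it depends
    on. Each edge holds an ASSUMED linear sensitivity (d parent / d child)."""

    def __init__(self) -> None:
        self.children: dict[str, dict[str, float]] = defaultdict(dict)

    def link(self, parent: str, child: str, sensitivity: float) -> None:
        """Add an edge, rejecting it if it would make the graph cyclic."""
        self.children[parent][child] = sensitivity
        try:
            # TopologicalSorter reads {node: dependencies}; prepare() raises
            # CycleError if the graph is not acyclic.
            TopologicalSorter(self.children).prepare()
        except CycleError:
            del self.children[parent][child]
            raise ValueError(f"edge {parent} -> {child} would create a cycle") from None

    def effect(self, kpi: str, metric: str) -> float:
        """Linearised effect of a unit change in `metric` on `kpi`:
        sensitivities multiplied along every path, then summed."""
        memo: dict[str, float] = {}

        def walk(node: str) -> float:
            if node == metric:
                return 1.0
            if node not in memo:
                memo[node] = sum(
                    w * walk(child) for child, w in self.children.get(node, {}).items()
                )
            return memo[node]

        return walk(kpi)

    def leaves(self) -> set[str]:
        """Lowest-level metrics, i.e. those collected directly from the field."""
        nodes = set(self.children) | {c for cs in self.children.values() for c in cs}
        return {n for n in nodes if not self.children.get(n)}


# Hypothetical example: one business KPI, two intermediate factors, two field metrics.
model = ValueModel()
model.link("revenue", "conversion_rate", 2.0)
model.link("revenue", "retention", 1.5)
model.link("conversion_rate", "page_load_time", -0.3)
model.link("retention", "page_load_time", -0.2)
model.link("retention", "crash_rate", -0.5)

# Rank field metrics by the size of their influence on the top-level KPI.
ranked = sorted(model.leaves(), key=lambda m: abs(model.effect("revenue", m)), reverse=True)
print(ranked)
```

Ranking field metrics by their estimated influence on a top-level KPI is one way to decide which data is worth collecting and sharing. The linear, per-edge sensitivity is simply the easiest assumption to illustrate; real relationships between factors may be nonlinear or interact with each other.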

Getting from the current state to this hierarchical value model isn’t easy, if only because most people in the companies I work with find it extremely hard to determine which quantitative factors they’re optimizing for, and even when they do know, the relative priority of these factors is a source of significant debate. However, the approach provides enormous benefits: you can focus data collection on the things that matter, use the data to make higher-quality decisions and build data-driven offerings for customers that you couldn’t have created otherwise. As the adage goes, it’s not about what you have, but about how you use it!