Why your data is useless

Virtually all organizations I work with have terabytes or even petabytes of data stored in different databases and file systems. However, there’s an interesting pattern I’ve started to recognize in recent months. On the one hand, the data that gets generated is almost always intended for human interpretation. Consequently, these files and databases are full of free-text fields, comments and other unstructured data. On the other hand, the volume of stored data is so phenomenally large that it’s impossible for any human to make heads or tails of it.

The consequence is that enormous amounts of time are required to preprocess the data in order to make it usable for training machine learning models or for inference using already trained models. Data scientists at a number of companies have told me that they and their colleagues spend well over 90 percent of their time and energy on this.

'Most of the data is mud pretending to be oil'

For most organizations, therefore, the only way to generate any value from the vast amounts of data that are stored on their servers is to throw lots and lots of human resources at it. Since, oftentimes, the business case for doing so is unclear or insufficient, the only logical conclusion is that the vast majority of data that’s stored at companies is simply useless. It’s dead weight and will never generate any relevant business value. Although the saying is that “data is the new oil”, the reality is that most of it is mud pretending to be oil.

Even if the data is relevant, there are several challenges associated with using it in analytics or machine learning. The first is timeliness: if you have a data set of, say, customer behavior that’s 24, 12 or even only 6 months old, it’s highly likely that your customer base has evolved and that preferences and behaviors have changed, invalidating your data set.

Second, particularly in companies that release new software frequently, such as when using DevOps, the way data is generated may change with every software version. Especially when the data is generated for human consumption, e.g. engineers debugging systems in operation, it’s time-consuming to merge data sets produced by different versions of the software.

Third, in many organizations, multiple data sets are generated continuously, even by the same system. Deriving the information that’s actually relevant for the company frequently requires combining data from different sets. The challenge is that different data sets may not use the same way of timestamping entries, may store data at very different levels of abstraction and frequency, and may evolve in unpredictable ways. This makes combining the data labor-intensive and any automation developed for the purpose brittle and likely to fail unpredictably.
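
As a minimal illustration, consider two hypothetical data sets from the same system: one logs individual events with Unix epoch timestamps, the other aggregates metrics per minute with ISO-8601 strings. Even this toy merge requires explicit alignment choices (time zone, resolution, join key) that break as soon as either producer changes its format; all column names and values below are made up.

```python
import pandas as pd

# Event log: per-event records with Unix epoch timestamps (hypothetical fields).
events = pd.DataFrame({"ts": [1577836800, 1577836815], "error_code": [0, 7]})
events["ts"] = pd.to_datetime(events["ts"], unit="s", utc=True)

# Metrics: per-minute aggregates with ISO-8601 timestamps (hypothetical fields).
metrics = pd.DataFrame({"ts": ["2020-01-01T00:00:00Z"], "cpu_load": [0.42]})
metrics["ts"] = pd.to_datetime(metrics["ts"], utc=True)

# Align the events to the metrics' one-minute resolution before joining.
events["minute"] = events["ts"].dt.floor("1min")
merged = events.merge(metrics, left_on="minute", right_on="ts",
                      suffixes=("_event", "_metric"))
print(merged)
```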

My main message is that, rather than focusing on preprocessing data, we need to spend much more time and focus on how the data is produced in the first place. The goal should be to generate data such that it doesn’t require any preprocessing at all. This opens up a host of use cases and opportunities that I’ll discuss in future articles.
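
To make this concrete, here’s a sketch of what generating data that doesn’t need preprocessing could look like, assuming a simple event-logging scenario: instead of free-text log lines written for engineers, emit machine-readable records with an explicit schema version, a consistent timestamp format and typed, named fields. The function and field names are hypothetical.

```python
import json
import datetime

def emit_event(event_type, **fields):
    """Write one machine-readable event as a single JSON line (hypothetical schema)."""
    record = {
        "schema_version": "1.0",  # lets consumers handle format evolution explicitly
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "event": event_type,
        **fields,  # typed, named fields instead of free-form text
    }
    print(json.dumps(record))

emit_event("checkout_completed", customer_id=42, amount_eur=19.95)
```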

Concluding, for all the focus on data, the fact of the matter is that in most companies, most data is useless or requires prohibitive amounts of human effort to unlock the value that it contains. Instead, we should focus on how we generate data in the first place. The goal should be to do that in such a way that the data can be used for analytics and machine learning without any preprocessing. So, clean up the mess, get rid of the useless data and generate data in ways that actually make sense.

 

The game plan for 2020

In reinforcement learning (a field within AI), algorithms need to learn about an unexplored space. These algorithms need to balance exploration (learning about new options and possibilities) with exploitation (using the acquired knowledge to generate a good outcome). The general rule of thumb is that the less that’s known about the problem domain, the more the algorithm should focus on exploration. Conversely, the better the problem domain is understood, the more the algorithm should focus on exploitation.
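
A minimal sketch of the idea, assuming a simple multi-armed bandit setting: with probability epsilon the algorithm explores by picking a random option, otherwise it exploits the option with the best reward estimate so far. The function names and the running-average update are just one common way to implement this.

```python
import random

def choose_arm(estimates, epsilon):
    """Epsilon-greedy choice: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(len(estimates))                     # explore
    return max(range(len(estimates)), key=lambda i: estimates[i])   # exploit

def update(estimates, counts, arm, reward):
    """Running-average update of the chosen arm's reward estimate."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
```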

The exploration/exploitation balance applies to companies too. Most companies have, for a long time, been operating in a business ecosystem that was stable and well understood. There were competitors, of course, but everyone basically behaved the same way, got access to new technologies at about the same time, responded to customers the same way, and so on. In such a context, a company naturally focuses more and more on exploitation as the reward for exploration is low. This is exactly what I see in many of the organizations I work with: for all the talk about innovation and business development, the result is almost always sustaining innovations that make the existing product or solution portfolio a bit better.

With digitalization and its constituent technologies – software, data and AI – taking a stronger and stronger hold of industry after industry, the stable business ecosystem is being disrupted in novel and unpredictable ways. Many companies find out the hard way that their customers never cared about their product. Instead, the customer has a need and your product happened to be the best way to meet that need. When a new entrant provides a new solution that meets the need better, your product is replaced with this new solution.

'Companies need to significantly increase the amount of exploration'

The only way to address this challenge is to significantly increase the amount of exploration your company conducts – we’re talking real exploration, where the outcome of efforts is unknown and where everyone understands that the majority of initiatives will fail. To achieve this, though, you need a game plan. This game plan needs to contain at least four elements: strategic resource allocation, reduced effort in commodity functionality, exploration of novel business ecosystems and/or new positions in the existing business ecosystem, and exploration of disruptive innovation efforts enabled by data and AI.

Many companies allocate the vast majority of their resources to their largest businesses. This makes intuitive sense, but it fails to take a longitudinal perspective on the challenge of resource allocation. A model that can be very helpful in this context is the three horizons model, which structures the company’s businesses into three buckets. Horizon one contains the large, established businesses that pay the bills today. Horizon two contains the new, rapidly growing businesses that are still much smaller than the horizon one businesses; these are intended to become the future horizon one businesses. Horizon three contains all the new, unproven innovation initiatives and businesses where it’s uncertain that things will work out, but that are the breeding ground for future horizon two businesses. Resource allocation should restrict horizon one to at most 70 percent of the total, horizon two should get up to 20 percent and at least 10 percent of total company resources should be allocated to horizon three.

Within horizon one, each business should grow its resource usage slower than its revenue. That might even mean that a horizon one business growing at 5 percent per year should cut its resource usage by 5 percent per year, as this business is supposed to act as a cash cow that funds the development of future horizon one businesses.
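
As a back-of-the-envelope illustration of the 70/20/10 rule from the three horizons model (the budget figure and function name are made up):

```python
def three_horizons_split(total_budget):
    """Apply the 70/20/10 caps from the three horizons model (illustrative only)."""
    return {
        "horizon_1": 0.70 * total_budget,  # at most 70%: today's established businesses
        "horizon_2": 0.20 * total_budget,  # up to 20%: rapidly growing future H1 businesses
        "horizon_3": 0.10 * total_budget,  # at least 10%: unproven innovation initiatives
    }

print(three_horizons_split(100_000_000))  # e.g. 100M -> 70M / 20M / 10M
```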

In most companies, revenue and resource allocation are closely aligned with each other, but this is a mistake from a longitudinal perspective. A new business will require years of investment before it can achieve horizon one status and this new business can’t fund itself. Of course, you can have it bootstrap itself, but the result will typically be that competitors with a more strategic resource allocation will become the market leaders in these new businesses.

'Once you’ve defined the commodity, **stop** virtually all investment in it'

Second, reduce investment in commodity functionality. Our research shows that companies spend 80-90 percent of their resources on functionality and capabilities that customers consider to be commodity. I’ve discussed this in earlier blog posts and columns, but I keep being surprised by companies’ unwillingness to look into novel ways of reducing investment where it doesn’t pay off. Don’t be stupid: do a strategic review of your entire product portfolio and the functionality in your products and, together with customers and others, define what’s commodity and what’s differentiating. Once you’ve defined the commodity, **stop** virtually all investment in it. You need those resources for sustaining innovations that drive differentiation for your products.

Third, many companies consider their existing business ecosystem as the one and only way to serve customers. In practice, however, ecosystems get disrupted and it’s far better to be the disruptor than the disruptee. This requires a constant exploration of opportunities to reposition yourself in your existing ecosystem, as well as an exploration of novel ecosystems where your capabilities might also be relevant.

Finally, digital technologies – especially data and AI – offer new ways of meeting customer needs that you must explore in order to avoid being disrupted by, especially, new entrants. Accept that the value in almost every industry is shifting from atoms to bits, that data can be used to subsidize product sales in multi-sided markets, that AI allows for automation of tasks that were impossible to automate even some years ago and, in general, proactively explore the value that digital technologies can provide for you and your customers. This is where the majority of the resources that you freed up through horizon planning and reducing investment in commodity functionality should go.

Concluding, at the beginning of 2020, you need a game plan to significantly increase exploration at the expense of exploitation in order to identify new opportunities and detect disruption risks and to invest sufficiently in areas that provide an opportunity for growth. This requires strategic resource allocation, identifying and removing commodity, a careful review of your position in existing and new business ecosystems and major exploration initiatives in the data and AI space. It’s risky, it’s scary, most initiatives won’t pan out and customers, your shareholders and your own people will scream bloody murder. And yet, the biggest risk is to do nothing at all as that will surely lead to your company’s demise. Will you allow that to happen on your watch?

Why care about purpose in business?

Peter Drucker famously said that the purpose of a business is to create a customer and a customer is defined as someone who pays for the products and services the company offers. This perspective seems to be shared by many in business: as long as revenue and profits are generated, there’s no reason to bother about anything else. It’s all about the money!

Whenever there’s a discussion about morals and ethics, lip service is paid to those questions, but only if there’s a monetary reason for it. For instance, trading with certain types of industries may be frowned upon by other customers and thus lead to reduced sales. If the revenue loss with existing customers outweighs the additional revenue, the company may decide not to serve those industries. Although the outcome may be the desirable one, the rationale for the decision is purely pecuniary.

At the same time, there are many companies out there that are purpose driven and explicitly seek to make the world a better place and improve the state of humanity. In the US, Whole Foods and Patagonia are good examples of this. To paraphrase the former co-CEO of Whole Foods, John Mackey: companies need to make money in the same way as our bodies need to make red blood cells if we want to live. But the purpose of our bodies is not to make red blood cells. Similarly, companies need to go beyond the sole focus of making money.

'Interestingly, focusing on purpose proves to be good for making money'

Interestingly, counter to what one might expect, focusing on purpose proves to be good for making money. Research shows that purpose-driven companies have higher profit margins than their competitors. In “Corporate Culture and Performance”, John Kotter and James Heskett show that, over a decade-long period, purpose-driven companies outperformed their counterparts in stock price by a factor of twelve.

The typical reasons why a purpose-driven company might do better have to do with more engaged employees and more passionate customers. With Gallup showing that the percentage of employees engaged in their work is in the low teens across the world, it’s clear that significantly increasing that percentage will do wonders for a company’s productivity and output. Similarly, we know that word of mouth is one of the most powerful and cost-effective ways to reach new customers.

So, why are so few companies explicit in expressing their purpose? One of the key challenges, I think, is an instinctive fear that expressing a purpose will be viewed negatively by at least some groups in society, alienating parts of the customer base. As Simon Sinek so eloquently expressed it: “People don’t buy what you do; they buy why you do it!” The flip side of this statement is that the people who disagree with your why won’t buy from you.

Another reason, I believe, is that expressing a purpose may easily alienate employees. Putting such a stake in the ground may cause some of them to shy away from your business, even though they could add value from a technical perspective. The corollary is, of course, that working with people who aren’t aligned with your implicit mission is demotivating, as you and others may easily end up pulling in different directions.

The primary reason, however, is that, in my experience, many leaders don’t have clarity on their own purpose or on the purpose of the company they lead. And when you yourself are unclear on your professional purpose, it’s difficult to express it clearly to others. The key challenge often isn’t whether an aspect of one’s purpose is positive or not, but rather the relative priority of different aspects. When having to choose between revenue and environmental impact, how much cost saving justifies what level of impact? Would your company run an ad like Patagonia’s, showing a jacket with the text “Don’t buy this jacket”? Or, like Tesla, make your patent portfolio publicly available as long as competitors use it to help address climate change?

Doctors have the goal of healing patients. Firefighters aim to protect people and property from damage. Teachers seek to educate the next generation. Business can’t just be about making money. We have the obligation to hold ourselves to a higher standard. What’s the purpose of your company? And how does your mission align with it? And what hard decisions do you take to live up to that purpose and mission?

With Christmas and New Year upon us, I encourage all of us to reflect on why we do what we do. What are we doing to contribute to a world that gets better all the time? Because the world **is** getting better and technology is at the heart of that. But it doesn’t happen automatically. It requires us, as technologists, to explicitly focus on the purpose and meaning of what we do.

More process doesn’t help

Over the last weeks, I’ve been to three different conferences where I heard presentations that were variations on a common theme: if we just added more structure and more process to the topic at hand, if we only introduced more steps, more checkpoints, involved more people, and so on, then all the problems we’re experiencing with product roadmapping, innovation initiatives and business development activities would magically disappear.

Although most would agree that this is obviously wrong, the fact is that in many companies, universities and government institutions, this is exactly what happens. The organization experiences some kind of problem, perhaps even one that gets exposed in the media and makes management look bad, resulting in a top-down order to “fix it”. The subsequent process is familiar to anyone who has been part of it. First, there’s an activity to describe the chain of events that led up to the issue surfacing. This is followed by a review of all the actions and other factors, with the intent of identifying what went wrong. Finally, a new process is introduced or an existing process is updated to address the perceived limitations, holes or weaknesses in the current way of working.

Once the new way of working has been introduced, the next step is to enforce it. Obviously, the new or updated process adds overhead and makes it more difficult to perform tasks efficiently. So, before you jump the gun and start to further complicate the existing processes in your organization, there are five factors I’d like you to consider.

First, one of the concerns that many ignore, but that’s obvious when you think about it, is that the future is fundamentally unknowable. Looking back, we have full knowledge of what has happened, so it’s obvious what the optimal way to address an issue would have been. However, when standing at the point where a decision needs to be taken, we do so with significant uncertainty about its implications.

'Incompetence cannot be cured by more process'

Second, depending on the organizational culture, it may be very difficult to point out that individuals have acted out of a fundamental lack of competence. It’s important to realize that incompetence cannot be cured by more process. Incompetence requires educating people or, if that proves unfeasible, replacing individuals with new people.

Third, the more process is introduced and the more enforcement of process takes place, the more people focus their attention on correctly following the process, rather than focusing on accomplishing the desired outcome. This leads to a fundamental lack of accountability in the organization, with everyone hiding behind having followed the process and failing to take responsibility for the desired results.

Fourth, too much process can cause more problems than it solves. As processes are created to be repeatable and to apply to a large variety of different situations, an overly detailed process definition is, by definition, ineffective in the majority of situations. Especially in organizations that place high value on following due process, the inefficiencies and harm done by blindly following process can become staggering, potentially even to the point of companies being disrupted.

Finally, in most organizations I work with, processes and methods are developed by people who are outside the arena, meaning they won’t be affected by the implications of the process and method definitions. Although they don’t perform the job themselves, they have a strong tendency to act as “Monday morning quarterbacks”, a reference to the watercooler discussions, especially in US companies, on the Monday after a game, where the flaws of a team’s quarterback are dissected. The interesting thing is that the criticism tends to come from people who would never qualify as quarterbacks themselves.

Concluding, before you fall into the ‘more process’ trap, please ask yourself whether more process would actually help you predict the future better, whether your people perhaps lack competence, whether it would promote or erode accountability, whether the root cause is perhaps too much process already and whether you’re listening to so-called experts who don’t actually have a sufficient understanding of the situation.