Exploiting your data well

Based on our research, we’ve developed a four-dimensional model for the digital transformation in the software-intensive embedded systems industry. In the last two posts, we explored the business model and product upgrade dimensions. This post is concerned with the data exploitation dimension.

As shown in the figure, the first step in most companies is focused on the use of data for quality assurance and diagnostics. In this case, the data often arrives at the company in batches and through customer complaints. In response to a complaint, company representatives download the data and investigate whether the system behaved incorrectly and seek to identify the root cause. Many companies have been using this type of data for decades.

Evolving use of data

The second step is the use of data to monitor product performance and feature usage. This form of data collection is typically introduced in combination with the more frequent deployment of software. Monitoring product performance allows the company to confirm that the product is performing well after a software upgrade. Measuring feature usage allows for more informed prioritization of R&D resources by ensuring that research and development predominantly focus on improving features that are actually used by customers.
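As a rough illustration of how feature usage data could feed R&D prioritization, here is a minimal Python sketch. The event format, the feature names and the function `feature_usage_share` are assumptions made up for this example; they don't come from any particular product or telemetry system.

```python
from collections import Counter

def feature_usage_share(events):
    """Return (feature, share of all usage events) pairs, most used first."""
    counts = Counter(event["feature"] for event in events)
    total = sum(counts.values())
    return [(feature, count / total) for feature, count in counts.most_common()]

# Hypothetical usage events as they might arrive from deployed products.
events = [
    {"product_id": "p1", "feature": "cruise_control"},
    {"product_id": "p1", "feature": "cruise_control"},
    {"product_id": "p2", "feature": "cruise_control"},
    {"product_id": "p2", "feature": "lane_assist"},
    {"product_id": "p3", "feature": "voice_commands"},
]

ranking = feature_usage_share(events)
for feature, share in ranking:
    print(f"{feature}: {share:.0%} of recorded usage")
```

A ranking like this is only a starting point: a rarely used feature may still be critical for a small but valuable customer segment, so the numbers inform the prioritization discussion rather than replace it.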

As we now have a continuous stream of data from each customer, we can move to the third stage where we can start to collect data relevant for the customer. Each customer has KPIs that it optimizes for, such as churn in subscription service companies, service usage by end-customers or the classification of end-customers into segments that require different treatment. As the company now collects significant amounts of data from each customer, it can process and analyze that data and offer relevant insights to each customer.

'The next step is to provide relevant comparative data to each customer'

Finally, in the last stage, the company engages a second business ecosystem where it monetizes data collected from its primary customer base with a secondary customer base. For instance, a truck company could sell route information of the trucks in use at its customers to gas station and road service companies. Or, a telecom company could sell aggregate movement patterns of mobile phone users to city planners.

Once the company reaches the final stage, the data that’s collected in its products deployed at its primary customer base now is increasingly concerned with meeting the needs of the secondary customer base. So, the company is likely to start collecting data that has no relevance at all for its primary customer base but that is relevant for the secondary one. A second aspect is that the company can use the revenue from the secondary customer base to subsidize the products to its primary customer base and, through that, grow its market share. Many industries, as digitalization takes hold, move towards a “winner takes all” situation and this pattern of positive feedback cycles lies at the root of it.

Concluding, digitalization has implications for the business model, the way we upgrade products and the way that we collect, use and monetize data. Companies evolve in predictable and repeatable patterns through this transformation and in this post, I described the five typical stages we encounter in our research. Data is the new oil, but if you’re not able to generate business value from your data, it doesn’t do you much good. So, get going on experimenting with different ways to create value from your data!

Better all the time

In last week’s post, I mentioned our framework describing the transformation that companies go through when going digital. I also discussed one of its four dimensions – the business model dimension. In this post, the focus is on the product upgrade dimension.

As shown in the figure, we’ve identified five steps or phases in the transformation from a traditional to a digital company. In the first stage, the company focuses on selling a physical product. It’s sold ‘as is’ and except for warranty issues, the company spends no time or resources on it once it has left the factory. The product may well include electronics and software, but these subsystems are treated in the same way as the mechanical parts.

Evolution of product upgrades as part of the digital transformation.

As a second step, many companies set out to offer their product as a service to certain customer segments. This often starts as a mechanism to expand the clientele. Especially potential customers that don’t need the product all the time or that have issues financing the capex may need a service offering in order to become customers. In this step, the company often starts to offer periodic upgrades to the product software – predominantly to protect itself from unwanted downsides. In service contracts, there typically are service level agreements (SLAs) and software upgrades can be used to decrease the risk of violating these SLAs and avoid the associated penalties.

In the third step, the company has, from a business perspective, started to offer complementary services around the product. Frequently, the quality and appeal of these services can be improved if the core product has updated software functionality. In this case, the company upgrades the software in its products not only to protect from any downside issues, but also to create an upside in terms of additional revenue from complementary services. As a simple example, an automotive company may upgrade the software to provide an API for querying the location of a vehicle that can be used by complementary services to offer more relevant information for the context in which the vehicle and the driver may find themselves.

'The easiest way to positively drive KPIs is to deploy new software versions'

Finally, the concept of a physical product is completely replaced with its digital alter ego. In this case, all parts of the product can be upgraded on a periodic basis, with software being the most frequent and mechanics the least frequent. Even replacing the complete physical product is done as part of the continuous improvement of the digital product. As an example, although Apple most certainly makes its money on the physical product, from a user experience perspective, there’s a constantly improving experience that has small upward bumps when replacing the phone with a new model, but by and large, the improvement of the product is a continuous one.

Concluding, the digital transformation is a complex, multi-dimensional challenge that affects all parts of the company, including the way products get upgraded. Although this may seem like a technical challenge, it’s the business strategy that (should) drive the architecture and technology decisions that either allow for or prohibit the product upgrades discussed here. With business models increasingly moving from transactional to continuous, the product that’s being monetized by the business model needs to become continuous in terms of it constantly improving the user experience and value delivery to customers. One can’t exist without the other!

Digital for real: business model

Over the last months (actually, more like years), we’ve studied the digital transformation of several companies in the Software Center. Professor Helena Holmström Olsson and I developed a model to illustrate how they actually transition from their legacy business rooted in atoms to a digital business based on bits (see the figure). It has four dimensions: business model, data exploitation, product upgrade and AI/ML/DL. In this post, we’re focusing on the business model dimension.

Based on our research with several of the Software Center partners, we identified that companies evolve through a similar pattern when it comes to transforming their business model. Especially in the embedded systems domain, the starting point is traditional product sales, eg a car, a truck, a radar, a pump or a base station. We often refer to this as “box sales” and the business model is highly transactional: I sell you the box and I will then try to sell you a new box in a few years’ time. There may be some revenue generated from services, such as product maintenance, but this tends to be a small fraction.

The digital transformation stages.

The next phase is where the product is offered as a service. Here, mostly the monetization of the physical product changes from a one-time transaction to a continuous revenue stream. There are several challenges with this, including that the company, in effect, has to finance the product for its customers. Still, it can be a very effective way to grow revenue: customers that wouldn’t have bought the product in the traditional model, for example because they don’t need it full-time, may well want to buy it as a service.

As a next step, we see that companies start to develop all kinds of services around the product. These services tend to surround the operation of the product and may range from offering accessories in a rental model to providing information and advisory services to improve efficiency or the quality of outcomes. In this phase, the product is used as a platform to generate more revenue from complementing services.

In the fourth step, the monetization model changes again and becomes more customer oriented. Rather than associating monetization with the product, it becomes associated with customer KPIs that can be influenced by the product. Examples of these customer KPIs include the number of successful deliveries without delays, the reduction in end-customer churn or the reaction time gained by earlier detection. In this case, the company focuses on the factors that influence the customer’s bottom line and links the business model to improving those factors.

Finally, the company seeks to develop a second customer base where it can monetize the data generated and captured from its primary customer base. For example, trucks have accelerometers that provide information about the quality of roads (such as the presence of potholes) and government functions responsible for road maintenance may be willing to buy this information. The result is a two-sided market where the company still sells to its primary customer base but also monetizes the data from its primary customer base to its secondary customer base. Over time, of course, the secondary customer base may become the more important one, which then fundamentally changes the incentives within the company.

Concluding, as part of your digital transformation, the business model that you employ will have to change and evolve as well. As we’ve shown, this evolution follows a pattern and jumping over intermediate stages tends to lead to failure. Although our model focuses on the embedded systems domain, I believe that all industries evolve through similar or identical patterns. As you experiment with evolving your business model, the stages presented here may provide guidance on where to focus next. As always, we’re eager to learn more about your experiences, so please reach out to us to share them.

It’s not about data; it’s about actionable insights

This week, I had an interesting discussion about data with the CEO of one of the startups I work with. The challenge is that many companies are collecting vast amounts of data, storing it and then leaving it as an unused asset. It surprises me that so many companies are maintaining such amazingly large data stores without finding good ways of using them.

The key underlying reason, in my opinion, is that collecting data is easy, but generating actionable insights out of it is hard. It requires a deep understanding of the domain, as well as insight into what provides value within the domain’s context. This calls for a mindset different from the one present at most companies, where the focus often is on doing what we’ve always done, but a little bit better or faster.

The company of the CEO I talked to operates in the media domain and has reached a point where key employees of its customers receive a daily email listing the highest-priority tasks that they should focus their energy on that day. These tasks are identified by collecting data from the relevant media properties of the customer, analyzing this data to identify deviations and anomalies and then recommending the most likely mitigation actions that will address the identified concerns. These mitigation actions are the tasks that the key employees receive in their daily to-do list. To me, this is the hallmark of being a data-driven company: it’s not about the data, but about generating actionable insights from the data that you can use to your advantage.
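The pipeline described above (collect data, identify deviations, recommend mitigation actions) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the startup’s actual implementation: the metric names, the z-score threshold and the `playbook` mapping from metrics to recommended actions are all hypothetical.

```python
from statistics import mean, stdev

def find_anomalies(history, today, threshold=3.0):
    """Flag metrics whose value today deviates strongly from their own history.

    Returns (metric, z_score) pairs, largest absolute deviation first.
    """
    anomalies = []
    for metric, past_values in history.items():
        mu = mean(past_values)
        sigma = stdev(past_values)  # needs at least two historical values
        if sigma == 0:
            continue  # no variation in the history, so no z-score can be computed
        z = (today[metric] - mu) / sigma
        if abs(z) >= threshold:
            anomalies.append((metric, z))
    return sorted(anomalies, key=lambda item: abs(item[1]), reverse=True)

def daily_tasks(anomalies, playbook):
    """Translate anomalies into recommended actions, falling back to a generic task."""
    return [playbook.get(metric, f"Investigate {metric}") for metric, _ in anomalies]

# Hypothetical daily metrics for one media property.
history = {
    "page_views": [1000, 1020, 980, 1010, 990],
    "video_starts": [200, 210, 190, 205, 195],
}
today = {"page_views": 1005, "video_starts": 120}

# Hypothetical mapping from a deviating metric to the most likely mitigation action.
playbook = {"video_starts": "Check the video player on the front page"}

tasks = daily_tasks(find_anomalies(history, today), playbook)
print(tasks)
```

The hard part in practice is the playbook, not the statistics: knowing which action is most likely to address a given deviation is exactly the domain understanding that the data alone doesn’t provide.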

Of course, in this case, the insights are generated based on the customer’s own data. The next step is to get some or all of the customers to agree to anonymously share their data with the company. This allows the company to compare each customer to all the others, which allows for the next level of insights to be generated. Now, it’s not just deviations and anomalies identified within your own scope, but also those identified through comparison with others. By comparing your performance with others in the same industry, it’s much easier to gain insight into where energy should be invested to improve.
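One simple way to picture such a comparison is a percentile rank: for a given KPI, what share of the other (anonymized) customers does this customer match or exceed? The sketch below assumes a hypothetical KPI, an engagement rate per customer, with made-up values and customer identifiers.

```python
def percentile_rank(value, peer_values):
    """Share of peers whose value is at or below the given one (0.0 to 1.0)."""
    if not peer_values:
        raise ValueError("at least one peer value is required")
    return sum(1 for peer in peer_values if peer <= value) / len(peer_values)

# Hypothetical, anonymized engagement rates per customer.
engagement = {"customer_a": 0.12, "customer_b": 0.08, "customer_c": 0.15, "customer_d": 0.05}

benchmark = {}
for customer, value in engagement.items():
    # Compare each customer only against the others, never against itself.
    peers = [v for other, v in engagement.items() if other != customer]
    benchmark[customer] = percentile_rank(value, peers)

for customer, rank in benchmark.items():
    print(f"{customer}: at or above {rank:.0%} of peers")
```

Each customer only ever sees its own position relative to the group, which is what makes anonymous sharing acceptable for many of them.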

'We’re talking about continuous, quantitative and automatically generated insights'

Concluding, to be a data-driven company is not about having lots of data. It’s about generating valuable and actionable insights from that data continuously and using those to generate, preferably automatically, the actions for your team that will have the most impact. It’s not about the data; it’s about what you do with it.

Digital business: automated at heart

Digitalization is fundamentally enabled by three core technologies: software, data and artificial intelligence. The common denominator, which is inherent in a digitalized business, is that automation is at the heart of it. Digital technologies allow for automation to a much more significant extent than traditional technologies. We see this reflected in companies: whereas in traditional companies, humans are supported with automation, in digital businesses, automation of the core business processes has removed humans from the equation (almost) entirely.

One of the key reasons for the high degree of automation is that digital businesses typically employ continuous, rather than transactional, business models. This means that there’s a continuous relationship with the business of the customers, continuous delivery of new value-adding software, data-driven insights and AI models and continuous monitoring and logging. Activities that we might accept doing manually once or twice per year rapidly become subjects for full automation if they need to be conducted monthly, weekly, daily or even more frequently.

In a digital business, all core business processes are to the highest extent automated and controlled in an automated fashion using quantitative performance data. In fact, we can conceptualize a digital business as consisting of three circles of activities. The core circle consists of the company’s core value delivery business processes. For instance, for an e-commerce website, this includes the presentation of items, recommendations, managing orders and taking payments. These activities have no human involvement and are completely automated. In the case that core value delivery processes can’t be automated fully, such as warehouse tasks, the humans tend to be instrumented with data collection and subject to the same quantitative performance management as the automated parts. The first circle is concerned with operations and activities that support operations.

The second circle of activity involves human actors who use quantitative data for analytics and experimentation. The main focus here is to measure the core business processes and to tune and optimize them. For instance, analytics may show that items that are recommended to customers by the recommendation engine are selected and bought in 0.15 percent of the cases. As the industry average is higher than that, one of the activities in this circle might then be to experiment with different recommendation algorithms using A/B testing to evaluate whether the engine’s success rate can be improved to match the average. The second circle is concerned with tactics that improve the performance of the operational core. It’s important to note that activities in this circle don’t have to be performed by humans. It’s entirely feasible to have a system run autonomous improvement activities that focus on optimizing the core business processes.
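To make the A/B testing example concrete, here is a minimal sketch of how the outcome of such an experiment could be evaluated with a two-proportion z-test. The visitor and purchase counts are invented for illustration; only the 0.15 percent baseline comes from the example above. Real experimentation platforms typically add safeguards, such as planning the sample size up front, that are left out here.

```python
from math import erf, sqrt

def two_proportion_z_test(conversions_a, n_a, conversions_b, n_b):
    """Compare the conversion rates of variants A and B.

    Returns (z, p_value) for a two-sided test of equal conversion rates.
    Assumes at least one conversion and one non-conversion overall.
    """
    p_a = conversions_a / n_a
    p_b = conversions_b / n_b
    # Pooled conversion rate under the hypothesis that A and B perform the same.
    pooled = (conversions_a + conversions_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value

# Hypothetical experiment: A is the current recommendation engine (0.15 percent
# of recommendations bought), B is a candidate algorithm.
z, p_value = two_proportion_z_test(conversions_a=150, n_a=100_000,
                                   conversions_b=210, n_b=100_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With conversion rates this low, large samples are needed before a difference becomes statistically visible, which is one reason this circle operates in days and weeks rather than seconds.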

Finally, the third circle is concerned with those business activities that are strategic in nature. As strategic activities tend to be about interpreting trends and predicting the future, it can be challenging to quantify them. Typical for activities in this circle is that the focus is on the purpose of the business, the role it plays in its ecosystem and the way it seeks to differentiate and complement itself towards others.

The three circles of activities are different from each other not just in terms of automation and use of data, but also in the cycle time and operating speed. The operations circle runs, by its very nature, in seconds, minutes and hours. The tactical circle operates in days and weeks, whereas the strategic circle tends to operate in months and years.

Of course, one can find huge amounts of automation in traditional companies as well. The main difference with digital businesses is the underlying mindset and approach. A bit exaggerated, in traditional companies, tasks are performed by humans unless it’s too expensive to do so. In digital companies, tasks are automated and performed by systems unless it’s unfeasible or prohibitively expensive to do so.

It’s easy to forget how far automation and digitalization can take a company. In many SaaS companies, the vast majority of business value creation (as in 99+ percent) is conducted fully automatically by systems rather than humans. The funny thing is, however, that in my experience, even in SaaS companies, the majority of management attention is directed towards humans and human processes, even if these represent a very small slice of the business.

Concluding, I find it helpful to think about companies in three distinct circles of activity, ie delivery and operations, optimization and experimentation and, finally, strategy and innovation, that have completely different characteristics, cycle times and success metrics. In my experience, many tend to mix up the activities in the different circles, which leads to confusion and sub-par performance. As a leader, take a step back and reflect on your organization, map the processes and activities to the three circles and identify where there are mismatches that you can address. Going digital is challenging, but the alternative is to remain a traditional company and risk being disrupted.

Why you’re not deploying AI

Imagine the following scenario. A (sizable) team at a large company writes customer documents in response to customer requests. They request help from the automation team to reduce their repetitive tasks. The automation team brings in an AI company, which develops an ML model that generates the customer documents automatically and virtually eliminates the need for human involvement. The prototype works amazingly well and both the AI company and the automation team are eager to move it into production as the cost savings, as well as the speed and quality of response to customers, are bound to improve significantly.

Sounds like a success story, right? Well, in this case, as well as in other cases that I’ve seen, the company managed to snatch defeat from the jaws of victory. The solution wasn’t deployed. It’s probably not the end of the story and hopefully, the solution will be rolled out in the future, but the company experiences a significant delay in reaping the benefits from what should have been a straightforward and obvious deployment.

The pattern as I’ve seen it is that if AI is used to improve some product capability and it doesn’t affect existing organizational units nor existing processes, the deployment of the ML/DL model is quick and fairly seamless. The moment, however, existing organizations or teams are threatened in their existence or asked to reduce significantly in size or when existing work processes need to be adjusted to achieve the benefits, things rapidly grind to a halt and many in the organization start to backpedal.

Evolution stages of adopting ML/DL

In an earlier post, I presented the stages that companies go through when adding ML/DL to products. As shown in the figure, the first stage is experimenting & prototyping. Every company I work with has a host of those initiatives ongoing. However, when looking to transition successful prototypes and proofs of concept to actual deployment, we run into roadblocks.

The first and obvious roadblock is that you now need AI engineering to ensure that you have industry-strength, production-quality deployment of AI and, as I discussed in an earlier post, that requires a set of solutions, architecture, infrastructure and processes that are often not recognized by data scientists and people without an engineering background.

The second and more important roadblock is that the potential of AI is to significantly reduce cost while improving speed and quality. The fact is that for most companies, the primary cost driver is salaries. So, reaping the benefits of AI means reallocating or releasing the people who currently do the job that will be taken over by ML/DL models.

'It’s almost painful to write it down and not feel like an idiot'

This is so obvious that it’s almost painful to write it down and not feel like an idiot, but I keep running into situations like the scenario that we started with. Everyone loves AI and it’s on the top of the hype cycle. Everyone talks about all the great opportunities and benefits that AI will bring to their organization and society at large. But when it hits close to home, the willingness to change and reap the benefits suddenly is severely lacking.

This is a problem as the competition isn’t sitting still. You need to go through the painful process of reaping the benefits by reducing cost, redesigning processes, reallocating people and aligning your organization with the benefits that AI can offer. As I wrote earlier, it’s not what AI can do for you; the question is how you redesign your entire organization, business models, products, customer engagement models and ways of working to align with digitalization, meaning software, data and AI. This is the only way to capture the potential of AI to the full extent and the only way that you stay competitive in the long run.

In the startup community, large companies are often referred to as dinosaurs, ie slow, set in their ways and consequently ripe for disruption. Don’t be a dinosaur!

Don’t be like everyone else

This week, I had a wonderful conversation with the CEO of a midsized company (around 1,000 employees) to discuss business strategy and the implications on technology strategy in the overall context of digitalization. As the company supports its customers with digital solutions, it’s an example of the part of the economy that’s doing really well under the current circumstances. It’s a good reminder of the fact that it’s not so much that the economy is cratering, but rather that there are quite fundamental and accelerated shifts towards digitalization taking place in it. It’s just that news outlets prefer to talk about bad news (companies going out of business) instead of good news (the business of some companies is booming) because bad news sells more ads (if it bleeds, it leads).

The discussion with this CEO focused on the positioning of the company. It has much smaller competitors, as well as those that are (much) bigger and the question becomes how to differentiate your organization from these competitors. The simple answer is to do what they do but better or cheaper. However, to paraphrase a line usually credited to H. L. Mencken (and often misattributed to Einstein), for every problem, there’s an answer that’s simple, elegant and wrong.

The slightly less simplistic answer is to focus on one of the corners of the competitive triangle (customer intimacy, technology leadership or operational excellence) and organize your company based on that. Again, this perspective isn’t necessarily wrong, but it fails to give guidance as the question then becomes when to use what strategy.

'Commodity, differentiating and innovative functionality each require a different strategy'

In an earlier post, I introduced the three-layer product model where the functionality in a product, a platform or a product portfolio is organized into a layer of commodity functionality, a layer of differentiating functionality and a layer of innovative and experimental functionality. In our discussion, I realized that each of these layers requires a different strategy.

For commodity functionality, the focus should be on operational excellence as you’re looking to reduce the total cost of ownership for that layer to the minimum possible. This demands that you limit the alternative systems to deliver this functionality to the lowest possible number, preferably one. I still meet companies that have multiple solutions for the same commodity functionality and that can’t find the prioritization to reduce the number of alternatives and consequently continue to have outsized associated costs. In general, the goal should be to centralize, standardize and prepare for outsourcing the delivery of commodity functionality.

The differentiating functionality needs an alternative strategy: customer intimacy. This functionality is the key reason customers pick us over competitors and consequently, we need to work closely with customers to maximize the value we deliver to them. Here, the introduction of variants may well be justified as long as we’re able to monetize our efforts. At some point, what’s differentiating now will start to commoditize and then the rules of the game change to what we described above.

Finally, for the innovation and experimentation layer, the key strategy should be technology and product leadership. This is where we explore new innovations, which often are technology driven and which hopefully form our future differentiation. The success metric here is the number of things we can try out against our, often limited, budget. And if I say “try out,” I mean of course to evaluate ideas with customers. It’s too easy to get hung up in our own set of beliefs. Instead, work with customers and observe. Customers will never ask you for an innovation (and if they do, you’re in bigger trouble than you think) but will use what’s valuable to them.

Back to the discussion with the CEO: we concluded that it’s easy to look at our competitors, typically the larger ones, and consider copying what they’re doing, which typically focuses on standardizing and preparing for scaling. Or, to look at smaller competitors and focus on agility and customer intimacy. Although it’s perfectly alright to be inspired by what others are doing and to “steal with pride,” as a leader, it remains your key responsibility to define a business strategy that’s uniquely different from the others in the industry. Being like everyone else lands you in a red ocean where cost and slim margins are the only things you can think about. Instead, be different in a way that matters to customers, find your blue ocean and build a great business. And, to quote Steve Jobs, if you haven’t found it yet, keep looking!

Don’t let your habits define you

This week, I had a meeting with the leadership team of a company that has asked me for help to accelerate their growth. We’ve been reconvening regularly and have gone through the process of defining who we are and what our purpose is as a business, identifying the key avenues to accelerate growth and creating a plan to execute on, along with operating mechanisms to follow up.

The weird thing is that we’ve been consistently running behind the plan in terms of execution and when I pointed this out to them, I got the usual excuses of internal dependencies, external factors outside the control of the team and so on. However, at the core, something else was going on. The team has been working together for more than a decade, during which the company went through some difficult times that resulted in their having become extremely careful and risk avoidant. Over the years, they’ve developed a set of habits that ensure wide safety margins. For instance, any new hiring only takes place after the revenue from customers for the new hire has been guaranteed for a long time to come.

The surprising thing is that these habits might have been useful at some point in the past, but at this stage where the company has raised a good chunk of funding, there’s no reason to be avoiding financial and business risk. Instead, with the whole COVID-19 situation, now is the time to invest and expand the team with great talent that’s now available because of many companies scaling back.

Not only is the current set of habits counterproductive for what we’re looking to achieve, the team even fully recognizes and admits that this is the case. And yet, as individuals and as a team, they struggle to let go of their habits and old ways of working.

This example is an instance of normal human behavior. Even though we tend to think of ourselves as rational beings that are occasionally bothered by these pesky emotions, the reality is that we’re irrational beings that are, according to some research, for more than 95 percent of the time driven by habits and that have a tendency to post-rationalize our entirely irrational behavior. The brain is a fantastic story-generating machine and most of the time, it’s generating stories explaining to ourselves why we did something.

In many of the companies and teams I work with, I’ve observed the same situation and it’s the leadership team that tends to be at the heart of it. For all the explanations and excuses of why we are in the situation we find ourselves in, basically, it almost always is the leadership team that’s hampering the company’s development and growth. And in the few cases where there really are external factors at play, it still behooves you as a leader to take responsibility anyway as it causes you to shift your mindset from a victim to the protagonist of your own story.

'Is what you’re doing actually the best course of action under the circumstances?'

I don’t mean to say that leadership teams of companies that aren’t doing so well need to be universally kicked out and replaced. Instead, I’m asking you, dear reader, to spend more time reflecting on what you do, how you behave, why you believe you do these things and to what extent it might be that you’re post-rationalizing non-constructive behavior. The only way to break out of these situations is by continuously holding up a mirror to yourself and carefully analyzing whether what you’re doing is actually the best course of action under the circumstances. To me, that’s the most effective, or even the only way, to continuously learn, improve and reinvent yourself and your organization.

As the saying commonly attributed to Lao Tzu goes: “Watch your thoughts, they become your words; watch your words, they become your actions; watch your actions, they become your habits; watch your habits, they become your character; watch your character, it becomes your destiny.” And I believe that we should all aim for the highest destiny we can accomplish in our lifetimes.

What’s with all the Ops?

DevOps, DataOps, MLOps – the number of different “Ops” combinations seems to have exploded over the last year or so. There are manifestos, meetups, lots of blog posts and research articles about these various approaches.

In order to get clear on terminology, I think it’s good to define what we’re talking about. So, first, DevOps is a set of practices that combines software development (Dev) and information technology operations (Ops) with the aim to shorten the system development life cycle and provide continuous delivery with high software quality (Wikipedia). The intent is to combine agile software development practices with continuous deployment in order to have a constant flow of new functionality and resultant value delivery to customers. In this approach, also referred to as continuous deployment, new functionality can be rolled out whenever it’s ready, its effects measured and the feedback used to inform the next (rapid) cycle of development.

DataOps is an automated, process-oriented methodology, used by analytic and data teams, to improve the quality and reduce the cycle time of data analytics (Wikipedia). Although this sounds very different from DevOps, in most product companies, it’s tightly interconnected with the products deployed in the field. Consequently, the data analytics primarily serves R&D teams that need to know, as part of the continuous deployment pipeline, whether the intended outcomes of their development efforts are indeed accomplished.

Finally, MLOps is a practice for collaboration and communication between data scientists and operations professionals to help manage the production machine learning (or deep learning) life cycle (Wikipedia). Whereas traditionally, data scientists would develop a model based on a data set and then move on with their lives, currently in many systems, ML/DL models are constantly evolving due to changes to the data or new algorithmic insights and need to be deployed frequently as well. Once deployed, they need to be monitored to ensure that models that perform better in training also perform better during operations.
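To make the monitoring point concrete, here’s a minimal sketch of a check that compares a candidate model against the one currently in operation. Everything in it is an assumption for illustration: the `ModelScore` record, the use of accuracy as the metric, the `tolerance` parameter and the three outcome labels are hypothetical and not part of any specific MLOps tool.

```python
from dataclasses import dataclass


@dataclass
class ModelScore:
    """Hypothetical record of how one model version performs."""
    name: str
    offline_accuracy: float  # measured on the validation data set
    online_accuracy: float   # measured on labeled data from operations


def promotion_decision(current: ModelScore, candidate: ModelScore,
                       tolerance: float = 0.01) -> str:
    """Decide what to do with a candidate model.

    A candidate is only promoted if it's better in training AND
    not worse (within `tolerance`) during operations.
    """
    better_offline = candidate.offline_accuracy > current.offline_accuracy
    holds_up_online = candidate.online_accuracy >= current.online_accuracy - tolerance

    if not better_offline:
        return "keep_current"
    if holds_up_online:
        return "promote"
    # Better in training but worse in operations: exactly the case
    # that monitoring after deployment is meant to catch.
    return "rollback"
```

The design choice worth noting is that the offline result alone never leads to a promotion; the operational measurement always has the final say.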

'Dev, Data and ML all have to integrate with the same Ops'

In an earlier column, I presented the HoliDev model (see figure). Each of the “Ops” matches with one of the three types of development that’s ongoing. The surprising thing, of course, is that “Ops” for all these stands for “operations” and the key is to remember that for any system, product or solution, there’s only one operations function taking care of it. So, Dev, Data and ML all have to integrate with the same Ops.

The HoliDev model

Concluding, whatever “Ops” you’re working on, it all has to come together in the same operations and consequently, you’ll need to work in [cross-functional teams](https://bits-chips.nl/artikel/focus-on-outcomes-for-cross-functional-teams/) to ensure that you’re reaching the desired outcomes. The important takeaway is, though, that if an activity delivers value to customers, it deserves to be done often. Only unimportant tasks are done yearly or even less frequently. So, reflect on this: for all the value-adding activities and processes in your organization, how can you shorten the cycle time and create your own “Ops” setup where it matters the most?

AI engineering part 2: data versioning and dependency management

In my last column, I presented our research agenda for AI engineering. This time, we’re going to focus on one of the topics on that agenda, ie data versioning and dependency management. Even though the big data era has been with us for over a decade now, many of the companies that we work with are still struggling with their data pipelines, data lakes and data warehouses.

As we mostly work with the embedded systems industry in the B2B space, one of the first challenges many companies struggle with is access to data and ownership issues. As I discussed in an [earlier column](https://bits-chips.nl/artikel/get-your-data-out-of-the-gray-zone/), the key thing is that rather than allowing your data to exist in some kind of grey zone where it’s unclear who owns what, it’s critical to address questions around access, usage and ownership of data between your customers and your company. And of course, we need to be clear and transparent on the use of the data, as well as how the data is anonymized and aggregated before being shared with others.

The second challenge in this space is associated with the increasing use of DevOps. Data generation is much less mature as a technology than, for instance, API management in software. Teams tend to make rather ad-hoc changes to the way log data is generated because they believe they’re the only consumers of the data and that it’s only used to evaluate the behavior of the functionality they’re working on. Consequently, other consumers of the data tend to experience frequent disruptions of the data stream, as well as of its content.

The frequent changes to data formats and ways of generation are especially challenging for machine learning (ML) applications as the performance of the ML models is highly dependent on the quality of the data. So, changes to the data can cause unexpected degradations of performance. Also, as ML models tend to be very data hungry, we typically want to use large data sets for training and, consequently, combine the data from multiple sprints and DevOps deployments into a single training and validation data set. However, if the data generated by each deployment is subtly (or not so subtly) different, that can become challenging.
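One way to deal with differing formats, sketched below under stated assumptions, is to tag every record with a schema version and convert each version into one unified format before training. The field names (`dev`, `device_id`, `temp_c`, `temp_f`), the `schema_version` tag and the two log formats are made up for illustration; real deployments will differ.

```python
from typing import Any, Callable

Record = dict[str, Any]


def _from_v1(rec: Record) -> Record:
    # Hypothetical format of an early deployment: short key, Celsius.
    return {"device_id": rec["dev"], "temp_c": rec["temp_c"]}


def _from_v2(rec: Record) -> Record:
    # Hypothetical later format: key renamed, unit silently changed to Fahrenheit.
    return {"device_id": rec["device_id"], "temp_c": (rec["temp_f"] - 32) * 5 / 9}


CONVERTERS: dict[int, Callable[[Record], Record]] = {1: _from_v1, 2: _from_v2}


def normalize(records: list[Record]) -> tuple[list[Record], list[Record]]:
    """Convert versioned records to one format; set aside what can't be converted."""
    unified, rejected = [], []
    for rec in records:
        converter = CONVERTERS.get(rec.get("schema_version"))
        if converter is None:
            rejected.append(rec)  # unknown or missing version
            continue
        try:
            unified.append(converter(rec))
        except KeyError:
            rejected.append(rec)  # version tag present, but fields don't match it
    return unified, rejected
```

Rejected records are returned rather than dropped so that a change nobody announced shows up as a growing reject list instead of as a silent degradation of the model.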

The third challenge is that data pipelines tend to have implicit dependencies that can unexpectedly surface when implementing changes or improvements. Consumers of data streams can suddenly be switched off and as there typically is a significant business criticality associated with the functionality implemented by the consumer, this easily leads to firefighting actions to get the consumer of the data back online. However, even if this may be a nice endorphin kick for the cowboys in the organization, the fact of the matter is that we shouldn’t have experienced these kinds of problems in the first place. Instead, the parties generating, processing and consuming data need to be properly governed and the evolution of the pipeline and its contents should be coordinated among the affected players.
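Making the dependencies explicit is the first step toward that coordination. As a minimal sketch, assuming the pipeline can be written down as a simple graph, the function below lists every downstream node that a change to one node might affect. The node names in `PIPELINE` are invented for the example.

```python
from collections import deque

# Hypothetical pipeline: each node maps to the nodes that consume its output.
PIPELINE: dict[str, list[str]] = {
    "raw_logs": ["cleaning"],
    "cleaning": ["feature_store", "quality_dashboard"],
    "feature_store": ["ml_training"],
    "quality_dashboard": [],
    "ml_training": [],
}


def affected_by(graph: dict[str, list[str]], changed: str) -> set[str]:
    """Return all direct and indirect consumers of the changed node."""
    affected: set[str] = set()
    queue = deque(graph.get(changed, []))
    while queue:
        node = queue.popleft()
        if node in affected:
            continue  # already visited via another path
        affected.add(node)
        queue.extend(graph.get(node, []))
    return affected
```

Calling `affected_by(PIPELINE, "cleaning")` before changing the cleaning step tells the team which players to coordinate with, rather than discovering them when a consumer goes offline.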

'We’re working on a domain-specific language to model data pipelines'

These are just some of the challenges associated with data management. In earlier research, we’ve provided a comprehensive overview of the data management challenges. In our current research, we’re working on a domain-specific language to model data pipelines, including the processing and storage nodes, as well as their mutual connectors. The long-term goal is to be able to generate operational pipelines that include monitoring solutions that can detect the absence of data streams, even in case of batch delivery of data, as well as a host of other deviations.
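The language itself isn’t shown here, so the following isn’t our DSL; it’s only a rough Python illustration of the idea of declaring streams and deriving a monitor from that declaration. The `Stream` type, its fields and the `max_gap` parameter are assumptions made for this sketch. Batch delivery is handled simply by giving a batch stream a longer acceptable gap.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Stream:
    """Hypothetical declaration of one connector in a data pipeline."""
    name: str
    source: str          # node producing the data
    sink: str            # processing or storage node receiving it
    max_gap: timedelta   # longest acceptable silence between deliveries


def silent_streams(streams: list[Stream],
                   last_seen: dict[str, datetime],
                   now: datetime) -> list[str]:
    """Return the names of streams that have been silent for too long."""
    silent = []
    for stream in streams:
        seen = last_seen.get(stream.name)
        # A stream that never delivered anything counts as absent too.
        if seen is None or now - seen > stream.max_gap:
            silent.append(stream.name)
    return silent
```

A continuous telemetry stream might get a `max_gap` of a few minutes, whereas a nightly batch would get something like 26 hours, so that both are checked by the same monitor.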

In addition, we’ve worked on a “data linter” solution that can warn when the content of the data changes, ranging from simple changes such as missing or out-of-range data to more complicated ones such as shifting statistical distributions over time. The solution can warn, reject data and trigger mitigation strategies that address the problems with the data without interrupting the operations. Please contact me if you’d like to learn more.
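The three kinds of checks mentioned above can be illustrated with a small sketch. This isn’t the actual data linter; the function name, the warning labels and the `max_shift` threshold are assumptions, and the shift check is a deliberately crude comparison of the batch mean against a baseline, standing in for more serious statistical tests.

```python
import math
from statistics import mean, stdev
from typing import Optional


def lint_batch(values: list[Optional[float]], low: float, high: float,
               baseline: list[float], max_shift: float = 3.0) -> list[str]:
    """Return warnings for one batch of sensor values (illustrative only)."""
    warnings = []

    # 1. Missing data: None or NaN entries.
    present = [v for v in values if v is not None and not math.isnan(v)]
    if len(present) < len(values):
        warnings.append("missing")

    # 2. Out-of-range data: outside the physically plausible interval.
    if any(v < low or v > high for v in present):
        warnings.append("out_of_range")

    # 3. Shifting distribution: batch mean far from the baseline mean,
    #    measured in standard errors of the baseline spread.
    if present and len(baseline) >= 2:
        spread = stdev(baseline)
        if spread > 0:
            standard_error = spread / math.sqrt(len(present))
            shift = abs(mean(present) - mean(baseline)) / standard_error
            if shift > max_shift:
                warnings.append("distribution_shift")

    return warnings
```

What happens with the warnings, whether the batch is rejected or a mitigation strategy is triggered, is left to the caller so that the check itself never interrupts operations.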

Concluding, data management, including versioning and dependencies, is a surprisingly complicated topic that many companies haven’t yet wrestled to the ground. The difference in maturity between the way we deal with software and with data is simply staggering, especially in embedded systems companies where data traditionally was only used for defect management and quality assurance. In our research, we work with companies to make a step function change to the way data is collected, processed, stored, managed and exploited. As data is the new oil, according to some, it’s critical to take it as seriously as any other asset that you have available in your business.