When explaining what Data Observability Driven Development (DODD) is and why it should be a best practice in any data ecosystem, using food traceability as an analogy can be helpful.
The purpose of food traceability is to be able to know exactly where food products or ingredients came from and what their state is at each moment in the supply chain. It is a standard practice in many countries, and it applies to almost every type of food product.
Similarly, the purpose of DODD is to give everyone involved in the data value chain visibility into the data and data applications, so that they can identify changes to the data or data applications at every step (from ingestion to transformation to analysis) and troubleshoot or prevent data issues.
The other value of food traceability is that it helps to guarantee the quality of the food for consumers and the reputation of the brand. This is similar to what DODD also tries to achieve: building confidence in the data and the data producer, through consistent monitoring and logging of the data and the data pipelines.
How it works
One of the first applications of food traceability dates to the 1930s, with an effort to prove a bottle of French champagne was made from grapes from the Champagne region of France. Since then, food traceability has expanded to encompass a global market, and it is growing in complexity—just as data is also increasing in volume and complexity.
Food traceability involves tracking and documenting the production, processing and distribution chain of food products and ingredients. While barcodes have long been the standard for tracing products throughout the food and beverage supply chain, technological advancements have introduced RFID as another option.
RFID has improved traceability because, unlike barcodes, multiple RFID chips can be read simultaneously. Again, this mirrors what we see in many data systems, where streaming data is generated continuously to produce real-time insights or control applications.
There are three key principles that must be followed in order to create effective and sustainable end-to-end real-time tracking.
1. Synchronized observability
Synchronized observability occurs at the moment of transformation, or in the case of food, at the moment of transportation or transition. This means that data is collected on the food product at the time when it enters or leaves a specific environment, such as when it arrives at the warehouse from the farm and when it leaves the warehouse for the grocery store.
Likewise, DODD must be performed at the exact moment of data use to avoid any lag between monitoring and use. This is the most reliable approach to ensuring the quality of the data consumed and used by applications. It also helps assure that assumptions about the data are still valid.
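As a minimal sketch of what observing data "at the exact moment of use" could look like, the Python snippet below computes a few metrics on the rows an application is about to transform, in the same call that runs the transformation. The function names (`summarize`, `run_with_observation`) and the sample readings are hypothetical and chosen only for illustration; they are not part of any specific DODD tool.

```python
from datetime import datetime, timezone


def summarize(rows, column):
    """Compute simple metrics for one column at the moment the data is used."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "observed_at": datetime.now(timezone.utc).isoformat(),
        "row_count": len(rows),
        "null_count": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def run_with_observation(rows, column, transform):
    """Observe the input, then run the transformation on the very same rows."""
    metrics = summarize(rows, column)  # observation happens here...
    result = transform(rows)           # ...immediately before the data is used
    return result, metrics


# Hypothetical readings, echoing the milk example.
readings = [{"temp_c": 3.5}, {"temp_c": None}, {"temp_c": 4.1}]
cleaned, metrics = run_with_observation(
    readings, "temp_c", lambda rows: [r for r in rows if r["temp_c"] is not None]
)
```

Because the metrics are taken from the same rows the transformation receives, there is no gap between what was monitored and what was actually consumed.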
2. Contextual observability
Contextual observability simply means being able to observe something in its current environment and to gather data on that environment or “context.” In the case of food traceability, this means that at each stage of the journey, from the farm all the way to the grocery shelf, you maintain the ability to observe the food product in each environment. As an example, data on the temperature of a milk product would be collected in every “context” or environment the milk passes through, including the farm, the transport truck, the processing plant, again during transport, and at the grocery store. It also means that you are gathering information about the context itself, so you have data on exactly which truck picked up the milk, which production lines were used in the factory to produce a certain batch of products, and so on.
Regarding data, contextual observability should not only provide data teams with information on the data itself but must also provide the context of its usage. This includes when, how, and above all, which applications consume, use, and produce the data.
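One way to picture this is a small record that captures, for every run, which application touched which data, where, and when. The sketch below is illustrative only: the `UsageContext` class, its fields, and the dataset and application names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class UsageContext:
    """Who used which data, where, and when (hypothetical structure)."""
    application: str
    environment: str
    inputs: list
    outputs: list
    observed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def consumers_of(contexts, dataset):
    """Return the applications that read a given dataset, sorted by name."""
    return sorted({c.application for c in contexts if dataset in c.inputs})


# Hypothetical usage log collected from two applications.
usage_log = [
    UsageContext("daily_sales_report", "production", ["orders"], ["sales_dashboard"]),
    UsageContext("churn_model", "production", ["orders", "customers"], ["churn_scores"]),
]
order_consumers = consumers_of(usage_log, "orders")
```

With records like these, a data team can answer "which applications consume this dataset?" directly from observations rather than from memory or stale documentation.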
3. Continuous validation
This principle embraces the concept that the quality of the product must be continually validated in every environment. For example, when a food company creates a recipe for pizza dough, they will test it at the lab to ensure that the texture, taste, and quality of the dough are all correct before moving the dough to production. However, when they produce the dough at the factory (i.e., put it in production), they must also test the product again to ensure its quality. It is not enough to rely on the fact that the recipe should produce a quality pizza dough. They need to ensure that in actual production everything went correctly and the dough does in fact meet expected standards of quality.
In the same manner, DODD must be continuously executed during the successive implementation phases of data applications (e.g., development, testing, acceptance, and production) along with the validation of the integrity of the code. Continuous integration (CI) guarantees the quality of the code from the very beginning of the development cycle (until the acceptance phase). Similarly, data applications should continuously validate the data (continuous validation, or CV), even after deployment in production.
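To make the idea concrete, here is a minimal sketch in which the same set of data quality rules runs in every environment, production included. The rule names, the `amount` column, and the `validate` function are hypothetical; real rules would depend on the data and the application.

```python
# Each rule takes the rows and returns True when the data passes.
RULES = {
    "has_rows": lambda rows: len(rows) > 0,
    "amount_not_null": lambda rows: all(r.get("amount") is not None for r in rows),
    "amount_non_negative": lambda rows: all(
        r["amount"] >= 0 for r in rows if r.get("amount") is not None
    ),
}


def validate(rows, environment, rules=RULES):
    """Return 'environment:rule' for every rule that fails in this run."""
    return [f"{environment}:{name}" for name, rule in rules.items() if not rule(rows)]


# The same rules run during development and again in production.
dev_failures = validate([{"amount": 10}], "development")
prod_failures = validate(
    [{"amount": 10}, {"amount": None}, {"amount": -5}], "production"
)
```

In this example the development sample passes, while the production data fails two rules, which is exactly the situation the pizza dough analogy describes: a recipe that worked in the lab still has to be checked on the factory line.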
Increasingly, regulatory bodies are requiring companies to ascertain the attributes of food products and ingredients at every stage of the supply chain (from farm to food processing and from retail to the consumer), and often this information needs to be provided in a matter of hours. Likewise, when data issues arise, there is often a critical need to resolve them quickly, and to do so requires being able to identify changes to the data at every phase and application the data has passed through.
The value of implementing food traceability or DODD is that it allows you to have end-to-end visibility and resolve issues quickly. Specific benefits include:
#1 Improved analysis
Food traceability allows food to be traced along the entire supply chain, so it’s easy to identify what’s happening at what stage in the process and how that’s impacting the quality of the food. Likewise, contextual and synchronized observability provides precise information about both the data and the applications using the data.
With this context, it is easier to understand what has occurred and to troubleshoot issues. For example, if a salmonella outbreak occurs, regulators can quickly trace the outbreak back to a specific product, such as romaine lettuce at a specific grocery store. From there, they can use the data collected along the supply chain to identify which specific farm this batch of lettuce was sourced from and where else lettuce from this farm went, in order to ensure its speedy removal from other grocery stores before others are sickened.
In the world of data, quick troubleshooting assists in a similar way, by helping organizations avoid data catastrophes (a.k.a. datastrophes). For example, data quality rules can detect that a problem has arisen in a dashboard, where some of the values displayed are obviously wrong. The data team can then quickly troubleshoot the problem by conducting a backward analysis to determine which of the incoming data or applications are the source of the problem. This way they can fix incorrect data in a matter of minutes rather than hours or days.
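The backward analysis described above can be sketched as a walk through a lineage map, starting at the broken dashboard and moving upstream until a source with a failed check is found. The lineage dictionary, dataset names, and helper functions below are hypothetical and assume lineage has already been collected by observing the applications.

```python
from collections import deque


def trace_upstream(lineage, start):
    """Breadth-first walk from `start` back through its upstream sources."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for parent in lineage.get(node, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def suspects(lineage, failed_checks, start):
    """Upstream datasets of `start` whose quality checks failed."""
    return [n for n in trace_upstream(lineage, start) if n in failed_checks]


# Hypothetical lineage: each dataset maps to the datasets it is built from.
lineage = {
    "sales_dashboard": ["daily_sales"],
    "daily_sales": ["orders", "exchange_rates"],
    "orders": ["orders_raw"],
}
root_causes = suspects(lineage, {"exchange_rates"}, "sales_dashboard")
```

Here the dashboard problem is narrowed down to a single upstream dataset, much like tracing contaminated lettuce back to one farm.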
With the ability to see exactly what is happening in real time, spoiled or contaminated food can be removed before it even makes it to the shelf. For example, if excessively high temperatures are detected in a truck transporting milk, the milk can be disposed of instead of sent on to the store.
Likewise, visibility into data usage can highlight a missing field or column or that the data isn’t refreshing in a timely fashion. Data teams can troubleshoot and fix the issue before the end-user uses the data, avoiding any inaccuracies in reports generated from the data.
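A check of this kind could look like the following sketch, which flags missing columns and stale data before a report is generated. The function name, the expected columns, and the 24-hour freshness threshold are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def check_before_use(columns, last_refreshed, expected_columns, max_age, now=None):
    """Return a list of issues found before the data reaches the end user."""
    now = now or datetime.now(timezone.utc)
    missing = sorted(set(expected_columns) - set(columns))
    issues = [f"missing column: {name}" for name in missing]
    if now - last_refreshed > max_age:
        issues.append("data is stale")
    return issues


# Hypothetical dataset: one expected column is absent and the last refresh
# happened 36 hours before the check.
checked_at = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
refreshed_at = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
issues = check_before_use(
    columns=["id", "amount"],
    last_refreshed=refreshed_at,
    expected_columns=["id", "amount", "currency"],
    max_age=timedelta(hours=24),
    now=checked_at,
)
```

If the list of issues is not empty, the application can stop or alert the data team instead of publishing a report built on incomplete or outdated data.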
#4 Stronger involvement and accountability
Continuous validation along the supply chain stimulates shared responsibility—everyone from farmer to factory to supermarket is responsible and held accountable for the quality of the food while it’s in their care.
Continuous observability from within applications also increases the involvement and accountability of the teams developing applications. As they implement data observability within the code as they write it, they must not only understand how the data is supposed to be used, but also how its quality must be controlled. This ownership by the development team leads to better coding, faster debugging, and more creativity.
#5 Complete documentation
To understand where a specific food product or ingredient has been, you can simply scan the QR code and see its entire journey in context through the data collected at each stage.
Contextual observability also helps to better document data issues as it provides insights not only about the data, but about the use of the data and the multiple applications processing and producing data. This information is essential because it provides a way to share how the data can be reused.
#6 Higher reliability
Continuous validation significantly improves reliability. In the case of food traceability, continuous validation ensures that the food product remains of good quality and safe to consume at each stage of its journey through the supply chain.
In the case of data and data applications, since data teams must validate the quality of data during the various development phases, there is greater assurance that data quality is high and that issues such as missing or inaccurate data are caught.
The ultimate goal of any food traceability or DODD method is to identify problems quickly and mitigate risk. When the right processes are in place, including contextual observability, synchronized observability, and continuous validation, it becomes much easier to achieve these goals.
The case for data observability as a best practice
In the case of food traceability, which has long been regulated by governing bodies like the Food and Drug Administration (FDA) in the U.S. and the European Food Safety Authority (EFSA) in Europe, these three principles—contextual observability, synchronized observability, and continuous validation—have become standard best practices because they ensure both food quality and quick identification of issues when necessary.
Just as a good traceability system gives everyone involved in the food supply chain the visibility they need to meet these requirements, Data Observability Driven Development gives companies similar visibility into the data value chain, so they can quickly identify data usage at every phase from ingestion to analysis. Making DODD, and its underlying principles of contextual observability, synchronized observability, and continuous validation, a best practice in data management will enable companies to avoid datastrophes and speed the resolution of data issues.