Kensu + Azure Data Factory: A Technical Deep Dive

With 38% of data teams spending between 20% and 40% of their time fixing data pipelines¹, delivering reliable data to end users can be expensive. With Kensu’s latest integration, Azure Data Factory (ADF) users can now observe data within their pipelines and receive insights into data lineage, schema changes, and data quality metrics. The integration makes Azure Data Factory activities data observable, giving data teams a comprehensive view of their pipelines so they can identify data incidents, stop them from propagating, and provide reliable data to end users.

 

How does the Kensu + Azure Data Factory Integration operate?

The Kensu collector for Azure Data Factory works in four steps to collect, process, and aggregate the data needed for observability and governance.

 

[Figure: Kensu + Azure Data Factory technical overview]

 

The process begins with "Data Retrieval". This involves the Kensu Collector retrieving the latest Azure Data Factory activity data at regular intervals, ensuring observations are up to date. This continuous process captures the most recent changes in users’ Azure Data Factory pipelines.
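As a rough illustration of retrieval at regular intervals, the sketch below walks through consecutive, non-overlapping time windows and asks for the runs in each one. It is not Kensu's implementation: `poll_activity_runs` and the `fetch_runs` callable are hypothetical names, and `fetch_runs` stands in for whatever call actually queries Azure Data Factory. A real collector would also wait between cycles.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterator

# Hypothetical stand-in for the call that queries Azure Data Factory:
# it returns the activity runs updated between two points in time.
FetchRuns = Callable[[datetime, datetime], list[dict]]


def poll_activity_runs(
    fetch_runs: FetchRuns,
    start: datetime,
    interval: timedelta,
    cycles: int,
) -> Iterator[list[dict]]:
    """Yield the runs for `cycles` consecutive, non-overlapping time windows."""
    window_start = start
    for _ in range(cycles):
        window_end = window_start + interval
        yield fetch_runs(window_start, window_end)
        # The next window starts where this one ended,
        # so no run is skipped or read twice.
        window_start = window_end
```

Chaining the windows end to start is one simple way to keep observations up to date without re-reading older runs.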

The second step is "JSON Processing." In this stage, the Kensu Collector processes the JSON response, integrating with Azure Data Factory through the Azure Data Factory Python SDK to extract relevant information such as pipeline name, activity name, environment, and timestamp. Parsing and structuring the JSON data are the key actions here, as they produce the contextual information needed for efficient observability and for troubleshooting data reliability incidents.
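To make the parsing step concrete, here is a minimal sketch that turns a JSON activity-run response into structured records. The sample payload is invented for illustration, and its field names (`pipelineName`, `activityName`, `activityRunStart`, `activityRunEnd`) are modeled on the shape of Azure Data Factory activity-run responses. The `ActivityContext` class and `parse_activity_runs` function are hypothetical, and the environment is passed in by the caller because the article does not say where the collector reads it from.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityContext:
    pipeline_name: str
    activity_name: str
    environment: str
    timestamp: str


def parse_activity_runs(raw_json: str, environment: str) -> list[ActivityContext]:
    """Turn a JSON activity-run response into structured context records."""
    payload = json.loads(raw_json)
    contexts = []
    for run in payload.get("value", []):
        contexts.append(
            ActivityContext(
                pipeline_name=run["pipelineName"],
                activity_name=run["activityName"],
                environment=environment,
                # Prefer the end time; fall back to the start time
                # for runs that have not finished yet.
                timestamp=run.get("activityRunEnd") or run["activityRunStart"],
            )
        )
    return contexts


# Invented sample payload, for illustration only.
SAMPLE_RESPONSE = """
{"value": [
  {"pipelineName": "load_orders", "activityName": "CopyOrders",
   "activityRunStart": "2023-06-01T10:00:00Z",
   "activityRunEnd": "2023-06-01T10:02:30Z"},
  {"pipelineName": "load_orders", "activityName": "TransformOrders",
   "activityRunStart": "2023-06-01T10:02:31Z",
   "activityRunEnd": null}
]}
"""
```

Calling `parse_activity_runs(SAMPLE_RESPONSE, "production")` yields one record per activity, each carrying the context later attached to an observation.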

Once the JSON Processing is complete, the third step, "Microsoft MS SQL Querying," begins. For each Microsoft MS SQL data source identified in the activity run data, the collector queries the corresponding Microsoft MS SQL table to retrieve additional metadata: schema information, such as column names and data types, and a set of metrics. The observations collected may include metrics such as missing values, number of rows, and distributions (e.g., mean, min, max) for numerical columns. By fetching this additional information from Microsoft MS SQL, the Kensu Collector enriches the observations with valuable insights about the underlying data. This step collects the metrics used to validate the data against Kensu rules.
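The metrics named above (number of rows, missing values, mean, min, max) can all be computed with one aggregate query per column. The sketch below shows the idea using Python's built-in `sqlite3` module as a stand-in for Microsoft MS SQL so that it runs anywhere. The `column_metrics` function, the `orders` table, and the metric key names are assumptions made for illustration, not Kensu's actual queries or naming.

```python
import sqlite3


def column_metrics(conn: sqlite3.Connection, table: str, column: str) -> dict:
    """Compute row count, missing values, and mean/min/max for a numeric column."""
    # Identifiers cannot be bound as query parameters; this sketch assumes
    # the table and column names come from a trusted source.
    query = f"""
        SELECT COUNT(*),
               SUM(CASE WHEN "{column}" IS NULL THEN 1 ELSE 0 END),
               AVG("{column}"), MIN("{column}"), MAX("{column}")
        FROM "{table}"
    """
    rows, missing, mean, minimum, maximum = conn.execute(query).fetchone()
    return {
        "nrows": rows,
        "missing": missing or 0,  # SUM returns NULL on an empty table
        "mean": mean,
        "min": minimum,
        "max": maximum,
    }


def demo_connection() -> sqlite3.Connection:
    """Build a small in-memory table with one missing value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (2, 30.0), (3, None), (4, 20.0)],
    )
    return conn
```

The aggregate functions skip NULL values, which is why missing values are counted separately with a `CASE` expression.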

The final step, "Aggregation and Sending," commences after retrieving the necessary data from both the Azure Data Factory API and Microsoft MS SQL. During this step, the collector aggregates and processes the collected information. It organizes the observations and aligns them with the appropriate job runs, data sources, and associated metadata. The collector sends the aggregated data to the Kensu core, where it can be utilized for comprehensive data observability, governance, and analysis.
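The aggregation step can be pictured as grouping data sources and their metrics under the job run that touched them, then serializing the result. The sketch below is only an illustration under assumed names: the input dictionaries, their keys (`pipeline`, `run_id`, `data_sources`), and the payload layout are hypothetical and do not describe the format Kensu actually uses. The network call to the Kensu core is deliberately left out.

```python
import json
from collections import defaultdict


def aggregate_observations(activity_runs: list[dict], metrics: dict) -> list[dict]:
    """Group data sources and their metrics under the job run that touched them."""
    by_run = defaultdict(list)
    for run in activity_runs:
        for source in run["data_sources"]:
            by_run[(run["pipeline"], run["run_id"])].append(
                # Sources without collected metrics get an empty dict.
                {"name": source, "metrics": metrics.get(source, {})}
            )
    return [
        {"pipeline": pipeline, "run_id": run_id, "data_sources": sources}
        for (pipeline, run_id), sources in by_run.items()
    ]


def to_payload(observations: list[dict]) -> str:
    """Serialize the aggregated observations; the transport is left out."""
    return json.dumps({"observations": observations}, sort_keys=True)
```

Keying the grouping on the pipeline and run identifier keeps each observation aligned with the job run it belongs to, even when several activities in that run touch different tables.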

 

The Kensu Circuit Breaker

Let’s say Azure Data Factory processes an activity before sending its data to several pipelines. If the initial activity encounters an incident, the flawed data will cascade across all the downstream pipelines, raising the risk that a single incident spreads and adversely impacts end users.

Such a risk exists for most data teams, given that only 7% can identify data issues before they impact users². To reduce the potential for data problems, data teams can use the Rules in Kensu in conjunction with the Circuit Breaker. In the example above, this combination stops the initial pipeline execution if data reliability standards are not met, acting as proactive protection against propagating inaccurate or incomplete data downstream. This gives data teams the opportunity to resolve the problem using the insights provided by the Kensu platform.

The Kensu Circuit Breaker is a component that data teams can import into an Azure Data Factory environment and incorporate into any desired pipeline. Once in place, the Kensu interface prevents activities within the pipeline from running when a problem has been identified in upstream activities.
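The general circuit-breaker idea can be sketched in a few lines: check the upstream data against a set of rules and refuse to run anything downstream if a rule fails. This is a generic illustration of the pattern, not the Kensu component itself. `Rule`, `CircuitOpenError`, `circuit_breaker`, and `run_pipeline` are hypothetical names, and the metric keys are assumed.

```python
from dataclasses import dataclass
from typing import Callable


class CircuitOpenError(RuntimeError):
    """Raised to stop a pipeline when a data rule is violated."""


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the metrics are acceptable


def circuit_breaker(metrics: dict, rules: list[Rule]) -> None:
    """Raise CircuitOpenError if any rule fails; return silently otherwise."""
    failed = [rule.name for rule in rules if not rule.check(metrics)]
    if failed:
        raise CircuitOpenError("rules violated: " + ", ".join(failed))


def run_pipeline(
    metrics: dict,
    rules: list[Rule],
    downstream: list[Callable[[], str]],
) -> list[str]:
    """Run downstream activities only if the upstream data passes every rule."""
    circuit_breaker(metrics, rules)
    return [activity() for activity in downstream]
```

For example, a rule such as `Rule("no_missing_amount", lambda m: m["missing"] == 0)` would halt the run as soon as the upstream table reports a missing value, before any downstream activity starts.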

 

Boost data team productivity with Kensu + Azure Data Factory

Azure Data Factory and Kensu users can supply data to end users with the confidence that its integrity is intact. With this integration in place, they can fetch activity run information directly from Azure Data Factory for all their pipelines. They can use these contextualized insights to discern which pipelines consume particular inputs and produce specific outputs, and they can receive and compute statistical metrics on the data sources.

This wealth of information tells the full story of the data movement. With a deeper understanding of data usage and incidents, data teams can complete root cause and impact analysis more efficiently when incidents arise, prevent data incidents from cascading, and eliminate the frustrations that come with broken data.

You can discover its capabilities with our team by booking a demo on our website. 

Sources: ¹ ² The State of Data Observability 2023.