Data Lineage 101: What’s so special about data lineage?
During the past few years, I have noticed growing interest in data lineage. Data lineage has become a hot topic in the data management community.
Briefly, what is data lineage?
You will learn in part two that there is no consistent definition of data lineage, but the following is enough to get you through part one of this series. Data lineage is the path along which data flows from its origin to its destination. You describe a data flow, or lineage, using different components, such as the business processes required for data processing, applications, etc. Data lineage can help with efforts to analyze how information is used and to track key bits of information that serve a particular purpose.1
Why has data lineage become a hot topic?
Ask yourself: who in your (or any other) company might be interested in data lineage and why? Years ago, only IT professionals knew what it was, but now business stakeholders, especially those from finance and risk, have become the biggest data lineage enthusiasts.
Why this sudden interest in data lineage? There are a number of reasons:
- new legislative requirements
- business changes
- an increase in data quality initiatives
- supervisory and audit requirements
Let’s talk about these in more detail.
New legislative requirements
My professional journey to data lineage started with an investigation of the requirements of the Basel Committee on Banking Supervision’s standard number 239: “Principles for effective risk data aggregation and risk reporting” (BCBS 239 or PERDARR). Later on, the EU General Data Protection Regulation (GDPR), IFRS 9, TRIM (Targeted Review of Internal Models) and others came into the picture. Many specialists consider data lineage the ultimate ‘remedy’ for meeting the requirements included in these regulations.
The funny thing is you never find the term ‘data lineage’ literally mentioned in these regulatory documents.
All conclusions about the necessity of data lineage are based on careful investigation of legislative requirements and matching of these requirements to data management terminology, with data lineage being part of it.
Business changes
Very often, a company deals with different types of business changes, such as changes in information needs and requirements, changes in the application landscape, organizational changes, etc. As an example, let us consider a change in the database of a business application. Usually, data is transformed and processed through a chain of applications, as you can see in Figure 1:
Figure 1. A chain of applications
For convenience, the chain consists of just a few applications, but in reality, especially in large companies, such chains consist of dozens of applications.
Let’s assume the database of one of the applications is changed, for example in ‘Company web-page’ (the starting point of the chain on the left side of Figure 1). This means that professionals will need to estimate all required changes in the downstream applications, including the impact on the end reports and/or dashboards. In this case, recorded data lineage eases the impact analysis of the change.
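To make the idea of impact analysis more concrete, here is a minimal sketch in Python. It assumes lineage has been recorded as a simple mapping from each application to the applications that consume its data. Only ‘Company web-page’ comes from Figure 1; every other application name, the `DOWNSTREAM` mapping, and the `impacted_by` function are hypothetical and serve only as an illustration.

```python
from collections import deque

# Hypothetical lineage graph: each application maps to the applications
# that consume its data. Only "Company web-page" is taken from the article;
# all other names are invented for illustration.
DOWNSTREAM = {
    "Company web-page": ["CRM"],
    "CRM": ["Data warehouse"],
    "Data warehouse": ["Risk engine", "Finance reports"],
    "Risk engine": ["Risk dashboard"],
    "Finance reports": [],
    "Risk dashboard": [],
}


def impacted_by(changed_app, downstream=DOWNSTREAM):
    """Return every application downstream of changed_app, breadth-first."""
    seen = {changed_app}
    queue = deque([changed_app])
    impacted = []
    while queue:
        app = queue.popleft()
        for consumer in downstream.get(app, []):
            if consumer not in seen:  # visit each application only once
                seen.add(consumer)
                impacted.append(consumer)
                queue.append(consumer)
    return impacted


print(impacted_by("Company web-page"))
```

With the lineage recorded, the list of applications to review after a change becomes a simple lookup rather than a round of interviews. A real lineage tool would typically work at the level of tables and fields, not whole applications.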
If changes touch, for example, information and reporting requirements (the end point of the chain in Figure 1), professionals will need a root-cause analysis to assess which data is required to produce the new information, where that data should come from and how it should be transformed. Such a root-cause analysis is much easier to do if data lineage is already recorded.
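Tracing in the opposite direction can be sketched in a similar way. The example below assumes lineage is recorded per data element, with each target field pointing to its source fields and a short description of the transformation. All field names, transformation texts, the `LINEAGE` mapping, and the `trace_back` function are hypothetical, and the sketch assumes the recorded lineage contains no cycles.

```python
# Hypothetical lineage records: target field -> (source fields, transformation).
# Every name and rule here is invented for illustration.
LINEAGE = {
    "report.total_exposure": (["dwh.exposure_eur"], "sum over all customers"),
    "dwh.exposure_eur": (["crm.exposure", "ref.fx_rate"], "exposure * fx_rate"),
    "crm.exposure": (["web.order_amount"], "copied per customer"),
}


def trace_back(field, lineage=LINEAGE, depth=0):
    """Return (depth, field, transformation) rows from field back to its origins.

    Assumes the lineage is acyclic; a cycle would cause infinite recursion.
    """
    sources, rule = lineage.get(field, ([], "origin (no recorded upstream)"))
    rows = [(depth, field, rule)]
    for source in sources:
        rows.extend(trace_back(source, lineage, depth + 1))
    return rows


# Print the chain as an indented tree, starting from the report figure.
for depth, field, rule in trace_back("report.total_exposure"):
    print("  " * depth + f"{field}: {rule}")
```

Starting from a figure in a report, the function walks back through each recorded transformation until it reaches fields with no recorded upstream source, which is the kind of question a root-cause analysis has to answer.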
Knowledge about data processing is often kept in the minds of professionals or, in the best-case scenario, on local computers in the form of Word or Excel documents.
Data quality initiatives
Nowadays there are a lot of initiatives around the quality of data. In large international companies, it can take years to roll out such a program, and even more time and effort to make it a success. It is not widely known that data lineage is a key condition for resolving data quality issues: to resolve an issue, you first have to find where in the chain the faulty data originated.
Supervisory and audit requirements
Last but not least, let’s talk about supervisory and audit requirements. Increasingly, supervisors require companies to provide granular reporting data in addition to aggregated reports. This is especially the case for the finance and risk functions, which are required to explain how critical metrics and figures in their reports have been derived. For that, you need to be able to trace back the full chain of data transformations and explain the data’s path. To do that, you need to know the data lineage.
Now that you have been introduced to data lineage, I’ll use the next four parts in this series to help you dive into this topic. In part 2, Data Lineage 102: Definition and Key Components, we’ll take a deeper dive into ‘What IS data lineage?’, including some of the challenges it presents.