(US and Canada) Philip Dutton, Co-CEO and Co-Founder of Solidatus, speaks with Baz Khuti, President, US at Modak, about the difference between data and metadata, data lineage, and the role of lineage in the cloud space.
Dutton uses the Library of Congress to illustrate the difference between data and metadata. If the library's 164 million books are the data, the index that records the whereabouts of each book is the metadata. It describes where each book is as well as what each is about. In other words, metadata gives organizations the ability to find and understand the data they are looking for.
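The library analogy can be made concrete with a minimal Python sketch. Everything in it is hypothetical and for illustration only: the `CatalogEntry` type, the three sample titles, their shelf locations, and the `find_by_subject` helper are assumptions, not anything Dutton describes. The point is simply that a search runs against the index (metadata) and never opens a book (data).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata about one book: where it is and what it is about."""
    title: str
    location: str
    subjects: tuple


# Hypothetical index entries; the books themselves (the data) are not stored here.
CATALOG = [
    CatalogEntry("A History of Maps", "Stack 3, Shelf 12", ("cartography", "history")),
    CatalogEntry("Intro to Databases", "Stack 7, Shelf 2", ("computing", "data")),
    CatalogEntry("Data on the High Seas", "Stack 3, Shelf 14", ("history", "data")),
]


def find_by_subject(catalog, subject):
    """Return (title, location) pairs for every entry tagged with the subject."""
    return [(entry.title, entry.location)
            for entry in catalog
            if subject in entry.subjects]


if __name__ == "__main__":
    for title, location in find_by_subject(CATALOG, "data"):
        print(f"{title}: {location}")
```

The lookup answers both questions the index is meant to answer, what a book is about and where it sits, using metadata alone.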
Regarding the role that data lineage plays in the data-metadata context, Dutton describes it as the connecting link between data sources. He gives an example: when a person applies for a credit card, their personal information is stored in a customer database. Once a transaction is made with that card, it ends up in a transaction database, and that data is then assembled and forwarded to the regulator or to finance. The linkage between these databases is data lineage. It is the connectivity that helps an organization understand where its data is and what impact that data has.
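One way to picture the credit card example is as a small directed graph in which each edge means "feeds into". The Python sketch below is a rough illustration under stated assumptions: the node names (`customer_db`, `transaction_db`, `regulatory_report`), the edges between them, and the `downstream` and `upstream` helpers are all hypothetical and do not represent how Solidatus or any other product models lineage.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the datasets listed as its value.
LINEAGE = {
    "customer_db": ["transaction_db"],
    "transaction_db": ["regulatory_report"],
    "regulatory_report": [],
}


def downstream(lineage, source):
    """Breadth-first walk: everything that directly or indirectly consumes `source`."""
    seen, order, queue = {source}, [], deque([source])
    while queue:
        node = queue.popleft()
        for target in lineage.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


def upstream(lineage, target):
    """Everything `target` depends on, found by walking the edges in reverse."""
    reversed_edges = {}
    for src, targets in lineage.items():
        for t in targets:
            reversed_edges.setdefault(t, []).append(src)
    return downstream(reversed_edges, target)


if __name__ == "__main__":
    print(downstream(LINEAGE, "customer_db"))
    print(upstream(LINEAGE, "regulatory_report"))
```

Walking downstream from the customer database shows what a change there could affect; walking upstream from the report shows where its figures came from.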
Looking back, Dutton says data lineage used to be only about the physical flow of data in an organization. Now, however, there is a fundamental shift toward connection. Whether the goal is connecting to a physical database, aligning the business, or building data literacy, lineage starts to unlock more business value in the organization.
He describes data lineage as a blueprint of the organization's framework. He further acknowledges that technology and data have helped organizations grow rapidly. Dutton anticipates the challenge of operating in an ever-evolving cloud-native and quantum computing landscape. He considers the cloud a greenfield opportunity to build better things by default and by design, and urges organizations to adopt a “data first” approach when moving to the cloud.
Dutton further asserts that, with regard to GDPR, data lineage has become increasingly important in the cloud space. Data lineage answers the what, who, and where of data in the cloud. He concludes that not having complete data lineage across cloud platforms and on-premises systems will lead to troublesome hidden data.