The concept of a Data Exchange can be confusing because it serves multiple purposes. It can be a platform for finding datasets via a directory or catalog, a platform for sharing data assets, or even a destination for publishing data or data products.
This article explores how a Data Exchange can provide benefits across the organization and the role it plays in a data-driven culture.
Most large organizations store vast amounts of data, a few petabytes on average, which is the equivalent of more than 500 billion pages of text. This makes data the most extensive and valuable asset in the organization.
Given the growing focus on data and its importance, finding data effectively is crucial. It ensures that data is not only leveraged appropriately but also democratized, maximizing its inherent value to the organization.
The functions of a Data Exchange can be divided into two areas. The first is how a Data Exchange interacts with the broader data ecosystem. It acts as a system that gathers rich metadata about the information assets under management and makes those assets discoverable and more accessible.
The second is how it interacts with the personas (data workers) who use a Data Exchange as part of their role in the organization. For them, it democratizes data through discovery and reuse.
A Data Exchange also works hand-in-hand with the firm’s ability to monetize data for internal and, in some cases, external consumption. Sometimes referred to as a “data marketplace” or a “data share,” its purpose is to provide a comprehensive view of the data assets owned by the organization.
A comprehensive view must include the human-relatable content that makes data truly understandable to the people using it. Beyond raw schema information and hosting system information, additional context should include:
Business context about the data. What is it used for in the firm?
Owner/Steward information that helps users ask questions or request deeper access.
Trust and quality indicators. Transparency is key, so the trustworthiness and overall quality of the data should be visible. Some data has been mastered and is highly governed. Other data may be user-provided and experimental, but that does not lessen its potential value.
Lineage information that helps users determine how the data has been curated and which systems host it.
Methods of connectivity that help users understand the various ways the data can be accessed. Some data is unstructured, such as reports, and may require access to a BI platform. Other data is accessible through a SQL query or an API.
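To make the list above more concrete, here is a minimal sketch of what a single catalog entry on a Data Exchange might hold, along with a naive keyword search over it. The class names, fields, trust tiers, and sample datasets are hypothetical illustrations of the context described above, not the schema of any particular product.

```python
from dataclasses import dataclass, field
from enum import Enum


class TrustLevel(Enum):
    # Hypothetical tiers: mastered, highly governed data vs. user-provided data.
    MASTERED = "mastered"
    EXPERIMENTAL = "experimental"


@dataclass
class CatalogEntry:
    """Hypothetical metadata record for one asset listed on a Data Exchange."""
    name: str
    business_context: str   # what the data is used for in the firm
    owner: str              # owner/steward to contact for questions or access
    trust_level: TrustLevel  # transparency about governance and quality
    upstream: list[str] = field(default_factory=list)        # lineage: sources it is derived from
    hosting_system: str = ""  # where the data physically lives
    access_methods: list[str] = field(default_factory=list)  # e.g. "SQL", "API", "BI report"


def search(catalog: list[CatalogEntry], term: str) -> list[CatalogEntry]:
    """Naive case-insensitive keyword search over names and business context."""
    term = term.lower()
    return [
        e for e in catalog
        if term in e.name.lower() or term in e.business_context.lower()
    ]


# Hypothetical sample catalog.
catalog = [
    CatalogEntry("trades_daily", "End-of-day trade positions used for risk reporting",
                 "risk-data-team", TrustLevel.MASTERED, upstream=["trades_raw"],
                 hosting_system="warehouse", access_methods=["SQL"]),
    CatalogEntry("client_sentiment", "Experimental sentiment scores from client emails",
                 "analytics-lab", TrustLevel.EXPERIMENTAL, access_methods=["API"]),
]

for entry in search(catalog, "risk"):
    # Shows who to ask, how far to trust it, and how to reach it.
    print(entry.name, entry.owner, entry.trust_level.value, entry.access_methods)
```

A real exchange would use a proper search index rather than substring matching. Still, the shape of the record shows how business context, ownership, trust, lineage, and access methods sit alongside the technical metadata.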
An exchange is a key indicator of the maturity of a data-centric organization. Several core components enable an organization to unlock the value of a Data Exchange, including:
1. Instilling agility in the business
Both data producers and consumers adapt how they interact with the data. Producers think not only about delivering data for their own line of business but also about openly sharing data that they lack the funding to extract value from themselves, so that others in the organization can.
Consumers, for their part, can discover what already exists in their ecosystem before building a bespoke dataset or engaging a vendor.
2. Boosting data literacy
One of the major goals for any data-centric organization is to enable data literacy across the business. What is the dataset, where can I find it, and how can I use it? Can I build my report on top of this data? These are some of the questions that can be answered by the Data Exchange.
Moreover, where a community of data analytics experts exists, announcing new datasets or sharing updates on existing ones (much like product demos) can encourage the business to think collaboratively about problems and solutions.
3. Approaching data as a shareable asset
Data should be treated as an asset that is shareable by design. The underlying resource may live in a silo, but once we tag a data product as a data asset, the tag implies that it was produced with shareability in mind.
This also implies that data shared on the exchange has the right governance built in, such as entitlements, security, and standardized contracts. Any data asset exposed through a contract comes with built-in reliability guarantees, such as operational uptime.
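As an illustration of governance built into a shared asset, the sketch below models a data contract carrying an entitlement rule and an uptime target. It checks both an access request and observed availability against that contract. The `DataContract` class, its fields, and the role names are hypothetical and chosen only for this example; real platforms express contracts in their own formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract attached to a shared data asset."""
    asset: str
    allowed_roles: frozenset[str]  # entitlements: roles permitted to read the asset
    uptime_target: float           # promised availability, e.g. 0.999 for 99.9%


def can_read(contract: DataContract, user_roles: set[str]) -> bool:
    """Entitlement check: the user needs at least one role named in the contract."""
    return bool(contract.allowed_roles & user_roles)


def meets_uptime(contract: DataContract, minutes_up: int, minutes_total: int) -> bool:
    """Compare observed availability over a window with the contract's uptime target."""
    return minutes_up / minutes_total >= contract.uptime_target


contract = DataContract("trades_daily", frozenset({"risk-analyst", "data-steward"}), 0.999)

print(can_read(contract, {"risk-analyst"}))  # True: role is entitled
print(can_read(contract, {"marketing"}))     # False: no matching role

# A 30-day window has 43,200 minutes; 99.9% uptime allows about 43 minutes down.
print(meets_uptime(contract, 43_160, 43_200))  # True: 40 minutes of downtime
print(meets_uptime(contract, 43_140, 43_200))  # False: 60 minutes of downtime
```

Keeping entitlements and reliability targets on the contract itself means every consumer who finds the asset on the exchange sees the same rules.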
4. Enhancing operational and cost efficiency
Having a Data Exchange that serves consumers from a central location prevents duplication of datasets and saves cost. It also leads to fewer, more meaningful datasets owned by fewer teams, which makes it easier to find the correct owner and raise change requests. It can also provide easy lineage, making it easier to detect data anomalies and notify downstream consumers.
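To show how lineage supports notifying downstream consumers, here is a minimal sketch that walks a lineage graph breadth-first from the asset where an anomaly was found. It returns every dataset that depends on that asset, directly or indirectly. The dictionary-based graph and the dataset names are hypothetical; an actual exchange would read lineage from its metadata store.

```python
from collections import deque


def downstream_consumers(lineage: dict[str, list[str]], source: str) -> list[str]:
    """Breadth-first walk of a lineage graph; returns every asset fed by `source`."""
    seen = {source}
    order: list[str] = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # guard against revisiting shared descendants
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: each key feeds the datasets listed as its value.
lineage = {
    "trades_raw": ["trades_daily"],
    "trades_daily": ["risk_report", "pnl_dashboard"],
    "pnl_dashboard": [],
}

# If an anomaly is detected in trades_raw, these are the assets whose owners to notify.
print(downstream_consumers(lineage, "trades_raw"))
# → ['trades_daily', 'risk_report', 'pnl_dashboard']
```

Pairing this traversal with the owner field of each affected asset turns a detected anomaly into a targeted notification list.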
A few important elements to keep in mind
As mentioned earlier, a Data Exchange provides tremendous value as a company matures into a data-centric organization. Now that we know the benefits a Data Exchange unlocks, there are a few elements to keep in mind:
An exchange earns its place by bringing together the value of the components of the modern data ecosystem, including data lineage, quality, observability, data catalogs, security, entitlements, and governance. Not all of them are required. However, a combination of these capabilities should ideally exist for a Data Exchange to provide true value to the business.
In future articles, I will cover each capability individually. I will also discuss how the capabilities of modern data platforms, such as Data Exchanges, should complement each other to power applications that serve the business and help build data communities.
About the Author:
Usman Zafar, a Data Architect, has spent over 15 years shaping the data landscape with his expertise in various firms across the Finance and eCommerce industries. Currently, he serves as a principal data architect with Invesco, leading the data platform architecture practice globally within the organization.
Prior to this role, Zafar held various positions in the data domain, including building and leading teams that delivered data initiatives for storing petabyte-scale data and developing low-latency data applications.
With a track record of successfully leading cross-functional teams, he has helped companies unlock the true potential of their data by implementing best practices in data quality, data architecture, and modern data engineering. He is passionate about data movement and implementing green architecture practices in data storage, data platforms, and governance.