While data fabric and data mesh have grown in popularity, they have also encountered their fair share of skepticism. Because the terms have been hijacked by marketing departments and used as buzzwords, it can be hard to get a clear understanding of what they can do and how to benefit from them.
Too often, when asked what data fabric or data mesh is, software vendors shape their answer around the current functionality of their products, which unfortunately misses the mark when it comes to clarification. The truth is that, as frameworks that need to span multiple environments, no one tool can cover the full set of requirements alone.
As data professionals, we are familiar with data modeling. A good concept in data modeling can encapsulate lots of thinking. So if you describe a concept well, it can present an incredibly powerful opportunity.
Data fabric and data mesh are no different – if well-defined and their value and application are well communicated, we can use them to considerable advantage.
Data mesh, as defined by Zhamak Dehghani (formerly of Thoughtworks), can be summarised as applying product-based thinking to data as a valuable asset. It often requires organizational change: owners of the data domains need to be established to define who gets the value from the data and, notably, who owns the data risks within it.
Re-usable data assets are not new, but data mesh gives a stronger framework to deliver them within.
In parallel, while data fabric is currently riding high on the Gartner Hype Cycle, it is well positioned to move through the 'trough of disillusionment' because it builds on several well-established data capabilities.
At BT Group we have established a simple definition:
Data mesh definition: It is a “decentralized framework for managing data as valuable re-usable products”
Data fabric definition: For the previously 'enigmatic' data fabric, it is simply "smart, automated data management at scale"
Let’s break down the significance of each of those keywords for data fabric:
Smart – Use of knowledge models and graphs to gain a semantic understanding of data and its context
Automated – Consistent and auditable automation of common data governance activities such as metadata, data handling policies, and controls using machine learning, AI, and repeatable patterns
Data management – The practice of managing the risks and the value of the data within a business (establishing ownership, policies, standards, controls, improving data quality and data value).
At scale – A volume of data usually at or above the petabyte range, used here to indicate that there is more data in scope than could be effectively handled by a manual, "data-steward-centric" approach.
It should go without saying that to smartly automate anything, you need a certain level of maturity in your understanding of it. We measure our data management maturity using the EDM Council's DCAM (Data Management Capability Assessment Model) framework.
A manual approach often requires a heavy investment in time and money with people tagging, reviewing, and approving data, requirements, and controls. It introduces human error, inconsistency in interpretation, duplication, and unnecessary costs compared to a more automated and metadata-driven approach.
To begin to see where this can be implemented within current organizational structures, it is important to start with a business motivation model and to consider business capabilities, value streams, and how data plays a role in those.
These business capabilities are the basic building blocks of your business architecture, for example, the “Management of Orders”, “Management of Payroll,” etc.
Next, this view enables critical data to be identified, linked to the business capabilities, and then managed to support continued value and reduced risk.
For many financial institutions this starts with focusing on the regulatory need to manage data (sometimes called defensive data management), but to expand this to get data value, it needs to be done at a far greater scale – something that can only be done with metadata-driven automation.
Data fabric sits in the sweet spot between all three disciplines (Business Architecture, Data Management, and Data Architecture) and drives them closer together, a structure we use and benefit from.
Historically, there has been a natural tendency to hoard data in the areas that produce it. Modern data practices and platforms have evolved to bring curation to the domain level, where the data is well-described, has quality measures and owners, and is interoperable.
A mesh can include many data products, for example: product inventory, customer, customer orders, household information, contracts, IoT (Internet of Things) data, and many more.
Data scientists are now able to pick data products and combine them to deliver faster time to value. You can liken data products to ingredients that are ready to be combined to create meals from many different recipes.
With data mesh, you can assign ownership and reward the producers of data so they can be adequately funded to continue creating data value for their organization.
Federating ownership without unintentionally creating data silos through differing standards can be challenging, so this is where data fabric helps and stops a data mesh from becoming a data mess.
As previously mentioned, where data fabric is ‘smart and automated data management at scale,’ it sets these rules for you. It gives structure and boundaries for creativity within domains and, most importantly, allows the enterprise or organization to benefit.
The foundation of good data fabric is information modeling, making sure that domains and data products make sense together, are well-defined, and do not overlap.
There also needs to be an understanding of the data and the sources it comes from, as well as the implementation of a marketplace that can check the credentials of those wishing to access it.
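To make the "do not overlap" principle concrete, here is a minimal sketch of a check that no attribute is claimed by more than one data domain. The domain and attribute names are illustrative assumptions, not BT Group's actual model.

```python
# Minimal sketch: validate that data domains are well-defined and non-overlapping.
# Domain and attribute names below are hypothetical examples.
from collections import defaultdict

domains = {
    "customer": {"customer_id", "name", "date_of_birth"},
    "customer_address": {"address_id", "postcode", "country"},
    "product_inventory": {"product_id", "stock_level", "warehouse_id"},
}

def find_overlaps(domains):
    """Return any attributes claimed by more than one domain."""
    owners = defaultdict(set)
    for domain, attributes in domains.items():
        for attribute in attributes:
            owners[attribute].add(domain)
    return {attr: sorted(doms) for attr, doms in owners.items() if len(doms) > 1}

print(find_overlaps(domains))  # {} -> no attribute has two owners
```

A check like this can run automatically as the information model evolves, catching overlapping domain definitions before they become competing sources of truth.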
We are expanding our data mesh and data fabric, and this is being done at speed. We are doing this by establishing a common vision and creating a movement within our organization using metrics and recognizing what success looks like. We recognize the important part data can play when it comes to accelerating the digital transformation of our business.
To help illustrate some of the advantages of this approach, let us look at a generic order management capability.
1. Identify the primary business capability
In this instance a customer places an order for a product (for example, a new mobile phone) and for simplicity’s sake let us say they are keeping their existing contract and services.
It may be helpful to group that under the generic business capability "Management of Customer Orders" – following Gartner's best practice in naming conventions ("verb" then "noun", which enables an obvious distinction between the department name and the activity).
2. Identify the corresponding data domains/data products (in bold)
At the highest level that requires knowledge of who the customer is (customer), what their delivery address is (customer address), what their payment status is against the order (payment/billing), what the availability is of the exact product they ordered, and its location in relation to the customer address (product inventory and corporate locations). These could all be considered data domains with their own data products (in most cases, produced and managed by quite discrete teams).
3. Identify the business capabilities producing or consuming that data
It then becomes necessary to identify where data is being produced and consumed. For example:
The customer and customer address information itself may well have originated from other business capabilities and their corresponding CRM (Customer Relationship Management) systems.
The product inventory and corporate location information may have originated from other business capabilities such as Management of Product Inventory, Management of Supply Chain, Management of Corporate Locations, and more.
The order data produced may be required by the "Management of Fulfillment" business capability and will surely be needed to update the product inventory as the order is fulfilled.
4. Design your data domains/products
The more business capabilities you model in this way, the more you see recurring sets of data. The vast majority of data sets across an organization can be re-used to great effect. This allows you to prioritize the most reused groupings of data into data products.
This gives a great economy of scale (for example, you can focus the necessary controls and quality at the source and propagate from there across the organization) rather than having to, for example, fix data in numerous places.
5. Identify the necessary data controls and their business rules
Each data product will have various controls; to take a couple of simple examples:
Customer and customer address data will, by definition, carry restrictions as Personally Identifiable Information (PII) covered under many forms of regulation, and may need to be tokenized in certain situations.
Equally, depending on the location, there may be restrictions on cross-border transfer of this information, as well as archiving requirements.
Payment/billing may have restrictions based on Payment Card Industry (PCI) standards, but may also contain sensitive information such as credit ratings that equally needs to be treated carefully.
There may also be controls on who is allowed to see what data for what purpose.
All these controls can be encoded as business rules that require specific metadata (e.g. a rule that applies to PII needs to know which data is considered PII). They can be applied not only at the source but, where the rule requires it, at every place in the organization where that type of data is present.
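A minimal sketch of this idea: controls expressed as business rules that fire on metadata tags, so a rule defined once applies wherever the tagged data appears. The tag names and actions here are illustrative assumptions, not a real policy catalog.

```python
# Minimal sketch: controls as business rules keyed on metadata tags.
# Tag names and actions are hypothetical examples.
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str
    applies_to_tag: str   # the metadata the rule needs, e.g. "PII"
    action: str           # the control to apply, e.g. "tokenize"

RULES = [
    Rule("tokenize-pii", "PII", "tokenize"),
    Rule("pci-restrict", "PCI", "restrict_access"),
]

def controls_for(column_tags):
    """Given the metadata tags on a column, return the control actions to apply."""
    return [r.action for r in RULES if r.applies_to_tag in column_tags]

print(controls_for({"PII"}))         # ['tokenize']
print(controls_for({"PCI", "PII"}))  # ['tokenize', 'restrict_access']
```

Because the rule references a tag rather than a specific system or table, the same control automatically covers every location where that type of data is later discovered.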
6. Detecting and tagging the data (aka Semantic Discovery)
Using smart technologies, it is possible to detect the type of data and tag it (i.e. identify that something is PII). The approaches range from the simple (the shape of the data telling you it is a social security number) to more sophisticated methods such as fingerprinting (where the values of the data are recognized as similar to previously tagged data sets) or contextual detection (where proximity to other recognized data can help confirm its nature).
7. Applying the rules and making the data available
The data fabric can apply the controls and business rules to the data it has detected, and because rules apply everywhere that type of data is present, this operates at great scale.
The semantic way the data has been organised and tagged makes it easy to identify and locate within the enterprise, and then to apply for access (with the business rules related to access enforced).
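To illustrate the access side, here is a minimal sketch of a marketplace-style, purpose-based access check driven by the same metadata tags. The purposes and policy entries are hypothetical.

```python
# Minimal sketch: purpose-based access enforcement on tagged data products.
# Policy entries and purpose names are hypothetical examples.
ACCESS_POLICY = {
    # tag -> purposes for which access is permitted
    "PII": {"fraud_investigation", "billing"},
    "PCI": {"billing"},
}

def access_allowed(product_tags, purpose):
    """Allow access only if every tag on the product permits the stated purpose."""
    return all(purpose in ACCESS_POLICY.get(tag, set()) for tag in product_tags)

print(access_allowed({"PII"}, "billing"))                     # True
print(access_allowed({"PII", "PCI"}, "fraud_investigation"))  # False: PCI blocks it
```

The key design point is that the marketplace never needs to know about individual systems: the tags attached during semantic discovery carry the policy everywhere the data lives.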
About the author:
Ben Clinch has been working in business and data architecture for the last 23 years and is currently Principal Enterprise Architect, Information Architecture for BT Group. Clinch plays an instrumental role in BT Group’s data fabric vision, automating data management, standardizing policies, and maximizing the efficiency of its data estate.
Clinch is also a member of the EDM Council's Cloud Data Management Capabilities (CDMC) steering committee and working group. CDMC is a Financial Services industry initiative to develop a common, open-source framework of cloud data management capabilities, standards, and best practices.