Navigating Data Ethics — Create a Data Privacy Reference Architecture with 5 Key Capabilities

Is your organization collecting too much information about users, customers, and employees? Where do we draw the line for what is over-collection of data?

In the year 2023, almost every human on the planet is being digitally tracked to a certain extent. Their movements, interests, habits, and conversations are now watched, analyzed, and stored by thousands of organizations that use them for various purposes — some with consent, others without.

The explosion of analytics, personalization, and predictive models brought on by the tech giants has resulted in a rush to collect and use every minute detail about a person. In many ways, Hollywood predicted this long ago; we are now living it.

We are the first generation to watch the number of steps we take, the calories we eat, and the times we wake up at night. The fact that we can do it with technology is not as important as the fact that we care about these insights and then use them to make decisions the next day.

Collecting these pieces of information for the sole purpose of personal usage feels empowering; but companies collecting this information for monetization, advertising, product development, marketing insights, and strategies that are purely for their self-interests can be intrusive. The amount of ‘listening’ by our devices starts to come across more as ‘big brother is watching you.’

All this information coming from the Internet of Things (IoT) can compromise our security and privacy if it is not protected and used appropriately. There is a trend of over-collection and under-protection of personal data that can cause consumers significant harm if the data is used with bad intentions.

Are we collecting too much information about our users, customers, and employees for purposes that surpass what the business needs and start to cross an ethical boundary?

Where do we draw the line for what is over-collection of data in a world of influencers and TikTokers who have created business models based on oversharing personal details like never before?

As is often the case, regulators have observed individuals and businesses diving deep into data sharing/collection and have decided to respond. If we look at this from an organization’s standpoint, they have often approached data with the best of intent – to improve customer experience and increase revenue/cost savings for the organization’s benefit.

Data and insight bring far more value than ‘gut feel’ in today’s world, but organizations that have existed for decades have a data ecosystem that is full of legacy platforms, organically grown systems, and little documentation. The discipline of Data Governance is fairly new in comparison to the use of applications in companies. So, it’s no surprise most organizations are currently struggling with their data landscape.

They have tried several strategies to solve the problem of the volume, variety, and velocity of the data that is required to reach their full business potential. Organizations have invested in data warehouses, data lakes, lakehouses, data fabrics, and every solution in between. With privacy regulations looming in every jurisdiction, it will be incredibly difficult to become compliant if you don’t have a handle on your data.

We now need modern ways of thinking about the data living in our organizations to meet all the different demands – business value, cost management, and regulatory pressures. This article will explore the value ‘data as a product’ will bring to organizations as we navigate our way through privacy regulations.

The broken data ecosystem of today’s big organizations

Data has always been a ‘by-product’ of process and technology in most large organizations. To generate revenue or cost savings, organizations created processes to handle transactions – these processes are essentially the handover of information from one party to another to complete a task. Consider, for example, the process of getting a loan at a bank.

The data is collected about the client – their basic personal information, financial details, and the amount of loan required – which then gets processed through a logic that either approves or declines the loan. If approved, an additional process takes the information in the documents provided, makes the loan available to the client, and creates a process to collect the payments with interest.

Every step was created for a specific task, and the data is collected to move that process forward. Technology was brought in to automate these processes without any real vision from a data lens. The process now runs not on paper but on digital forms and banking systems. We now need to multiply this one example by all the products a bank can offer its clients, each with its own end-to-end processes and its own mechanisms for collecting, storing, and processing that data.

Due to organizational structures, acquisitions, and simplicity of execution – most of these systems and processes don’t talk to each other and have grown in complexity as the bank has evolved in its offerings. 

The diagram below shows how many systems a person’s PII can flow through for a single transaction. 

Systems a person’s PII can flow through for a single transaction.

A single transaction becomes part of several processes, and Personally Identifiable Information (PII) gets distributed to systems, data stores, and reports, resulting in complex lineage and access controls that are spread across various teams. This has resulted in the following major challenges for any privacy regulation that tries to govern data attributes:

Challenges for privacy regulations.

Another complication is that the definition of ‘personal data’ evolves regularly as new data about our existence gets created with every new digital invention. The original definition of PII or personal data covered an individual’s Social Insurance Number (SIN), address, phone number, and other very personal information that would be shared with an organization for a limited purpose.

For advanced analytics and personalized experiences, big tech and other large organizations have collected a lot of additional behavioral data to help understand their users, enabled greatly by internet usage and smartphone activity. It's safe to say that the amount of personal data generated by smartphones has increased significantly in recent years.

People are now able to generate, store, and share vast amounts of personal information, including photos, videos, location data, text messages, and social media activity through the click of a button. Many apps on smartphones also collect data such as browsing history, search queries, and purchasing habits – sometimes with consent, sometimes not.

Some organizations use fingerprints and retina scans as part of their processes – identifiers as unique as having your DNA on file. These all become part of the ‘personal data’ that now needs to be governed but goes beyond basic PII. The terrifying thought is that it all sits on top of a very complicated data ecosystem that no organization has fully mapped out.

Emergence of privacy regulations

As big organizations embarked on their journey to become ‘data-driven’ and mature their analytics capabilities, they collected all pieces of data about their customers, clients, and employees but did not clean up their current state. The data is scattered around multiple data repositories and processes.

This creates enough of a headache when an organization tries to generate value, but now several major privacy regulations are emerging across the world, each with its own requirements and provisions that reflect the country’s or region’s sentiment towards privacy.

The General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and Canada's Digital Charter Implementation Act (C-27) are all privacy regulations that aim to protect the personal data of individuals. At the heart of all three are key requirements that address the growing anxiety consumers and employees have about the collection and usage of their data by merchants. If we compare the critical themes, we see there are some common ones:

Comparing the three major regulations GDPR, CCPA, and C-27(CAN)

Privacy laws and regulations are difficult to implement even in a structured environment. They require an organization’s Legal, Business, and Technology teams to partner closely together to create a new perspective on the collection, usage, and storage of all personal data collected about users, consumers, customers/clients, and employees. Each function plays a different role in setting up the right operating model for maintaining a high bar on privacy.

Legal/Compliance: The Legal and Compliance teams of an organization are responsible for ensuring compliance with privacy regulations, including interpreting the regulations and advising the organization on how to comply with them. This can include developing and implementing privacy policies and procedures, conducting regular privacy risk assessments, and providing training and education to employees on privacy-related matters.

Business: The business team of an organization is responsible for implementing the policies and procedures developed by the legal team to ensure compliance with privacy regulations. This can include incorporating privacy considerations into the design and development of products and services, implementing controls to protect personal data, and regularly reviewing and updating privacy-related business processes.

Technology: The technology team of an organization is responsible for implementing and maintaining the technical controls necessary to comply with privacy regulations. This can include developing and implementing security measures to protect personal data, such as encryption and access controls, monitoring systems for unauthorized access or breaches, and regularly testing and updating the organization's security infrastructure.

For large global organizations, prioritizing the common requirements and building them as shared capabilities across regions and businesses will ensure lower development costs and strategic reusable services.

How to approach privacy regulations and ensure compliance?

Every organization will have its unique challenges to solve for privacy compliance, depending on the data it collects, stores, and uses, the maturity of its data infrastructure and data governance practice, and its willingness to evolve its perception of its data.

The initial work around privacy regulations must be around building a culture that prioritizes privacy as a key component of business processes and system design. Over the last few decades, information security has become a critical part of most organizations, and companies prioritize necessary steps to avoid data loss or security breaches.

Protecting the privacy of individuals must become a similar priority in every process that manages personal information – manually or digitally. This will not be accomplished quickly, as it takes time to understand the various categories of sensitive personal information. Some are easy, such as a person’s SIN, while others such as business email depend on context.

There is enough ambiguity in the data associated with an individual that each organization will need to define its own criteria, partnering closely with legal, risk, and compliance, and build a culture around protecting what is necessary.

Building the culture around protecting data. Fig 1
Building the culture around protecting data. Fig 2

A critical capability that organizations will need to build is a dedicated privacy team that plays a role similar to that of the information security team: establishing the culture and processes and providing an objective perspective to others. This team will champion the requirements highlighted in the chart and ensure they are implemented successfully.

How to create a data privacy reference architecture with 5 key capabilities?

Below is a reference architecture for how organizations need to think through capabilities that need to be brought into their data ecosystem to protect personal data. There are 5 key capabilities that need to be built – some are systems while others are operating models that will enable an organization to handle privacy optimally.

  1. Privacy portal: A privacy portal is a platform that is designed to provide individuals with a centralized and accessible place to manage their personal data and privacy rights. It gives individuals more control over their personal information and how it is collected, processed, and shared by organizations.

    The privacy portal typically allows individuals to access, review, and update their personal information, request deletion of their data, and exercise their privacy rights such as the right to opt out of data sharing.  The portal needs to be integrated with the organization's systems and processes to allow for current state views and requests from the individual.

  2. Policy management: A policy management platform for privacy regulations is a software solution designed to help organizations manage and comply with privacy regulations. The platform is used to automate and streamline the process of creating, maintaining, and enforcing privacy policies and procedures, ensuring that the organization remains compliant with the relevant privacy regulations.

    The policy management platform should incorporate the following capabilities: policy creation and management, privacy risk assessments, and incident response management.

  3. Consent management: A consent management platform is a software solution that helps organizations comply with privacy regulations. It provides a centralized platform for collecting, storing, and managing user consents and for ensuring that downstream systems comply with them.

  4. People data repository: A people data repository is a database that stores and manages master data related to individuals. It contains the key information that defines a person, such as name, address, email, and other identifying information, and is maintained in accordance with privacy regulations and consents.

  5. Data governance capabilities: To protect and manage privacy requirements, data governance capabilities must mature. Data governance capabilities are often well understood but poorly implemented due to the complexity of the current state. To make sure privacy regulations are met proactively, the following must be implemented for all data domains with personal data:

  • Data security and access management – The most critical capability for PII, because regulations are becoming very precise about who can view PII and when. Role-based access management is no longer acceptable, and organizations need to move to attribute-based access management to ensure that only the attributes required for a business purpose are visible to a person.

  • Data lineage – Data Lineage will ensure the organization is always aware of where the PII has moved based on business needs. A real-time view utilizing a data lineage tool will need to be implemented and kept current for all PII in the organization.

  • Metadata management – Metadata is challenging for several reasons. In landscapes that rely on vendor tools, vendors rarely expose full metadata for their products. In organizations with lots of legacy applications, metadata was likely never captured at the granularity required. This is an investment every company must make to understand the data inside the company, especially personal data, and how it is being used and transformed. This capability also plays a critical role in making data easy to discover for reporting.

  • Roles and responsibilities – Domain owners who have PII attributes must understand their responsibilities according to the regulations and internal policies. Data Owners sometimes focus on the applications they own and don’t take responsibility for the data once it leaves their system – they need to now own it end to end. There must be someone accountable for the full lifecycle of PII data.

  • Data quality – The wrong address or email can result in the sharing of very confidential information in certain use cases. Therefore, companies have to take extra measures to ensure the personal data collected is of high quality.
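
To make the interplay between consent management and attribute-based access more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a reference implementation: the role names, purposes, attribute names, and the in-memory POLICY and CONSENTS tables are all hypothetical, and it simplifies by treating consent as the only basis for processing, which is not how every regulation or use case works. In practice the policy would come from the policy management platform and the consents from the consent management platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Who is asking, about whom, and why (all field names are hypothetical)."""
    requester_role: str
    purpose: str
    subject_id: str
    attributes: tuple[str, ...]


# Hypothetical policy: attributes each (role, purpose) pair is allowed to see.
POLICY: dict[tuple[str, str], set[str]] = {
    ("loan_officer", "loan_servicing"): {"name", "address", "income"},
    ("marketing_analyst", "marketing"): {"email"},
}

# Hypothetical consent store: purposes each individual has agreed to.
CONSENTS: dict[str, set[str]] = {
    "cust-001": {"loan_servicing"},
    "cust-002": {"loan_servicing", "marketing"},
}


def visible_attributes(request: AccessRequest) -> set[str]:
    """Return only the requested attributes that policy and consent both allow."""
    if request.purpose not in CONSENTS.get(request.subject_id, set()):
        return set()  # no recorded consent for this purpose: release nothing
    allowed = POLICY.get((request.requester_role, request.purpose), set())
    return allowed & set(request.attributes)


# cust-001 has not consented to marketing, so nothing is released.
print(visible_attributes(
    AccessRequest("marketing_analyst", "marketing", "cust-001", ("email", "income"))
))  # set()

# cust-002 has consented, but the analyst still sees only the email attribute.
print(visible_attributes(
    AccessRequest("marketing_analyst", "marketing", "cust-002", ("email", "income"))
))  # {'email'}
```

The point of the sketch is that the decision is made per attribute and per purpose rather than per application or per role alone, which is the shift from role-based to attribute-based access described above.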

About the author:

Nazia Shahrin is a Data and Transformation executive with a proven track record across capital markets, personal banking, commercial banking, and wealth management. She is also an Adjunct Professor at the University of Toronto, teaching Data Governance at a graduate level for the Faculty of Information.


Shahrin has led architecture, data governance, and data privacy initiatives with an innovation and engineering mindset - believing in building robust data ecosystems with strong management capabilities and privacy by design to protect the organization's most important digital asset.

CDO Magazine
www.cdomagazine.tech