Joe DosSantos

The pressing need for real-time data pushed businesses to accelerate and expand their investments in data and data management solutions during 2020. While much of the industry focus is rightfully on new advancements in predictive analytics that are driving business value, it is becoming increasingly clear that architecture and data teams are working equally hard to enable this data with more modern data architecture and new data sets. Recent research from IDC, commissioned by Qlik, revealed that in Q1 2020 alone, 30% of organizations were conducting major architectural changes and 45% were adding new data to their analytics environment, significantly expanding the capabilities in their analytics arsenals. Multiple cloud platforms, on-prem sources and even offline documentation, such as direct mail surveys, are all feeding inputs into the modern data pipeline.

All this new data isn’t translating into value or impact all of the time. Data engineers and data analysts are struggling to unite all of these data sources in a strong insight-driven pipeline. Many spend up to 80% of their time simply trying to identify the data and assess it, leaving little time to turn data into insights that drive the business. Creating valuable analytics will require us to flip that ratio on its head. Organizations need to be able to mine information at the speed that the competitive marketplace requires to succeed. Effectively converting data into active intelligence – a state of continuous intelligence from real-time, up-to-date information designed to trigger immediate actions – requires businesses to reconsider how they onboard their data in a way that simultaneously sets it free for analytics use and protects the enterprise from risk. Here are tested strategies on how to evolve data onboarding to fuel real-time analytics and decision-making.

1. Align Your Data Priorities to Your Business Priorities

Every day there are critical business questions to be answered: How many employees do we have, and where are they? What is our forecasted profit this quarter? Where are our least productive business units? But not all of these questions are top of mind for the C-suite. A good place to start an analytics journey is with the ones that are, creating analytics that drive the most important strategic objectives of the business. Data onboarding should align equally well to these objectives. In short, strategic objectives need analytics, and analytics need data.

Collaboration among business leaders, analytics teams and data teams, specifically when it comes to data onboarding, creates the crucial organizational alignment needed to deliver high-quality data at the speed of innovation. By exploring data onboarding strategies, teams at different levels and with different skills are able to build a 360-degree picture of the data: how it’s captured, stored, interrogated and leveraged. That understanding will shape smarter and more efficient strategies that can reduce processing costs by cutting the time spent developing and capturing insights.

2. Focus on Reusability

One aspect of the core DataOps frameworks you develop will be aligning data storage strategies with usage and value of data. In the first generation of Business Intelligence, complex transformations loaded data to data warehouses, which created precise but brittle data transformations. This was good for answering the same question over and over but not great for answering even slightly different ones. The pendulum then swung dramatically to data lakes, which made raw data available to everyone, but left it up to the developer to make sense of what the data meant and how to use it. At the same time, traditional infrastructure that supports core production systems is dependent on batch uploads and extended cycles, and struggles with managing the data volumes, speed and readiness required for real-time data usage.

Modern pipelines find a happy medium, building data assets that are well understood and highly reusable across the organization. Technologies like data catalogs can bring clarity to the inventory of these assets and extend the value of current data storage investments.

By using automated, real-time data pipelines, organizations can move toward greater efficiency by collating and preparing the data that they know they need, and the data they might need based on business use cases, into a data catalog. Leveraging a data catalog as a single repository scales analysis across the organization while still keeping data governance and role-based policies in place, helping to ensure the right data can be quickly ingested and transformed from different sources for more users.
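To make the idea of a catalog with role-based policies more concrete, here is a minimal sketch in Python. It is not how any particular catalog product works: the `CatalogEntry` and `DataCatalog` classes, the field names, the roles and the sample data sets are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One reusable data asset in the catalog (hypothetical schema)."""
    name: str
    owner: str
    description: str
    allowed_roles: set[str]
    access_count: int = 0  # how often the asset has been requested


@dataclass
class DataCatalog:
    """A toy in-memory catalog; a real one would be a governed service."""
    entries: dict[str, CatalogEntry] = field(default_factory=dict)

    def register(self, entry: CatalogEntry) -> None:
        self.entries[entry.name] = entry

    def search(self, role: str, keyword: str) -> list[str]:
        """Names of assets this role may see whose description matches."""
        keyword = keyword.lower()
        return sorted(
            e.name
            for e in self.entries.values()
            if role in e.allowed_roles and keyword in e.description.lower()
        )

    def request(self, role: str, name: str) -> CatalogEntry:
        """Enforce the role-based policy, then record the usage."""
        entry = self.entries[name]
        if role not in entry.allowed_roles:
            raise PermissionError(f"role {role!r} may not access {name!r}")
        entry.access_count += 1
        return entry


# Hypothetical assets and roles, for illustration only.
catalog = DataCatalog()
catalog.register(CatalogEntry(
    "hr_headcount", "HR", "Employee headcount by location",
    {"hr_analyst", "executive"},
))
catalog.register(CatalogEntry(
    "sales_forecast", "Finance", "Quarterly profit forecast by business unit",
    {"finance_analyst", "executive"},
))

print(catalog.search("executive", "forecast"))
```

The point of the sketch is that discovery and governance live in the same place: users search only what their role permits, and every request is counted, so the catalog itself shows which assets are actually being reused.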

3. Include Data Owners in Data Onboarding Processes

The need to accelerate the data-to-insights pipeline has never been more pressing, but it is not solely a technical problem. It requires clear business processes to ensure that the team can keep up with the speed of the new flow of relevant data. Understanding what data is most relevant, based on roles and needs, means involving the data owners in the process. The business owner can communicate the meaning of data in the data set and outline its respective sensitivity or privacy. Both are critical to effective and safe use in analytics.

Data catalogs add an element of technical support to this process by enabling anyone – from data engineers to business users – to learn, act and react to the data that’s relevant to them by seeing what sources are approved and most used by colleagues. As users explore and request new data, that cycle adds to the overall tribal knowledge of what data is really driving impact and value.

For example, when marketing leads execute augmented analytics on trusted data sources, they are likely to explore new questions, which leads to the need for more and new data. The catalog sees that need and serves up new and relevant data sources for exploration. The data sets in the catalog now reflect a deeper understanding of additional data that’s bringing value to the process, which in turn will create value for the next user and next campaign. Data catalogs deployed this way, as part of a searchable SaaS platform rather than a static data set, support both management of governance and access privileges and real-time data-driven decision-making.

4. Identify, Control and Record Each Data Journey

When business and technology leaders don’t understand the nature of the data they own, they’re not able to make appropriate security and privacy considerations. In many instances, this leads to data lockdown policies that create burdensome data silos that limit the availability and value of key data.

As data onboarding is strengthened by cataloging and data ownership, leaders can establish frameworks of security, governance and metadata capabilities that guard user access privileges and track user activity while still making more data available.

A strong data onboarding strategy has built-in policies to mask and secure personal and sensitive information. It should also include a set of safeguards that determine what happens when something goes wrong, such as when an unexpected data type is encountered or a data field is populated incorrectly.
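As a rough illustration of such policies, the sketch below checks each incoming record against expected field types, quarantines anything that does not match and masks sensitive fields in the records it accepts. The field names, the `EXPECTED_TYPES` and `SENSITIVE_FIELDS` policies and the `onboard` function are assumptions made up for this example, and the truncated hash stands in for whatever masking technique an organization has actually vetted.

```python
import hashlib

# Hypothetical onboarding policy: expected type per field, and which
# fields hold personal or sensitive information.
EXPECTED_TYPES = {"customer_id": int, "email": str, "order_total": float}
SENSITIVE_FIELDS = {"email"}


def mask(value: str) -> str:
    """Replace a sensitive value with a short one-way SHA-256 digest.

    Illustrative only; not a vetted anonymization scheme.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def onboard(records):
    """Split records into accepted (masked) rows and quarantined rows."""
    accepted, quarantined = [], []
    for record in records:
        # Safeguard: collect every field whose type is not what the policy expects.
        problems = [
            f"{name}: expected {expected.__name__}, "
            f"got {type(record.get(name)).__name__}"
            for name, expected in EXPECTED_TYPES.items()
            if not isinstance(record.get(name), expected)
        ]
        if problems:
            # Do not load the record; keep it with the reasons for review.
            quarantined.append({"record": record, "problems": problems})
            continue
        # Keep only the fields named in the policy, masking sensitive ones.
        accepted.append({
            name: mask(record[name]) if name in SENSITIVE_FIELDS else record[name]
            for name in EXPECTED_TYPES
        })
    return accepted, quarantined


rows = [
    {"customer_id": 1, "email": "a@example.com", "order_total": 19.99},
    {"customer_id": "2", "email": "b@example.com", "order_total": 5.0},
]
accepted, quarantined = onboard(rows)
print(len(accepted), "accepted;", len(quarantined), "quarantined")
```

Quarantining a bad record with its reasons, rather than silently dropping or loading it, gives the data owner something concrete to review and keeps a record of that step in the data's journey.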

Flexibility is important when looking to accommodate new data sources, which may come from unfamiliar platforms or in new formats. Overly restrictive policy barriers need to be reviewed to allow those kinds of data sources to be included in the data catalog when relevant – or risk choking off a potential avenue for meaningful insights.

Drive the Business Forward With the Right Data Strategy

The flow and complexity of data will grow over time, and how successfully a business can harness that information and quickly convert it into insights will separate the leaders in the pack. With strong data onboarding in place, organizations are better positioned to effectively embrace a culture of data-driven decision-making that includes non-technical users sharing the responsibility beyond data engineers and data owners. As part of an overall DataOps strategy, data onboarding can empower organizations to efficiently manage their complex and growing data pipelines, harnessing the volumes of data they’ll need to make more intelligent decisions in the future.


Joe DosSantos

As Chief Data Officer, Joe DosSantos is responsible for the alignment of business and technology to enable 3rd Generation Business Intelligence at Qlik. He is responsible for use case prioritization, DataOps methodology, and the deployment of information management systems, including all of Qlik’s Data Integration and Data Analytics products. He also provides thought leadership in modern Data Architecture and Data Governance to other CDOs. 

Prior to taking on the CDO role, Joe was responsible for Qlik’s Data Catalyst product positioning and competitive intelligence, developing business-focused offerings with Industry Solutions, and defining QDC’s long-term product roadmap.  

Before joining Qlik, Joe was the Vice President of Enterprise Information Management Technology Services at TD Bank Group. In this capacity, he was responsible for enterprise technology required for the management, transformation, and analysis of information across the Bank. He led the delivery of an enterprise data lake that included a metadata-driven catalog and data-as-a-service experience, Hadoop-native ETL, and next-generation reporting, analytics and artificial intelligence solutions. He was also responsible for the Master Data Management, Data Governance, and Data Warehousing tooling.