My challenge was similar to every other chief data officer’s challenge when it comes to creating a data strategy. In fact, the reason companies spend so much time and money on data warehouse solutions stems from the problems I discuss here.
My first assessment of the company’s data environment could be summed up as “The Wild Wild West”. Business users had their own copies of replicated databases and their own derived data, and were running their own SQL queries, putting the results in Excel and emailing them to users across the company. Nightly data replication jobs, set up to support analytics, let the business create siloed data, which resulted in multiple versions of the truth.
In addition, analysts were spending most of their time on the mechanics of the data (running queries, formatting data, distributing data, and so on) and far less time analyzing it. Executive management informed me that they didn’t trust the data. What a surprise!
First and foremost, I did not want to create a corporate data warehouse solution where you can spend months loading data, transforming it and then creating datasets that may not be used by the end users. I wanted to focus efforts on outcome-based solutions where I simply ask the business, “What data do you need so you can spend your day analyzing data instead of creating/finding it?” This simple question would allow me to focus on what data is important and to deliver it in a timely manner.
To start this process, I could have taken one of two approaches:
1. Purchase an ETL tool and hire 5 to 10 developers whose only job would be to create and support source-to-target mappings for nightly load processes.
2. Research the best technologies in the market that would allow for data replication without incurring technical debt.
Approach 1 never appealed to me because I have worked in that kind of environment before. Ingesting data from different sources ultimately becomes a bottleneck: every new data request has to be developed, tested and supported by developers on the team. I work for a financial company, not an IT company, so asking leadership for developers just to load data would not be ideal.
I ultimately chose approach 2 and adopted Fivetran (a cloud SaaS provider). Fivetran handles highly optimized data loads through more than 100 connectors (for sources such as Oracle, MongoDB and S3) to a cloud data platform. Setup takes minutes and the connectors adjust to schema changes automatically, so there is no re-engineering of code. Using Fivetran allowed me to focus on what the business needs instead of creating technical debt with ETL mappings that may need constant support.
CLOUD DATA PLATFORM
The CIO gave me the task of choosing a cloud data platform, so I quickly researched Amazon Redshift, Google BigQuery and Snowflake. After research and several proofs of concept, I chose Snowflake for many reasons, such as:
• Snowflake is a cloud-based data warehouse that lets my team focus on outcomes instead of infrastructure.
• It provides a flexible billing model which means the company pays only for compute resources it consumes.
• It delivers high query performance without performance tuning or testing.
• It stores and queries JSON data, which provides access to data that was previously inaccessible.
• A single source of truth provides data that everyone can access and trust.
As mentioned, I did not create a data warehouse model where you create facts and dimensions that might (or might not) answer business needs. I focused on the outcomes the business needed and conformed the data using business views that satisfied the requirements. I was able to rely on views because of Snowflake’s query performance, which allowed me to put the business logic in queries instead of ETL tools. I also created master data solutions for key data that the company uses daily and certified each solution. With agreed-upon master data and outcomes, the company is now using data that can be trusted throughout the organization.
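To make the idea of putting business logic in a view rather than in an ETL mapping concrete, here is a minimal sketch. It is not the solution built at the company: the table names, columns, sample rows and the "open loan" rule are all invented for illustration, and Python's built-in sqlite3 module stands in for the cloud data platform so the example can run anywhere.

```python
import sqlite3

# In-memory SQLite stands in for the cloud data platform (an assumption made
# so the sketch runs anywhere; it is not Snowflake).
conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Hypothetical raw tables, as a replication tool might land them unchanged.
CREATE TABLE raw_loans (loan_id INTEGER, store_id INTEGER, amount_cents INTEGER, status TEXT);
CREATE TABLE raw_stores (store_id INTEGER, store_name TEXT, state TEXT);

-- Invented sample rows; note the inconsistent status spellings.
INSERT INTO raw_loans VALUES
    (1, 10, 50000, 'OPEN'),
    (2, 10, 25000, 'closed'),
    (3, 20, 40000, 'Open');
INSERT INTO raw_stores VALUES
    (10, 'Downtown', 'OH'),
    (20, 'Eastside', 'KY');

-- The business logic lives in the view, not in a nightly load job:
-- status values are normalized, cents become dollars, store names are joined in.
CREATE VIEW v_open_loans_by_store AS
SELECT s.store_name,
       s.state,
       COUNT(*) AS open_loans,
       SUM(l.amount_cents) / 100.0 AS open_balance_usd
FROM raw_loans AS l
JOIN raw_stores AS s ON s.store_id = l.store_id
WHERE UPPER(l.status) = 'OPEN'
GROUP BY s.store_name, s.state;
""")

# Analysts query the conformed view instead of the raw tables.
rows = conn.execute(
    "SELECT store_name, state, open_loans, open_balance_usd "
    "FROM v_open_loans_by_store ORDER BY store_name"
).fetchall()

for row in rows:
    print(row)
```

Because the rule is defined once in the view, a change to what counts as an open loan is a change to one query rather than to a load process, and every analyst who reads the view gets the same answer.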
With Fivetran and Snowflake I no longer worry about software upgrades, data performance and storage space issues. I have delivered an enterprise reporting solution that uses secured information residing on a single platform, one that performs fast, is cost effective and enables end users to trust and understand the data that supports their key business initiatives.
Today the company is more agile now that business data is centralized and available to everyone. The focus on analysis and outcomes helps teams make decisions faster. Analysts spend less time running database queries and formatting spreadsheets, and more time focused on business outcomes, deriving insights and driving action.
Bryan Christ has over 25 years of data experience and is currently the director of Data & Analytics at Axcess Financial. Prior to Axcess, Bryan worked at GE Aviation focusing on Big Data Platforms and Business Analytics, and began his consulting career focusing on Data Warehousing and Analytics.