Digital Transformation
Written by: CDO Magazine
Updated 6:22 PM UTC, September 20, 2023

(US and Canada) Alex Golbin, Chief Data Officer at Morningstar, speaks with David Mariani, CTO at AtScale, about delivering quality data, multiple dimensions of data quality, operationalizing data quality, and data’s future.
Golbin, at the outset, states that given the vastness of data used in organizations, understanding the basic parameters is not sufficient. He adds that if an organization says its data is 99.99% right, the remaining 0.01% error rate still amounts to 1,000,000 errors in 10 billion data points.
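The arithmetic behind a figure like this can be checked directly. The short Python sketch below is illustrative only: the function name `expected_errors` and the dataset sizes are assumptions, not something taken from the interview. It uses exact fractions so the small error rate is not distorted by floating-point rounding.

```python
from fractions import Fraction


def expected_errors(total_points: int, accuracy_percent: str) -> int:
    """Return the number of erroneous data points implied by an accuracy figure.

    accuracy_percent is passed as a string (e.g. "99.99") so that it can be
    converted to an exact fraction.
    """
    error_rate = (100 - Fraction(accuracy_percent)) / 100  # 0.01% -> 1/10000
    return int(total_points * error_rate)


# Hypothetical dataset sizes, chosen only to illustrate the scale.
print(expected_errors(10**9, "99.99"))   # 100000
print(expected_errors(10**10, "99.99"))  # 1000000
```

At 99.99% accuracy, 1 billion data points imply 100,000 errors and 10 billion imply 1,000,000, which is why a high headline accuracy can still hide a large absolute number of mistakes.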
He further states that while it is impossible to eradicate every error given practical limitations, it is possible to eliminate the mistakes that matter. Hence, it is essential to understand what is important and what is not. He cites a portfolio example: if a company allocates 10% of a portfolio to a single security, say a treasury bond, then the company needs to be sure that the bond price is correct.
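One way to picture "the mistakes that matter" in the portfolio example is to weigh each pricing error by the size of the position it affects. The Python sketch below is a minimal illustration under stated assumptions: the `Holding` type, the `rank_by_impact` helper, and all of the numbers are hypothetical and do not describe Morningstar's actual method.

```python
from typing import List, NamedTuple


class Holding(NamedTuple):
    name: str
    weight: float       # share of portfolio value, e.g. 0.10 for 10%
    price_error: float  # relative pricing error, e.g. 0.01 for 1%


def rank_by_impact(holdings: List[Holding]) -> List[Holding]:
    """Sort holdings so the largest portfolio-level impact comes first.

    Impact is approximated as weight * price_error (an assumption made
    for illustration).
    """
    return sorted(holdings, key=lambda h: h.weight * h.price_error, reverse=True)


# Hypothetical data: a large bond position with a small error outranks
# a tiny position with a larger error.
holdings = [
    Holding("small_position", weight=0.001, price_error=0.05),
    Holding("treasury_bond", weight=0.10, price_error=0.01),
]
ranked = rank_by_impact(holdings)
print([h.name for h in ranked])
```

Under these assumed numbers, the 1% error on the 10% bond position moves the portfolio far more than the 5% error on the 0.1% position, so the bond price is the one to verify first.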
Golbin then recalls a conversation with a client in Japan concerning data quality that led to building a data quality rubric. He affirms that there are multiple dimensions of quality: accuracy (reflecting the source truth), internal consistency, external consistency, timeliness, and relevance.
Next, he offers the analogy of reporting a baseball player’s statistics. A column showing all zeros for touchdowns is technically correct but irrelevant, since there are no touchdowns in baseball, and that irrelevance makes the data faulty.
On operationalizing data quality, Golbin says the quality and transformation team is responsible for the inside-out aspect: tracking and benchmarking. The outside-in aspect involves conversations with key clients to gather insights into how they feel about the data. Both perspectives are then harmonized to understand the underlying data issue and resolve it.
He acknowledges technological advancements in the ML, AI, and deep learning space. However, there remains a gap between the skill sets of data scientists, data analysts, and other regular employees. Golbin maintains that data scientists need to simplify their tools so that everyday people can use data. On the other hand, people on the ground need to upskill and meet the data scientists midway.
In conclusion, Golbin wishes for a future in which technology is democratized and data analysts and operators speak the model language and write data using ML and other tools.