Ask a chief data officer (CDO) what their role entails, and you’ll get a different answer from each person you talk to. The answers, however, will likely revolve around the themes of data management, analytics, tools and systems. Unfortunately, we are still thinking about what the CDO should know — and the skills they should have — incorrectly.
The conversation regarding data is about realizing value from data as an asset. Firms and the greater industry ecosystem create data at a rapid pace, generating massive amounts of data that need to be wrangled, consumed, stored and curated. Given the big “Four V’s” of data (velocity, variety, volume and veracity), plus a fifth V of “value”, data officers need solutions that can manage these variables and organize the chaos resulting from this firehose of data.
While data managers have begun to create awareness that purchasing a database solution is nowhere near an answer to data concerns, too much of the thinking, in both its approaches and its evolution, is still embedded in technology. There is a people management angle in creating governance and awareness through stewards and similar roles, but most of the focus for solutions centers on technology, tools and regulations.
Deep discussions about lakes versus swamps, dashboards, NoSQL, R and Python, and master data management — they all center on technology and the products and services that are used to organize structured data, or that use NLP to bring order to unstructured data. There is a passing nod to semantics, an acknowledgement that metadata is critical for data management, and the occasional mention of ontologies over taxonomies.
The skills required of a CDO typically describe an enlightened technologist who has good leadership and change management skills and can cross the divide between technology and the business. Sometimes the honor goes to someone from the business with the technical chops to handle databases, architecture, specifications and agile development practices. For a data-centric role, however, the upfront requirements are heavily weighted toward technology, programming and related skills.
For all the talk of data as information, and of the need to “speak the language” of data, a couple of glaring skills are missing from the required list — specifically, language and linguistics. As if the CDO doesn’t already have enough on their plate.
These skills, however, will be critical to move the CDO beyond simply being an expensive project manager applying technology and change management to data problems. Data professionals are coming to terms with the transient nature of data — how it transforms and changes — and how context affects analytics and interpretation. Data practitioners talk of domains, assuming that we all understand what we mean by “domains”. What qualifies as a domain, however, is mostly self-defined or defined by industry, or, to be a bit flippant, we define domains depending on the domain we are in. In reality, domains are already recognized in linguistics under concepts such as communities of practice or speech communities.
Data professionals are trying to reinvent the wheel on this and many other issues. Linguistics as a practice and science has existed for 3,000 years. The modern concept of applied linguistics has existed for over a century. Why, then, would we try to resolve or create new methodologies from scratch? Some may argue that data professionals have been using linguistics through NLP and the integration of ontologies and semantics all along. And, while that might be partially correct, the theory behind language, when applying NLP or semantics, is typically not well understood or considered. These are viewed as solutions in their own right that can wrangle the challenges of understanding data on their own. Data science, while critical, is not a substitute for applied linguistics. Unfortunately, our penchant for technology and the analogies we use to describe and simplify how we talk about data (data as water, oil, and so on) oversimplify and ignore some basic fundamentals that need to be integrated into data management and analytics solutions.
Simply put, data is language. And like language, it evolves, but it does not evolve in a linear or uniform manner, especially across domains and speech communities. Further, data isn’t a singular concept, just as there isn’t a single language but thousands. We should remember that some things that can be expressed in one language cannot be expressed in another. What is a shared data concept today may diverge between two communities over a short period of time, yet our technical architectures and infrastructure solutions are not built to recognize this, much less respond to it. Technology and standards function to create static, relatively consistent objects, while language resists this by continuously evolving.
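This kind of divergence can be made concrete with a toy sketch. The field names and values below are entirely hypothetical, invented only to illustrate the point: two communities keep the same record format while the meaning of a field quietly drifts apart between them.

```python
# Toy illustration of semantic drift between two "speech communities".
# All identifiers and values here are hypothetical, made up for this sketch.

# Both communities believe they follow the same shared schema.
community_a = {"id": "XS0000000000", "price": 1.0125}   # quotes price as a fraction of par
community_b = {"id": "XS0000000000", "price": 101.25}   # quotes price in points

def same_price(record_x, record_y):
    """Naive comparison that assumes 'price' means the same thing everywhere."""
    return record_x["price"] == record_y["price"]

# The schemas match perfectly, yet the records disagree, because the
# meaning of "price" diverged while the field name stayed static.
print(same_price(community_a, community_b))  # prints False
```

The structure validates everywhere; it is the interpretation that has split — exactly the kind of change a purely technical standard cannot see.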
To use the water analogy: the typical dimensions are lineage (where water comes from, what it passes through, how it changes through each “gate”), quality (introduction of contaminants or cleaners), and form or function (water, ice, vapor). But we cannot expect everyone to refer to every variation in which water can exist as H2O. We cannot standardize away the complexity of form, function and context. Further, what is completely missing from our data journey is that the environment through which water travels changes over time, thus affecting the water and its path. And, to complicate things further, the water itself, travelling through the system, affects and changes the landscape around it at the same time.
Still, even given the flexibility of this analogy, there are missing dimensions. One thing is clear — we cling to the illusion that we can control data, just as we believe we can control water. We are mistaken on both counts. Water is a force of nature; it breaches levees, carves new paths through earth and rock, and suddenly and unpredictably appears and disappears in floods and droughts. Data, like language, is very much like water in this way. We cannot control its evolution and change, or the groups and subgroups of speech communities that become isolated, split from the main body and create their own dialects or derivative languages, or force everyone to experience data the same way. Data evolves both over time and in sudden, unexpected ways — a definite challenge for attempts at standardization and technical solutions.
Worse yet, when inside any speech community, we are unlikely to perceive when data and language are evolving away from our core center, or when we may be diverging from a larger community with which we once completely aligned.
In the end, this is not a technical or business challenge. It is a challenge of bringing in and incorporating the expertise, methodologies and perspectives from thousands of years of linguistics and more than a century of applied linguistics.
There is artistry in data, and value will be better realized by those who understand and leverage the transient nature of data than by those who merely try to control and domesticate it. In the end, it is the chief data officer who is a jack of all trades, a polymath, a 21st-century Leonardo da Vinci, who becomes their firm’s chief renaissance person.
A senior executive with 30 years’ experience in the financial industry across operations and technology functions, Rich Robinson has worked throughout the front, middle and back offices at major global custodian banks, brokerages and industry utilities, leading transformative projects in data, operations workflow, and messaging. For over 20 years, he has been heavily involved in the industry as an active participant in key working groups related to international data and messaging standards, including ISITC, FISD, EDM Council, ISO, ANSI/X9, ISDA, and SIFMA. He co-founded the first group on Unique Instrument Identification in 2000 and was a primary participant in the best practice work for ISO 15022. He was convenor for the ISO Study Group on CFIs and UPI, and led the ISO Working Group on the Unique Transaction Identifier standard. Robinson is on the board of directors of ISITC, serving as 2nd vice chair, and is lead sherpa for the Asia Pacific Finance Forum’s Financial Market Infrastructure Workstream on behalf of the Asian Business Advisory Council to the region’s finance ministers.
A regular speaker at conferences, Robinson has been published in the Journal of Securities Operations and Custody, Waters, and Inside Reference Data, among other global financial services publications. His 2018 paper, “A Linguistics Approach to Solving Financial Services Standardization,” introduces the innovative idea to use applied linguistics to guide standards development and regulatory decisions and was published in the Journal of Financial Markets Infrastructure.
Robinson is currently head of Strategy for Standards and Open Data at Bloomberg L.P. He works globally with regulators, legislators and industry leaders on addressing data and standards issues to create more efficient and transparent markets. He holds a Master of Business Administration in organizational behavior and information technology from NYU’s Stern School, and a Bachelor of Science in industrial management, with a concentration in management information systems, from Carnegie Mellon University.