Opinion & Analysis
Written by: Ricardo Rosales | SVP & Managing Director, Americas, Syniti
Updated 2:49 PM UTC, Wed January 22, 2025
AI and generative AI (GenAI) can bring enormous potential to the life sciences and pharma sectors – everything from the possibility of more efficient clinical trials to quicker regulatory approvals, improved R&D, and much more. We’re already seeing this technology used to transform processes like drug discovery and scientific information extraction.
An International Data Corporation (IDC) report reveals that just over half of mid-sized life sciences companies say their primary business goal is enabling digital transformation projects in addition to AI.
But moving from hype to reality is dependent on data — specifically, the availability of high-quality data. Without proper data management in place, generative AI can’t and won’t deliver results. That’s why organizations need to put data front and center with a “data first” approach.
The pharma and medical device industries have a lot to gain from GenAI – $60 billion to $110 billion in yearly economic value, according to an estimate from McKinsey. How?
One example is that GenAI can increase productivity by shortening the process of finding compounds for potential new medicines. Of course, new compounds can bring new side effects, but feeding the complicated connections between compounds, patient biology, side effects, and dosage into large language models can help find patterns that shorten the R&D lifecycle.
However, harnessing this potential relies on high-quality data. GenAI must have access to good data in order to produce good results. Imagine receiving doctor observations or patient feedback in different formats, columns, or fields. Dosage, administration forms, and frequency are all important factors.
Because of this, data modeling and governance are important to implement. Without these foundational elements, GenAI’s possible benefits and value will be diminished. What’s more, it could leave organizations slower to find the next-generation efficiencies they are seeking.
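To make the formatting problem concrete, here is a minimal Python sketch of mapping records from different sources onto one shared data model. The field names, aliases, and sample record are hypothetical and chosen only for illustration; a real model would be defined by the business owners of the data.

```python
# Hypothetical aliases: different source systems name the same facts differently.
FIELD_ALIASES = {
    "dose": "dosage",
    "dosage": "dosage",
    "route": "administration_form",
    "administration form": "administration_form",
    "freq": "frequency",
    "frequency": "frequency",
}


def normalize(record):
    """Map a source record onto the shared field names.

    Returns (canonical, unmapped). Unmapped fields are kept rather than
    dropped so that a data owner can review them.
    """
    canonical = {}
    unmapped = {}
    for key, value in record.items():
        target = FIELD_ALIASES.get(key.strip().lower())
        if target is None:
            unmapped[key] = value
        else:
            canonical[target] = value
    return canonical, unmapped


# Example: a record arriving with its own column names.
canonical, unmapped = normalize(
    {"Dose": "50 mg", "Route": "oral", "Freq": "2x daily", "Notes": "after meals"}
)
```

In this sketch, anything the model does not recognize is set aside instead of silently discarded, which is one simple way to keep business owners involved in deciding what each field means.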
A major challenge facing organizations today is that data is still often seen as a technical exercise rather than a business exercise. How organizations view their data determines how ready they are to begin an AI project. Organizations that are ready will say, “Yes, data is by all means a business exercise.”
It matters where the data resides, but it’s more about the context of why that data exists. The data is there to run the business, from R&D to commercialization.
Laying the right foundation requires involving R&D or business owners from the get-go, not six months after you’ve already started an AI initiative. These owners might include scientists or doctors who are evaluating the compound interactions that form the basis of active pharmaceutical ingredients.
Their time is limited; you can’t ask them at the last minute to hand over data or clean it up as they go. Data is the fuel for intelligence and should be treated with priority. Failing to cleanse data before embarking on an AI project can lead to poor model performance, biased decisions, time delays, increased costs, ineffective insights, or compliance risks.
Start by clearly defining the outcome you are trying to achieve, because the expected outcome will reveal which data objects and/or fields are most important. Second, determine how the data objects affect one another. This list of rules and intricacies will serve as the foundation of your “intellectual property.” Over time, these rules will build into a massive catalog that can be leveraged as part of your cleansing cycles to keep important data in perfect shape.
Start with the outcome in mind: For example, if the goal is to improve regulatory submissions, that goal focuses the organization on the fields required, the sources of information, the formats, and ultimately how they are completed. The to-be state dictates the direction in which the organization wants to go.
Profile your data’s quality: Now that the to-be state is known, data profiling tools will help identify gaps between the current state and the to-be state. Once the gaps are known, ask, “What rules can we develop to automate the fixing of the gaps?”
Fixing gaps with automated rules reduces the time required. Any gaps that can’t be fixed by rules will require manual intervention, which needs to be quantified: the number of records, the average time to fix each one, and the time available. Together, these figures indicate how many people are needed on the AI data team.
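The profiling step can be sketched in a few lines of Python. This is an illustrative sketch rather than a description of any particular profiling tool: the required fields, the single cleansing rule, and the sample numbers are all assumptions made for the example.

```python
from math import ceil

# Hypothetical to-be state: every record must have these fields populated.
REQUIRED_FIELDS = ("compound_id", "dosage_mg", "frequency_per_day")


def find_gaps(record):
    """Return the required fields that are missing or empty."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def rule_dosage_from_text(record):
    """Illustrative rule: derive dosage_mg from free text such as '25 mg'."""
    text = str(record.get("dosage_text", "")).strip().lower()
    if record.get("dosage_mg") in (None, "") and text.endswith("mg"):
        try:
            record["dosage_mg"] = float(text[:-2].strip())
        except ValueError:
            pass  # Text was not a plain number; leave the gap for manual review.
    return record


RULES = [rule_dosage_from_text]


def profile(records):
    """Apply the rules, then split gaps into rule-fixed and manual."""
    fixed_by_rules = 0
    manual = []
    for record in records:
        if not find_gaps(record):
            continue  # Already matches the to-be state.
        for rule in RULES:
            record = rule(record)
        if find_gaps(record):
            manual.append(record)
        else:
            fixed_by_rules += 1
    return fixed_by_rules, manual


def people_needed(manual_records, avg_minutes_per_fix, hours_available_per_person):
    """Number of records x average fix time, divided by time available."""
    total_hours = manual_records * avg_minutes_per_fix / 60
    return ceil(total_hours / hours_available_per_person)


records = [
    {"compound_id": "C1", "dosage_mg": 50.0, "frequency_per_day": 2},
    {"compound_id": "C2", "dosage_mg": None, "dosage_text": "25 mg", "frequency_per_day": 1},
    {"compound_id": "C3", "dosage_mg": None, "dosage_text": "two tablets", "frequency_per_day": None},
]
fixed, manual = profile(records)
```

With these sample records, one gap is closed by the rule and one is left for manual review. As a worked staffing example under the same assumptions, 1,200 manual records at 5 minutes each is 100 hours of work; if each person has 40 hours available, `people_needed(1200, 5, 40)` rounds up to 3 people.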
Ongoing data cleansing: Think of the old adage, “Give a man a fish, he eats for a day, teach a man to fish, and he eats for a lifetime.” The same principle applies to data quality. Organizations shouldn’t treat clean data as the end in itself. The true prize is both the rules used to automate data cleansing and the fact that the organization now serves as a shepherd for continuous cleansing and new standards.
These rules represent each organization’s intellectual property: a record of how it operates. As long as companies are introducing new compounds and new clinical studies, data will keep arriving in different formats. The organization’s job is to make sure lessons are captured and a library of knowledge, in the form of rules, is built. Your AI will thank you.
Data governance: Data governance is vitally important for enabling GenAI innovation because it ensures organizations are eliminating bias, following responsible data practices, and guaranteeing privacy. When GenAI and data governance work together, they can ensure privacy and compliance, even in the strict regulatory environment of life sciences and pharma.
The pharma and medical device industries are in the business of improving and saving lives, which makes the quality of their data vital. GenAI is making massive strides in its ability to uncover new drug therapies and that means the data it works with must be pristine – both clean and relevant.
Quality data lays the foundation for trustworthy, unbiased results. Delaying or ignoring data quality and governance measures, on the other hand, opens the door to significant risks, including compromised efficacy, compliance failures, and ethics issues.
This is why a “data-first” mindset is so essential. As GenAI continues to reshape the life sciences sector, data quality and a sound data architecture will enable these organizations to fully capitalize on the possibilities that GenAI presents.
However, remember that GenAI isn’t a replacement for human employees. It’s still very necessary to have a human in the loop, but GenAI can significantly augment your efforts.
About the Author:
Ricardo Rosales is Senior Vice President & Managing Director, Americas at Syniti. Rosales has been focused on data for 18 years. During this time, he has led data conversions and governance for global implementations and built a reputation for turning around the hardest/no-win projects.
As Senior Vice President & Managing Director, Americas, Rosales is responsible for go-to-market activities across the U.S., Canada, and LATAM, which include sales, alliances, and making sure customers are set up for success. He sits on various customer steering committees and serves as an executive sponsor for many of Syniti’s key clients.