Opinion & Analysis

To RAG or not to RAG? 10 Serious Considerations Before You Begin to RAG

Trust the data often, but always validate.

Written by: Patrick Chew

Updated 3:18 PM UTC, Mon July 1, 2024

Today, you are either bombarded with AI startup news or breaking frontier information about discoveries of AI or GenAI methodologies, partnerships, offerings, or capabilities. Our interest in AI and GenAI keeps growing, but are we ready to handle the constant issues and unreliable outcomes caused by our data?

Given that AI is here to stay as compared to its predecessors RPA or Blockchain where adoption is still tepid, we might as well prepare ourselves to capitalize and leverage AI and its various sub-categories to the best of our resources.

A primer on AI, GenAI

AI is nothing more than our ingenious technological attempts to have machines imitate or respond to our natural languages based on the instructions it is supposed to execute. However, we all know that all AI models’ responses to our natural language are notoriously flawed in many senses due to the lack of contextual understanding of their full intent.

When AI first came out, I used to jokingly say, “All models are good but only some are useful.” This essentially enunciates the fact that if your underlying data feeding AI is lukewarm, you get a lukewarm outcome, and the outcome can be meaningful or improved if your data quality increases. There is a direct correlation between data quality levels and AI results.

As we know LLMs (Large Language Models) tend to be more static or less “reactive or updated frequently.” One approach to counteract that is called RAG (Retrieval-Augmented Generation) founded by Douwe Kiela as CEO and Founder of Contextual AI.

RAG is essentially enabling us to fine-tune by making the latest information available for generating reliable outputs and managing hallucinations.

Also, if you haven’t heard about it yet, CLMs (Contextual Language Models) are now available in RAG 2.0 where they answer open-domain questions in a much better way!

So, what is RAG?

As we continue to inundate our LLMs, we often run into made-up values or incorrect facts leading to hallucinations. These hallucinations can be caused by a slew of factors including insufficient training data sets, incorrect assumptions generated from the model, or biases applied to the data set used to train the model.

One way of reducing (not eliminating) or mitigating knowledge gaps leading to unnecessary hallucinations is by applying RAG or Retrieval Augmented Generation. RAG can be very useful if you apply specific proprietary or domain-specific knowledge without fear or concern that your LLM is too broad and unsubstantiated for what you need.

Meanwhile, there have been some dissenting arguments in the AI research world about if it is the same as fine-tuning data sets or scenarios. At a high level, they seemed similar, but researchers suggested that “RAG-ing” is more apt to integrate new knowledge while fine-tuning is used more towards model improvements by incorporating internal knowledge and complex teaching and formatting its final output.

The best-in-class approach is to apply these complementary approaches together with solid prompt engineering, enabling you to improve and arrive at an efficient optimized model.

So, is it fine-tuning then?

Although the ultimate intent of both fine-tuning and RAG-ing are the same – driving and achieving business value or outcome – there is some difference between the two.

Fine-tuning narrows down to training your LLM on a smaller, customized, and specific dataset and constantly adjusting the model’s algorithms and parameters embedded with new dataset attributes. This encompasses specific datasets harvested from defined domains with all its nuances, terminologies, and logic with the hope of making the model perform better for very specific purposes.

There is no cookie-cutter fine-tuning technique as each one is different for specific solutions. The data set tends to be very narrow and specific for these domains and unlike RAG-ing, it operates in black-box situations. This is because the model internalizes each new data set and it is sometimes difficult to truly understand the reasoning behind these new data sets responses.

On the contrary, RAG-ing is more apt to deliver reliable and accurate responses since it is constantly drawing from its latest most updated datasets. Errors or potential hallucinations are easily tracked and traced. RAG-ing is also significantly less costly since you can contain or limit its size with pre-defined narrowed datasets. It also allows tighter secured access or exposure given that datasets are confined within your environment and can be monitored extensively ensuring no unauthorized access.

Types of RAG

Naïve RAG: It essentially follows the traditional process of indexing, retrieval, and generation via a user’s input through querying relevant documents that are combined with a prompt for the model to generate its final response.
Advanced RAG: This method incorporates existing issues within the Naïve model by applying a pre-retrieval process to optimize the data indexing thereby increasing the quality of the data via 5 stages: Enhancing data granularity, Optimizing index structures, Adding metadata, Aligning optimization, and Mixed retrieval.
Modular RAG: To gain flexibility and generate diversity you can simply add, replace, or rearrange specific modules to fix or solve different contextual problems.

Who should you be then?

As McKinsey so clearly defined, there are 3 archetypes of AI/GenAI:

Taker: Business purely consuming services through basic interfaces such as APIs. With limited talented resources and funds, most of us would fall into this category.
Shaper: Capitalize this by accessing models and fine-tuning them further with its own data, such as applying RAG on its model with its proprietary datasets.
Maker: The top-of-the-line where the business builds its own foundational models, probably the most expensive and resource-intensive out of the three from an investment perspective.

Some recent Maker pitfalls where Bloomberg spent $10M to build their RAG model only to discover that the next generation of ChatGPT 4.5 got their RAG model available for free. Do your research intensively to ensure what you are trying to RAG isn’t already there, just that you haven’t discovered its existence and application.

When should you begin to RAG?

Now that you understand what RAG is about, here are 10 serious considerations to ponder upon before you jump into the deep end:

Before even thinking about “RAG-ing, is your organization ready for AI or GenAI consumption? This should also consider your current skill sets ranging from Prompt Engineers to AI-aware talent pools, to C-Suite or Senior Management AI literacy and education. How far is that gap and how long will it take to bring that learning curve up both from the development and consuming ends?
Start asking “Why” rather than “How.” The “why” tends to be more focused on your business needs rather than embarking on AI/GenAI regardless of what it does or can provide. Without a strong business-driven purpose, the latter may take flight quickly but will crash and burn as soon.
What value or impact do I envision AI or GenAI will bring forth? Business outcomes should be strongly articulated upfront so that the journey can self-sustain itself so long it continues to deliver value or business outcomes.
How would I RAG it responsibly thereby ensuring my content is used responsibly covering these following areas of responsibilities?
1. Fair and equitable: Ensuring that the model is not generating algorithmic bias due to unintended or imperfect training data or decisions applied.
2. Explainable: Be able to present and explain the model’s content End To End, removing the “black box” syndrome.
3. Secured: Ensuring that your model, contents, and access are well monitored and managed even to a point where prompt injection (an unauthorized third party provides new instructions to trick the model into delivering output unintended for the end user) is constantly monitored and prevented.
4. Reliability: is your model concise, consistent, and reliable?
5. Protection as IP (Intellectual Property): Ensuring this as part of training the model via any of the RAG methods. Sometimes it can create significant IP risks. Ensure the inputs into the training model or its output do not infringe on any existing copyrighted, trademarked, or patent-protected materials.
Is my current foundational data primed for use within any AI or GenAI context or even RAG-able to produce what it is expected to support while reducing hallucinations?
How mature are your existing foundational data sets? Do you have all your data domains still owned by IT or have your business partners stepped up?
How mature is your talent pool? Do you currently have the required skillsets ranging from Prompt Engineering to AI Governance established?
Do you have an unlimited RAG budget or revisit which role you would rather play – Shaper, Taker, or Maker?
At what phase will your RAG deliver? Is it the Value Enablement, Creation, or Realization phase, or is it purely to manage Risk Exposures, Management, or Compliance?

What is your relationship between Data Governance/Enablement and AI knowing you fundamentally can’t have AI without data?

Takeaways

Remember, it’s not the “how” but focus on the “why” that will sustain you.

Always aim for answering the “why” as the “how” can be easily accomplished by getting the right tool, right methodology, and right resources. The “why” should be translated to ensuring value as your guide to your eventual outcome. Data-To-Outcome must always be on the radar. Sometimes we lose track of our intended objective of harvesting outcome or value. The value you’re aiming for should speak strongly of what the data is required to deliver.
Start with Taker to get a better understanding of the landscape, your capabilities, and your appetite for great consumption on this path to maturity. Data Governance/Enablement is not built overnight. So, your GenAI initiative supplemented with RAG, while more dynamic, should also mimic some of the Data Governance journey – not quite a marathon nor a sprint but somewhere in between given that the GenAI landscape is constantly evolving.
Ensure reliable data is available. The key differentiator for valuable AI/GenAI is having reliable data as we all know the value of our outcome is always rooted in its data quality. Remember, the 4 stages of the AI and Data Governance relationship.

Stage 1 is where you are in a harmless state where you have no AI when you have no data.

Stage 2 is where you are in a hallucinations state when you have inconsistent data.

Stage 3 is when you are in a reckless state when you have unreliable data.

Stage 4 is where you become dangerous or consequential when you have bad data. Hopefully by applying the appropriate RAG, your stages of risks or hallucinations should reduce favorably to get what you intended.

So, take time to decide if you should RAG or not RAG as those decisions will always fall back to your Data-To-Outcome value creation and realization attainment. In the spirit of practicing responsible AI, it is my responsibility to remind you to factor in the human portion of validating its results regardless of how strong, solid, or consistent your RAG method is.

May the best RAG win.

References:

Patrick Lewis, Ethan Perez – Retrieval-Augmented Generation for Knowledge-Intensive NLP tasks Cornell University – Retrieval-Augmented Generated for LLM: A Survey https://arxiv.org/abs/2312.10997

McKinsey – What every CEO should know about GenerativeAI

https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/a-generative-ai-reset-rewiring-to-turn-potential-into-value-in-2024?stcr=2954EC2DFA944BB192BF293C20E987EB&cid=other-eml-alt-mip-

About the Author:

Patrick Chew is Global Head of Data Enablement at AIT Worldwide Logistics. He currently practices data science and intelligence specifically in the data management, governance, and blockchain arenas supporting organizations to be data-driven by applying good data to fuel innovation in all realms of AI, machine learning, data strategy, business intelligence, analytics, big data, and blockchain initiatives. Over the past 25 years, he has been assisting many Fortune 500 organizations make appropriate and/or intelligent decisions based on facts instead of intuition, substantiating operational, tactical, and strategic decisions.