AI Made Simple - What Every Conversation Designer Should Know (Series) — RAG Basics
As a conversation designer, it’s important to understand some of the techniques used to optimize large language models (LLMs). While you may not be implementing these methods all by yourself, having a working knowledge of them can definitely help you collaborate more effectively with engineers, researchers, or speech scientists — and ultimately make you a better designer.
Let’s start with RAG.
So, what exactly is RAG?
RAG stands for Retrieval-Augmented Generation. It’s a technique used to make LLMs smarter and more accurate by combining their natural language abilities with your own external data sources or knowledge base.
Let’s imagine a simple scenario:
You’re designing an AI-powered chatbot for a company with frequently updated FAQs or documentation. Even if you’re using a powerful model like ChatGPT or Claude, there’s a challenge you’d encounter — these models don’t know about your updates. Why? Because these large language models are trained on static datasets, and re-training them every time your content changes is impractical and expensive.
This is where RAG comes in.
Instead of relying solely on what the model “knows,” RAG retrieves relevant documents from your knowledge base (like an FAQ or internal wiki) in real time. These documents are then passed into the prompt as context — so the model can generate a response that’s both accurate and grounded in your latest data.
In simple terms:
Retrieve relevant info from your data
Augment the language model with that info
Generate the final response
Let’s visualize the process.
What’s Happening Behind the Scenes?
Before your data can be used by the language model to generate a response, it first needs to be processed and transformed. Here’s how that typically works:
Chunking the Content
Whether you’re using FAQs, documents, website content, or even images, the first step is to break them into smaller chunks. For text, this could be paragraphs or sections. For images or audio, these may be frames or short clips.
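To make this concrete, here’s a minimal sketch of paragraph-based text chunking in Python. The word cap, the splitting rule, and the file name faq.txt are all illustrative choices, not a standard recipe — your engineers may chunk by sections, sentences, or token counts instead.

```python
def chunk_text(text, max_words=150):
    """Split text into paragraph-based chunks, capping each chunk at max_words."""
    chunks, current, count = [], [], 0
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        # Start a new chunk once the current one would exceed the word cap
        if count + len(words) > max_words and current:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.extend(words)
        count += len(words)
    if current:
        chunks.append(" ".join(current))
    return chunks

# Hypothetical source file containing your FAQ content
faq_chunks = chunk_text(open("faq.txt").read())
```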
Creating Embeddings
Each chunk is then converted into an embedding — a list of numbers that represents the meaning or features of that content. For text, these embeddings capture semantic meaning. For images, models like OpenAI’s CLIP convert visual content into vector form. For audio, embeddings represent sounds or spoken words.
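Continuing the sketch above, here is one way the text chunks could be turned into embeddings, using the open-source sentence-transformers library. The model name is just one common lightweight choice; many teams use a different embedding model or a hosted embedding API instead.

```python
from sentence_transformers import SentenceTransformer

# Load a small, general-purpose text embedding model (illustrative choice)
model = SentenceTransformer("all-MiniLM-L6-v2")

# Each chunk becomes a fixed-length vector of numbers that captures its meaning
chunk_embeddings = model.encode(faq_chunks)  # shape: (num_chunks, 384)
```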
Storing in a Vector Database
These numerical representations — also known as vectors — are stored in a vector database, which is optimized to search for similar content. When a user asks a question, the system retrieves the most relevant chunks based on semantic similarity — not just keywords.
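For illustration, here’s how those vectors could be stored and searched with FAISS, an open-source similarity-search library (a hosted vector database such as Pinecone or Weaviate would play the same role). This continues the sketch above and assumes the model and chunk_embeddings from the previous step.

```python
import faiss
import numpy as np

# Build an index over the chunk embeddings (exact nearest-neighbor search)
dimension = chunk_embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(np.asarray(chunk_embeddings, dtype="float32"))

# At query time: embed the user's question and find the most similar chunks
query_vector = model.encode(["How do I reset my password?"])
distances, ids = index.search(np.asarray(query_vector, dtype="float32"), k=3)
top_chunks = [faq_chunks[i] for i in ids[0]]
```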
Retrieval and Augmentation
The most relevant chunks are retrieved from the vector database and added to the language model’s prompt. This gives the model grounded context to generate a response that’s accurate, up-to-date, and specific to your data.
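The “augmentation” step is essentially prompt construction. A minimal sketch, assuming the top_chunks retrieved above, might look like this — and the wording of the instructions is exactly where conversation design judgment comes in.

```python
def build_prompt(question, retrieved_chunks):
    """Combine the user's question with retrieved context into one grounded prompt."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer isn't in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_prompt("How do I reset my password?", top_chunks)
```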
Below is an example illustrating what happens when a user initiates a search in a RAG‐enabled application.
User Query
↓
Embed Query
↓
Search Vector DB
↓
Retrieve Relevant Chunks
↓
Construct Augmented Prompt (using query + retrieved chunks)
↓
Send Augmented Prompt to Language Model
↓
Generated Answer
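Putting it together, the query-time flow above can be sketched as a single function, reusing the model, index, and build_prompt pieces from earlier. The generate() call is a placeholder for whichever LLM API your team actually uses (OpenAI, Anthropic, or an in-house model), so treat this as a sketch rather than a finished implementation.

```python
def answer_question(question, k=3):
    # 1. Embed the user's query
    query_vector = model.encode([question]).astype("float32")

    # 2. Search the vector database for the most similar chunks
    _, ids = index.search(query_vector, k)
    retrieved = [faq_chunks[i] for i in ids[0]]

    # 3. Construct the augmented prompt (query + retrieved chunks)
    prompt = build_prompt(question, retrieved)

    # 4. Send to the language model -- generate() is a placeholder for your LLM call
    return generate(prompt)
```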
For conversation designers, understanding RAG (Retrieval-Augmented Generation) unlocks new creative possibilities in how we structure content and support our assistants.