A deep dive into Embedding and Retrieval-Augmented Generation (RAG)

The support team's struggle

Imagine you're leading a support team at a rapidly growing tech company. Your inbox is flooded with customer inquiries, and your team is drowning in a sea of documentation. They spend countless hours searching for the right information to address client questions, often coming up short or sending outdated responses.

Sarah, one of your best support agents, sighs as she reads the latest ticket:

"How do I integrate the new API with my existing database? I can't find any clear instructions in the docs."

Sarah knows this information exists somewhere in your vast knowledge base, but finding it feels like searching for a needle in a haystack. She spends 30 minutes digging through various documentation pages before cobbling together a response. Meanwhile, the backlog of tickets grows, and customer satisfaction scores are taking a hit.

You realize there must be a better way. What if you could build a system that not only helps your team search more effectively but also suggests responses before an agent even opens the ticket?

Enter Retrieval-Augmented Generation (RAG) – a game-changing approach that could revolutionize how your support team operates.

What is RAG?

RAG (Retrieval-Augmented Generation) is a technique that enhances AI language models by giving them access to up-to-date, external information. Think of it as giving your AI assistant a smart, always-current reference book.

Here's how it works:

  1. When a question comes in, RAG searches through a database of current information.
  2. It retrieves the most relevant pieces of information for that question.
  3. The AI then combines this information with its existing knowledge to generate an accurate, helpful answer.

For Sarah and your support team, this could mean:

  • Instant access to the most relevant documentation
  • AI-suggested responses based on the latest information
  • Dramatic reduction in time spent searching for answers
  • More consistent and accurate responses to customers

Let's dive deeper into how RAG could transform your support operations and why it's becoming a crucial tool in AI-assisted customer service.

Why use RAG?

One of the biggest challenges with traditional AI models is their tendency to "hallucinate" or confidently provide incorrect information, especially about topics they weren't trained on or that have changed since their training. RAG significantly reduces this problem by providing the AI with relevant, up-to-date information.

Let's see how RAG addresses key information retrieval challenges:

  1. Handling Niche Information: A user asks about configuring a rarely-used feature in your software.
    • A standard AI model might have limited or outdated information about this feature.
    • RAG can access the most current documentation, even for rarely-used features, ensuring accurate and detailed responses.
  2. Adapting to New Information: Your company just released a major software update with new features.
    • A traditional AI, trained months ago, would be unaware of these changes.
    • RAG can immediately incorporate the latest release notes and updated documentation into its responses, ensuring customers receive current information.
  3. Semantic Understanding for Document Retrieval: A user asks, "How do I secure my data in transit?"
    • A keyword-based search might miss relevant documents that don't contain these exact terms.
    • RAG's semantic search understands the intent and can find documents about encryption, SSL/TLS protocols, or VPN setup, even if they don't use the phrase "secure data in transit".

By leveraging RAG, support teams can provide more accurate and comprehensive information, especially for niche topics, recent updates, and queries that require understanding context beyond simple keyword matching. This improves the efficiency of information retrieval and enhances the overall quality of support.

How does RAG work?

1. Prepare your data

Gather and preprocess all the current information you want your AI to reference. For our support team, this includes product manuals, API documentation, and other relevant support documents. This step involves:

  • Cleaning and normalizing text
  • Splitting long documents into smaller, manageable chunks
  • Handling various file formats (PDFs, HTML, etc.)
  • Extracting and managing metadata

Example:
We have a support article titled "How to Upload Files into Snowflake," which has been cleaned, split into paragraphs, and had its metadata (like creation date and author) extracted.
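The preparation step above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: real systems often split on sentence or paragraph boundaries rather than raw character counts, and the article title, author, and dates here are just the example metadata from this walkthrough.

```python
# Minimal sketch: split a cleaned document into overlapping chunks and
# carry metadata alongside each chunk. Sizes are illustrative only.

def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into ~chunk_size-character chunks that overlap,
    so sentences cut at a boundary still appear whole in one chunk."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

article = {
    "title": "How to Upload Files into Snowflake",
    "created": "2024-01-15",  # example metadata, not a real date
    "body": ("To upload files into Snowflake, first ensure you have "
             "the necessary permissions...") * 3,
}

# Each chunk keeps a pointer back to its source document's metadata.
chunks = [
    {"text": c, "title": article["title"], "created": article["created"]}
    for c in chunk_text(article["body"])
]
```

Keeping metadata attached to every chunk pays off later: it lets the system cite the source document when it answers.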

2. Turn text into embeddings

Pass the preprocessed text through an embedding model to convert it into lists of numbers called embeddings. These embeddings represent the meaning of the text in a way computers can understand and compare.

Example:
Article chunk: "To upload files into Snowflake, first ensure you have the necessary permissions..."
Embedding: [0.114, -0.301, 0.511, ..., -0.232, 0.090]

Note: Embeddings typically have hundreds or thousands of dimensions. The choice of embedding model (e.g., BERT, Sentence-BERT, or OpenAI's models) affects performance, with trade-offs among accuracy, speed, and cost.
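To make the text-to-vector idea concrete without downloading a model, here is a deliberately toy stand-in: a word-count vector over a tiny fixed vocabulary. Real embeddings come from learned models such as Sentence-BERT or OpenAI's embedding endpoints and capture meaning far beyond word overlap; this sketch only shows the shape of the transformation.

```python
# Toy stand-in for an embedding model: count how often each vocabulary
# word appears (as a substring of a word) in the text. Real embedding
# models are learned and produce dense vectors with hundreds or
# thousands of dimensions; this 5-d vector is purely illustrative.

VOCAB = ["upload", "file", "snowflake", "table", "permission"]

def toy_embed(text):
    words = text.lower().split()
    return [sum(1 for w in words if v in w) for v in VOCAB]

print(toy_embed("To upload files into Snowflake, first check permissions"))
# a 5-dimensional count vector, one slot per vocabulary word
```

Unlike this toy, a real embedding model would also place "import data" near "upload files" even though they share no words, which is what powers the semantic search in step 5.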

3. Store the embeddings

Store the embeddings in a vector database optimized for quick similarity searching. Popular options include Pinecone, Weaviate, and pgvector (a PostgreSQL extension). These databases handle the high-dimensional nature of embeddings efficiently.

4. Process incoming questions

When a question comes in, preprocess and convert it into an embedding using the same process and model used for the documents.

Example:
A user asks two different questions:

  • Query 1: "How can I bring external data into a Snowflake table?"
    Embedding: [0.112, -0.305, 0.512, ..., -0.234, 0.091]
  • Query 2: "What’s the process to upload a file into a Snowflake database?"
    Embedding: [0.119, -0.290, 0.510, ..., -0.230, 0.049]

5. Semantic search

Compare the question’s embedding to the stored embeddings using a similarity function like cosine similarity to find the closest matches. 

Example:
Comparing the query embeddings to the article embeddings:

  • Similarity(Query 1, Article chunk) = 0.85
  • Similarity(Query 2, Article chunk) = 0.91

Even though Query 1 uses different wording, the high similarity shows both queries are closely related to the article. This is where RAG often outperforms keyword-based search.
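Cosine similarity itself is a one-line formula: the dot product of two vectors divided by the product of their lengths. The sketch below reproduces the comparison above using 3-dimensional toy vectors (real embeddings have hundreds of dimensions, so the exact scores here differ from the 0.85 and 0.91 in the example).

```python
import math

# Cosine similarity: 1.0 means the vectors point the same way (same
# meaning), values near 0 mean unrelated, negative means opposed.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

chunk = [0.114, -0.301, 0.511]    # toy 3-d stand-ins for real embeddings
query1 = [0.112, -0.305, 0.512]   # "bring external data into a table"
query2 = [0.119, -0.290, 0.510]   # "upload a file into a database"

print(cosine_similarity(query1, chunk))  # close to 1: similar meaning
print(cosine_similarity(query2, chunk))  # also close to 1
```

Because the score depends only on direction, not on which exact words were used, both differently-phrased queries land near the same article chunk.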

6. Retrieve relevant information

Based on the similarity search, retrieve the most relevant chunks of information from the database. Instead of whole documents, the system typically pulls specific passages relevant to the query. In more advanced systems, the results can also include metadata, such as links to the source documents the model is referencing.

Example:
The system retrieves relevant paragraphs from the “How to Upload Files into Snowflake” article for both queries, even though Query 1 doesn’t share keywords with the article title.
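The retrieval step amounts to scoring every stored chunk against the query embedding and keeping the top k, metadata included. A minimal brute-force sketch (production vector databases add indexing such as HNSW so this stays fast at millions of vectors; the embeddings and documentation URLs below are hypothetical):

```python
import math

# Brute-force retrieval sketch: rank all indexed chunks by cosine
# similarity to the query embedding and return the top-k, carrying
# each chunk's metadata (here, a hypothetical source link) along.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

index = [
    {"embedding": [0.114, -0.301, 0.511],  # toy 3-d values
     "text": ("To upload files into Snowflake, first ensure you have "
              "the necessary permissions..."),
     "source": "https://docs.example.com/upload-files"},   # hypothetical
    {"embedding": [0.900, 0.100, -0.200],
     "text": "Billing cycles run monthly...",
     "source": "https://docs.example.com/billing"},        # hypothetical
]

def retrieve(query_embedding, top_k=1):
    ranked = sorted(index,
                    key=lambda c: cosine(query_embedding, c["embedding"]),
                    reverse=True)
    return ranked[:top_k]

# Query 1 shares no keywords with the article title, yet its embedding
# is closest to the upload-files chunk.
hits = retrieve([0.112, -0.305, 0.512])
```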

7. Generate the answer

Feed the AI language model both the original question and the retrieved information. The AI uses this context to create a well-informed answer.

Example:

  • For Query 1: "To bring external data into a Snowflake table, you can use the file upload process. First, ensure you have the necessary permissions. Then, follow these steps: 1)... Here are some relevant documentation links that may help: https://…”
  • For Query 2: "The process to upload a file into a Snowflake database involves several steps. Let's walk through them: 1)... Here are some relevant documentation links that may help: https://…"
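In practice, the generation step mostly consists of assembling a prompt that pairs the question with the retrieved chunks. The sketch below shows only that assembly; the actual model call is provider-specific (e.g., OpenAI's chat API) and is left out, and the instruction wording and source link are illustrative assumptions.

```python
# Sketch of the generation step: combine the customer's question with
# the retrieved chunks into one prompt. The language-model call itself
# is provider-specific and omitted here.

def build_prompt(question, retrieved_chunks):
    context = "\n\n".join(
        f"[Source: {c['source']}]\n{c['text']}" for c in retrieved_chunks
    )
    return (
        "Answer the customer's question using only the context below. "
        "Cite the source links you used.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

chunks = [{
    "text": ("To upload files into Snowflake, first ensure you have "
             "the necessary permissions..."),
    "source": "https://docs.example.com/upload-files",  # hypothetical
}]
prompt = build_prompt(
    "How can I bring external data into a Snowflake table?", chunks)
```

Instructing the model to answer only from the supplied context (and to cite its sources) is what makes the final response grounded in your documentation rather than the model's training data.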

By leveraging RAG, Sarah and the support team can now provide more accurate and timely responses to customer inquiries, significantly improving their efficiency and the overall quality of support. The system handles a wide range of question phrasings and accesses the most up-to-date information.

Read part two here!

Note: Be sure to check with your team before sharing any internal documents with OpenAI or other LLM providers. Even if Matillion is approved by your security team, you're still passing data to a third party, so it's best to discuss this with them before proceeding if you're handling internal data. In our example, we're using publicly accessible documentation from our docs site.

Riley Phillips

Professional Services
