- Blog
- 10.30.2024
- Leveraging AI, Product, Data Fundamentals
How to use Mosaic AI Vector Search in Databricks

Welcome to the world of embedded vector databases! In this article, I'll show you how to manage and use Mosaic AI Vector Search, which is a vector database integrated into the Databricks Data Intelligence Platform.
A vector is just a long array of numbers… in this context, usually several hundred. A vector database is actually nothing more than a regular relational database with some special extra features for storing, retrieving, and searching for vectors.
Vector processing is useful in artificial intelligence because a sentence can easily be turned into a vector and added to a vector database. Afterward, when queried using a different phrase or sentence, the vector database can quickly retrieve the most similar records.
When a sentence is converted into a vector in this way, it's known as an "embedding" because the original meaning of the sentence has been "embedded" into the vector. AI models that convert sentences into vectors are known as embedding models.
Vector searches are far superior to simple word searching because embeddings contain a mathematical representation of a sentence rather than the actual words. So, for example, embedding the word "say" would be a close match to the word "tell." Even though the words are very different, the meaning is very similar.
The main practical application of vector database technology is in Retrieval Augmented Generation (RAG). The idea is that the vector database returns text records that are most semantically similar to the user's query. Crucially, this is very fast, and the results are vastly smaller than the full, original indexed text—small enough to easily fit as part of a prompt for a Large Language Model (LLM).
With this extra information, an out-of-the-box LLM can instantly become a domain expert in whatever subject matter you have stored in the vector database.
This article will conclude by demonstrating how to run a RAG query using Mosaic AI Vector Search in Databricks.
Creating a Vector Search Endpoint
All Databricks Vector Search Indexes are accessed through a REST API endpoint, so that's the first thing you'll need.
Vector Search Endpoints are a type of compute resource, so in your Databricks console, go to Compute > Vector Search and press Create.
The only thing to choose is a name and the type "STANDARD."
Note that one Vector Search Endpoint can handle many Vector Search Indexes.
How to create a Vector Search Index
Databricks offers three different ways to create and manage Vector Search Indexes:
- You supply a table containing the sentence text and choose an embedding model. Databricks creates the index and keeps it up to date after any change to the data in the source table.
- A similar method, but you handle the embedding yourself and supply a table containing both the sentence text and the embedding.
- Manipulate the index directly via the REST API
Option 1 is by far the simplest, and it's the one I'll demonstrate in this article.
When creating the source table, there are a few prerequisites to remember:
- There must be a primary key in one of the supported datatypes (INT is a good choice)
- The input text must exist in a TEXT column
- The source must be a Delta table with Change Data Feed enabled
To enable Change Data Feed on a Delta table, use the following SQL:
ALTER TABLE `your-catalog`.`your-schema`.`your-table` SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
If you have a large amount of data, loading the table first and enabling Change Data Feed afterward is more efficient.
Vector Search Indexes are a type of catalog resource. In fact, you will find them listed as schema objects alongside ordinary tables. In your Databricks console, go to Catalog, locate your source table, and then follow Create > Vector Search Index.
In this example, I'm using the embedding model databricks-gte-large-en provided by Databricks.
The operation will take a couple of minutes to finish while Databricks provisions the index and performs an initial sync against the source table.
Querying a Vector Search Index
This is straightforward in SQL when using the VECTOR_SEARCH function. You supply the name of the index, the text to search, and the number of results.
SELECT * FROM VECTOR_SEARCH( index => "your-catalog.default.name_of_new_vector_index", query => "Your search text here", num_results => 3);
Note that this is not available on all compute resources—including classic SQL and all-purpose compute. If you hit an error that VECTOR_SEARCH doesn't exist, try switching to a serverless SQL warehouse.
Databricks automatically performs the vector search on your behalf:
- Creating an embedding of your search text using the same embedding model that the index is using
- Running a similarity search against all the records (one of the special extra features of vector databases that I mentioned at the start)
- Limiting the results to the number you specified; the closest matches first
Here's an example of searching for a specific piece of text among the lines of Edgar Allan Poe's famous poem, The Raven:
That's pretty good: the embedding found "raven," and it also seems to have successfully matched "say" with "quoth."
The vector database has served its purpose, searching many lines of text and returning the few that are most relevant to the question posed.
But this is only a means to an end. To take full advantage, the next step is to use the retrieved text to augment a prompt to a Large Language Model, which will help it generate the right answer.
How to RAG with Mosaic AI Vector Search in Databricks
Remember that a RAG query is a special case of an ordinary AI query but with extra information provided from a vector database search.
For comparison, let's try an AI prompt without RAG first:
SELECT ai_query('databricks-meta-llama-3-1-70b-instruct', 'What does the raven say') AS response
UNION ALL
SELECT ai_query('databricks-meta-llama-3-1-70b-instruct', 'What kind of shore is it');
Here's what I found:
Without RAG, the LLM got the first one right anyway. Perhaps because it's near Halloween, and ravens are known to say "Nevermore" at this time of year? 🎃
But the second question is much too vague, and the LLM has no way of knowing what I was thinking about. This is typical of the kind of problem that RAG solves.
To implement RAG, incorporate the results from a vector search into the prompt, for example like this:
SELECT ai_query('databricks-meta-llama-3-1-70b-instruct',
'#question
What kind of shore is it
#information
Ghastly grim and ancient Raven wandering from the Nightly shore
Open here I flung the shutter, when, with many a flirt and flutter,
Tell this soul with sorrow laden if, within the distant Aidenn,
Answer the supplied question, taking advantage of the information provided if possible') AS response
As you can see, the vector search itself was not perfect. I would have been happy to see "Plutonian" too 😀. However, the LLM was nevertheless able to use the extra information to provide the right answer. Here are my results:
A low-code RAG implementation
A RAG implementation has many moving parts, so you may consider using a low-code interface that does much of the plumbing for you.
Here's how the same thing looks in the Matillion Data Productivity Cloud:
Another consideration is that the best prompt syntax tends to vary between LLMs, which is another problem you don't have to worry about with a low-code solution.
If you want to try the above yourself from a free Matillion trial, you can download and install the prebuilt solution from the Matillion Exchange.
How to delete a Vector Search Index in Databricks
With Change Data Feed enabled, Databricks will automatically keep the index up to date with changes to the base table.
To tidy up, you can delete the index object at any time. At the time of writing, the only way to drop a vector search index is to issue a DELETE call using the REST API.
Here's a sample code snippet for a Python notebook. Replace your-catalog, your-schema and name_of_new_vector_index with your own values.
apiURL = dbutils.notebook.entry_point.getDbutils().notebook().getContext().browserHostName().getOrElse(None)
sessionToken = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().getOrElse(None)
import requests
header = {'Authorization': 'Bearer {}'.format(sessionToken)}
endpoint = '/api/2.0/vector-search/indexes/your-catalog.your-schema.name_of_new_vector_index'
resp = requests.delete("https://" + apiURL + endpoint, headers=header)
References and further reading
Databricks reference guides:
Follow this link to sign up for a free trial of the Matillion Data Productivity Cloud.
If you are on a free trial or are already using Matillion, you can download the pipelines in the low-code RAG implementation from the Matillion Exchange.
Ian Funnell
Data Alchemist
Ian Funnell, Data Alchemist at Matillion, curates The Data Geek weekly newsletter and manages the Matillion Exchange.
Follow Ian on LinkedIn: https://www.linkedin.com/in/ianfunnell
Featured Resources
ETL and SQL: How They Work Together in Modern Data Integration
Explore how SQL and ETL power modern data workflows, when to use SQL scripts vs ETL tools, and how Matillion blends automation ...
Learn more WhitepapersUnlocking Data Productivity: A DataOps Guide for High-performance Data Teams
Download the DataOps White Paper today and start building data pipelines that are scalable, reliable, and built for success.
Learn more BlogWebhooks and Pushdown Python: Building Interactive and Efficient Data Applications
Part 5 of our blog series demonstrating the art of the possible, using Matillion products and features to build the MatiHelper ...
Learn more
Share: