How to use Azure Cosmos DB for Mongo DB v-Core as vector database for RAG (retrieval augmented generation ) Part 2

Sunday, March 3, 2024

 Unlocking the Power of Retrieval Augmented Generation (RAG) with Azure and Cosmos DB: A Comprehensive Guide (part1)


Retrieval augmented generation (RAG) is a pattern that combines a large-scale language model (LM) with a retrieval system to generate natural language responses that are relevant, diverse, and informative. The retrieval system can be a vector database that stores pre-computed embeddings of documents or passages that can be efficiently searched by similarity. In a previous article, we discussed the importance of RAG patterns in modern LM-based generative AI applications and highlighted some of the options that are available in Azure that can be used for RAG components. In this article, we will focus on one of the options, Azure Cosmos DB for Mongo DB vCore, and show how it can be used as a vector database for RAG. We will also provide some sample code to demonstrate how to store and query embeddings using Azure Cosmos DB for Mongo DB vCore.

What is a Vector Database?

vector database is specifically designed to store and manage vector embeddings—mathematical representations of data in a high-dimensional space. In this space, each dimension corresponds to a feature of the data, and tens of thousands of dimensions might be used to represent sophisticated data. These vector embeddings are used in various domains, including similarity search, recommendation systems, natural language processing, and large language models (LLMs).

Here are some key points about vector databases:

  • Vector Search: Vector databases enable efficient similarity search based on vector representations.
  • Retrieval-Augmented Generation (RAG): An increasingly popular use case involves integrating vector databases with LLMs to generate contextually relevant and accurate responses to user prompts. This approach overcomes token limits and reduces the need for frequent fine-tuning of updated data.

What is vector search?

Vector search lets you find items that are similar in meaning or content, not just by matching a specific field. This can help you with tasks like searching for related text, finding images that match a theme, making suggestions, or spotting outliers. To do a vector search, you need to use a machine-learning model that turns your data into vectors (lists of numbers) that capture their features. This is done by using an embeddings API, such as Azure OpenAI Embeddings or Hugging Face on Azure. Then, you compare the distance between the vectors of your data and the vector of your query. The closer the vectors are, the more similar the data is to your query.

With native vector search support, you can make the most of your data in applications that use the OpenAI API. You can also build your own solutions that use vector embeddings.

Vector Database /Index option in Azure.



Azure Cosmos DB for Mongo DB vCore

Store your application data and vector embeddings together in a single MongoDB-compatible service featuring native support for vector search.

Azure Cosmos DB for PostgreSQL

Store your data and vectors together in a scalable PostgreSQL offering with native support for vector search.

Azure Cosmos DB for NoSQL with Azure AI Search

Augment your Azure Cosmos DB data with the semantic and vector search capabilities of Azure AI Search.


  1. Azure Cosmos DB Vector Database Extension for MongoDB vCore:
    • Store your application data and vector embeddings together in a single MongoDB-compatible service.
    • Benefit from native support for vector search.
    • Avoid the extra cost of moving data to a separate vector database.
    • Achieve data consistency, scale, and performance.
    • Learn more1.
  2. Azure Cosmos DB Vector Database Extension for PostgreSQL:

    • Store your data and vectors together in a scalable PostgreSQL offering.
    • Leverage native support for vector search.
    • Seamlessly integrate vector capabilities without additional database migration costs.
    • Learn more1.

  3. Azure Cosmos DB for NoSQL with Azure AI Search
    • Augment your Azure Cosmos DB data with semantic and vector search capabilities.
    • Combine the power of NoSQL with AI-driven search.
    • Enhance retrieval-augmented generation scenarios.
    • Learn more2.

Remember, vector databases play a crucial role in harnessing the potential of vector embeddings, enabling advanced search, and enhancing AI-driven applications. Choose the option that best aligns with your specific use case and requirements! 🚀

A simple and effective way to store, index, and search vector data in Azure Cosmos DB for MongoDB vCore is to use the built-in vector search feature. This feature allows you to perform similarity searches on arrays of numbers, such as embeddings, using MongoDB's $vector query operator. By using this feature, you can avoid the need to move your data to more expensive vector databases and integrate your AI-driven applications smoothly with your other data.


What is Azure Cosmos DB for Mongo DB v Core?

Azure Cosmos DB for Mongo DB vCore is a fully managed, scalable, and secure service that provides API compatibility with MongoDB. It allows you to use the familiar MongoDB tools and drivers to build applications that can leverage the benefits of Azure Cosmos DB, such as global distribution, automatic scaling, and SLA-backed availability and latency. Azure Cosmos DB for Mongo DB vCore also supports MongoDB's aggregation pipeline, which enables complex queries and transformations on the data. One of the features that makes Azure Cosmos DB for Mongo DB vCore suitable for RAG is the support for MongoDB's $vector query operator, which allows you to perform similarity searches on arrays of numbers, such as embeddings.


RAG Pipeline Process

  1. Load Datastore
  2. Clean data
  3. Chunking Data
  4. Convert Chunks into vector embedding with LLM
  5. Create a Vector Index with Converted vector embeddings. 

How to use Azure Cosmos DB for Mongo DB vCore as a vector database for RAG?

To use Azure Cosmos DB for Mongo DB vCore as a vector database for RAG, you need to follow these steps:

  •   Create an Azure Cosmos DB account with API for MongoDB and select the vCore model. You can choose the number of regions, availability zones, and throughput depending on your needs. You can also enable features such as geo-fencing, encryption at rest, and role-based access control.
  •  Create a database and a collection in your Azure Cosmos DB account. You can use any schema and indexing strategy that suits your application, but you need to ensure that the field that stores the embeddings is indexed as an array. You can use the MongoDB shell or any MongoDB driver to connect to your Azure Cosmos DB account and create the database and collection.
  • Populate your collection with documents that contain the embeddings and any other metadata that you need for your RAG application. You can use any method to generate the embeddings, such as a pre-trained model or a custom model. You can also use any format to store the embeddings, such as a list or a base64-encoded string. You can use the MongoDB shell or any MongoDB driver to insert the documents into your collection. Alternatively, you can use Azure Data Factory or Azure Databricks to import data from various sources into your collection.
  • Query your collection using the $vector operator to perform a similarity search on the embeddings. You can use the MongoDB shell or any MongoDB driver to query your collection. The $vector operator takes an array of numbers as input and returns the documents that have the most similar arrays in the specified field. You can also use other query operators and aggregation stages to filter, sort, and project the results. You can integrate the query results with your LM to generate natural language responses that are relevant, diverse, and informative.

Here is some sample code that shows how to use Azure Cosmos DB for Mongo DB vCore as a vector database for RAG:




First, let's start by installing the packages that we'll need later.


 ! pip install numpy

! pip install openai
! pip install pymongo
! pip install python-dotenv
! pip install azure-core
! pip install azure-cosmos
! pip install tenacity

First, let's start by installing the packages that we'll need later.

 import json

import datetime
import time

from azure.core.exceptions import AzureError
from azure.core.credentials import AzureKeyCredential
import pymongo

import openai
from dotenv import load_dotenv
from tenacity import retry, wait_random_exponential, stop_after_attempt
from openai import AzureOpenAI

 Please use the example.env as a template to provide the necessary keys and endpoints in your own .env file. Make sure to modify the env_name accordingly.

print(openai.VERSION) #should be greater than 1.0xx

loading endpoint and secret API keys 

from dotenv import dotenv_values

# specify the name of the .env file name
env_name = "example.env" # following example.env template change to your own .env file name
config = dotenv_values(env_name)

cosmosdb_endpoint = config['cosmos_db_api_endpoint']
cosmosdb_key = config['cosmos_db_api_key']
cosmosdb_connection_str = config['cosmos_db_connection_string']

COSMOS_MONGO_USER = config['cosmos_db_mongo_user']
COSMOS_MONGO_PWD = config['cosmos_db_mongo_pwd']
COSMOS_MONGO_SERVER = config['cosmos_db_mongo_server']

openai.api_type = config['openai_api_type']
openai.api_key = config['openai_api_key']
openai.api_base = config['openai_api_endpoint']
openai.api_version = config['openai_api_version']
embeddings_deployment = config['openai_embeddings_deployment']
completions_deployment = config['openai_completions_deployment']

# Take a peek at one data item

print(json.dumps(data[0], indent=2))

{ "id": "1", "title": "Azure App Service", "content": "Azure App Service is a fully managed platform for building, deploying, and scaling web apps. You can host web apps, mobile app backends, and RESTful APIs. It supports a variety of programming languages and frameworks, such as .NET, Java, Node.js, Python, and PHP. The service offers built-in auto-scaling and load balancing capabilities. It also provides integration with other Azure services, such as Azure DevOps, GitHub, and Bitbucket.", "category": "Web", "titleVector": [ -0.0017071267357096076, -0.01391641329973936, 0.0017036213539540768, -0.018410328775644302,

Creating Open AI Client for Library 1.xx

client = AzureOpenAI(
  api_key =config['openai_api_key'],  
  api_version = "2023-05-15",
  azure_endpoint =config['openai_api_endpoint']

@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(10))
def generate_embeddings(text):
    Generate embeddings from string of text.
    This will be used to vectorize data and user input for interactions with Azure OpenAI.
    embeddings = client.embeddings.create(input = "test", model="myembeddingmodel").data[0].embedding
    time.sleep(0.2) # rest period to avoid rate limiting on AOAI for free tier
    return embeddings

# Generate embeddings for title and content fields
n = 0
for item in data:
    title = item['title']
    content = item['content']
    title_embeddings = generate_embeddings(title)
    content_embeddings = generate_embeddings(content)
    item['titleVector'] = title_embeddings
    item['contentVector'] = content_embeddings
    item['@search.action'] = 'upload'
    print("Creating embeddings for item:", n, "/" ,len(data), end='\r')
# Save embeddings to sample_text_w_embeddings.json file
with open("./text-sample_w_embeddings.json", "w") as f:
    json.dump(data, f)

Connect and setup Cosmos DB for MongoDB vCore

Set up the connection

mongo_client = pymongo.MongoClient("mongodb+srv://


Set up the DB and collection

# create a database called TutorialDB
db = mongo_client['ExampleDB']

# Create collection if it doesn't exist
COLLECTION_NAME = "ExampleCollection"

collection = db[COLLECTION_NAME]

if COLLECTION_NAME not in db.list_collection_names():
    # Creates a unsharded collection that uses the DBs shared throughput
    print("Created collection '{}'.\n".format(COLLECTION_NAME))
    print("Using collection: '{}'.\n".format(COLLECTION_NAME))

## Use only if re-reunning code and want to reset db and collection


Create the vector index

IMPORTANT: You can only create one index per vector property. That is, you cannot create more than one index that points to the same vector property. If you want to change the index type (e.g., from IVF to HNSW) you must drop the index first before creating a new index.


IVF is the default vector indexing algorithm, which works on all cluster tiers. It's an approximate nearest neighbors (ANN) approach that uses clustering to speed up the search for similar vectors in a dataset.

  'createIndexes': 'ExampleCollection',
  'indexes': [
      'name': 'VectorSearchIndex',
      'key': {
        "contentVector": "cosmosSearch"
      'cosmosSearchOptions': {
        'kind': 'vector-ivf',
        'numLists': 1,
        'similarity': 'COS',
        'dimensions': 1536

HNSW (preview ) you skip this

NSW is a graph-based data structure that organizes vectors into clusters and subclusters. It facilitates fast approximate nearest neighbor search, achieving higher speeds with improved accuracy. As a preview feature, you can enable HNSW using Azure Feature Enablement Control (AFEC) by selecting the “mongoHnswIndex” feature. For detailed instructions, refer to the enable preview features documentation.

Keep in mind that HNSW operates on M50 cluster tiers and higher while in preview. 🚀


Upload data to the collection

A simple insert_many() to insert our data in JSON format into the newly created DB and collection.


Vector Search in Cosmos DB for MongoDB vCore

# Simple function to assist with vector search
def vector_search(query, num_results=3):
    query_embedding = generate_embeddings(query)
    embeddings_list = []
    pipeline = [
            '$search': {
                "cosmosSearch": {
                    "vector": query_embedding,
                    "path": "contentVector",
                    "k": num_results #, "efsearch": 40 # optional for HNSW only
                "returnStoredSource": True }},
        {'$project': { 'similarityScore': { '$meta': 'searchScore' }, 'document' : '$$ROOT' } }
    results = collection.aggregate(pipeline)
    return results

query = "What are the services for running ML models?"
results = vector_search(query)
for result in results:
#     print(result)
    print(f"Similarity Score: {result['similarityScore']}")  
    print(f"Title: {result['document']['title']}")  
    print(f"Content: {result['document']['content']}")  
    print(f"Category: {result['document']['category']}\n")  


Q&A over the data with GPT-3.5

Finally, we'll create a helper function to feed prompts into the Completions model. Then we'll create an interactive loop where you can pose questions to the model and receive information grounded in your data.

#This function helps to ground the model with prompts and system instructions.

def generate_completion(prompt):
    system_prompt = '''
    You are an intelligent assistant for Microsoft Azure services.
    You are designed to provide helpful answers to user questions about Azure services given the information about to be provided.
        - Only answer questions related to the information provided below, provide 3 clear suggestions in a list format.
        - Write two lines of whitespace between each answer in the list.
        - Only provide answers that have products that are part of Microsoft Azure.
        - If you're unsure of an answer, you can say ""I don't know"" or ""I'm not sure"" and recommend users search themselves."

        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},

    for item in results:
        listmessages.append({"role": "system", "content": prompt['content']})
    response =
    model="demo35", # model = "deployment_name".
    return response.choices[0].message.content

generate_completion("Where i can host container in azure")  # test Qustion with out RAG

# Create a loop of user input and model output. You can now perform Q&A over the sample data!

user_input = ""
print("*** Please ask your model questions about Azure services. Type 'end' to end the session.\n")
user_input = input("Prompt: ")
while user_input.lower() != "end":
    results_for_prompt = vector_search(user_input)
   # print(f"User Prompt: {user_input}")
    completions_results = generate_completion(results_for_prompt)
    user_input = input("Prompt: ")

Source Code GitHub


In this article, we have seen how to use RAG, a powerful technique for generating natural language answers from large-scale text corpora, with Cosmos DB for MongoDB vCore, a fully managed and scalable database service that supports the MongoDB API. We have shown how to upload data to the collection, create a vector index, and perform vector search using PyMongo and Faiss. We have also discussed some of the benefits and challenges of using RAG and Cosmos DB for MongoDB vCore for building GenAI applications based on LLMs.

RAG is becoming an essential part of any modern GenAI app based on LLMs, as it enables generating of high-quality and diverse responses from multiple sources of knowledge. Storing data is crucial and requires security, availability and scalability, where Azure Cosmos DB can be a very valuable choice. However, there are other options available in Azure and open source, depending on the application requirements and compliance. We encourage you to explore different alternatives and find the one that suits your needs best.

 #MVPBuz #Azure #AzureCosmosDB  #MongoDBvCore #VectorDatabase #RAG #NoSQL#AIApplications #DataManagement #LLM #GenerativeAI

share this post
Share to Facebook Share to Twitter Share to Google+ Share to Stumble Upon Share to Evernote Share to Blogger Share to Email Share to Yahoo Messenger More...