Unlocking the Power of Retrieval Augmented Generation (RAG) with Azure and Cosmos DB: A Comprehensive Guide (Part 1)

Introduction

Retrieval augmented generation (RAG) is a pattern that combines a large-scale language model (LM) with a retrieval system to generate natural language responses that are relevant, diverse, and informative. The retrieval system can be a vector database that stores pre-computed embeddings of documents or passages so that they can be searched efficiently by similarity. In a previous article, we discussed the importance of RAG patterns in modern LM-based generative AI applications and highlighted some of the options available in Azure for building RAG components. In this article, we focus on one of those options, Azure Cosmos DB for MongoDB vCore, and show how it can be used as a vector database for RAG. We also provide sample code that demonstrates how to store and query embeddings using Azure Cosmos DB for MongoDB vCore.

What is a Vector Database?

A vector database is specifically designed to store and manage vector embeddings—mathematical representations of data in a high-dimensional space. In this space, each dimension corresponds to a feature of the data, and tens of thousands of dimensions might be used to represent sophisticated data. These vector embeddings are used in various domains, including similarity search, recommendation systems, natural language processing, and large language models (LLMs).

Here are some key points about vector databases:

Vector Search: Vector databases enable efficient similarity search based on vector representations.
Retrieval-Augmented Generation (RAG): An increasingly popular use case involves integrating vector databases with LLMs to generate contextually relevant and accurate responses to user prompts. This approach helps work around token limits and reduces the need for frequent fine-tuning on updated data.

What is vector search?

Vector search lets you find items that are similar in meaning or content, not just by matching a specific field. This can help you with tasks like searching for related text, finding images that match a theme, making suggestions, or spotting outliers. To do a vector search, you need to use a machine-learning model that turns your data into vectors (lists of numbers) that capture their features. This is done by using an embeddings API, such as Azure OpenAI Embeddings or Hugging Face on Azure. Then, you compare the distance between the vectors of your data and the vector of your query. The closer the vectors are, the more similar the data is to your query.
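To make the distance comparison concrete, here is a minimal sketch in plain Python. It uses cosine similarity, one common way to compare embeddings. The three-dimensional vectors and their labels are made-up toy values, and the function names are hypothetical; real embeddings come from an embeddings API and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vector, items):
    """Sort (label, vector) pairs from most to least similar to the query."""
    scored = [(label, cosine_similarity(query_vector, vec)) for label, vec in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional "embeddings" (made-up values for illustration only)
toy_items = [
    ("web hosting", [0.9, 0.1, 0.0]),
    ("machine learning", [0.1, 0.9, 0.2]),
    ("storage", [0.0, 0.2, 0.9]),
]
toy_query = [0.2, 0.8, 0.1]

ranking = rank_by_similarity(toy_query, toy_items)
for label, score in ranking:
    print(f"{label}: {score:.3f}")
```

The item whose vector points in nearly the same direction as the query ranks first. A vector database applies the same idea, but uses an index so that it does not have to compare the query against every stored vector.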

With native vector search support, you can make the most of your data in applications that use the OpenAI API. You can also build your own solutions that use vector embeddings.

Vector Database and Index Options in Azure

Service	Description
Azure Cosmos DB for MongoDB vCore	Store your application data and vector embeddings together in a single MongoDB-compatible service featuring native support for vector search.
Azure Cosmos DB for PostgreSQL	Store your data and vectors together in a scalable PostgreSQL offering with native support for vector search.
Azure Cosmos DB for NoSQL with Azure AI Search	Augment your Azure Cosmos DB data with the semantic and vector search capabilities of Azure AI Search.

Azure Cosmos DB Vector Database Extension for MongoDB vCore:

Store your application data and vector embeddings together in a single MongoDB-compatible service.
Benefit from native support for vector search.
Avoid the extra cost of moving data to a separate vector database.
Achieve data consistency, scale, and performance.

Azure Cosmos DB Vector Database Extension for PostgreSQL:

Store your data and vectors together in a scalable PostgreSQL offering.
Leverage native support for vector search.
Seamlessly integrate vector capabilities without additional database migration costs.

Azure Cosmos DB for NoSQL with Azure AI Search:

Augment your Azure Cosmos DB data with semantic and vector search capabilities.
Combine the power of NoSQL with AI-driven search.
Enhance retrieval-augmented generation scenarios.

Remember, vector databases play a crucial role in harnessing the potential of vector embeddings, enabling advanced search, and enhancing AI-driven applications. Choose the option that best aligns with your specific use case and requirements! 🚀

A simple and effective way to store, index, and search vector data in Azure Cosmos DB for MongoDB vCore is to use the built-in vector search feature. This feature lets you perform similarity searches on arrays of numbers, such as embeddings, using the $search aggregation stage with the cosmosSearch operator. By using this feature, you can avoid moving your data to a separate, more expensive vector database and integrate your AI-driven applications smoothly with your other data.

What is Azure Cosmos DB for MongoDB vCore?

Azure Cosmos DB for MongoDB vCore is a fully managed, scalable, and secure service that provides API compatibility with MongoDB. It allows you to use the familiar MongoDB tools and drivers to build applications that can leverage the benefits of Azure Cosmos DB, such as global distribution, automatic scaling, and SLA-backed availability and latency. Azure Cosmos DB for MongoDB vCore also supports MongoDB's aggregation pipeline, which enables complex queries and transformations on the data. One of the features that makes Azure Cosmos DB for MongoDB vCore suitable for RAG is vector search through the $search aggregation stage and its cosmosSearch operator, which lets you perform similarity searches on arrays of numbers, such as embeddings.

RAG Pipeline Process

Load the data store
Clean the data
Chunk the data
Convert the chunks into vector embeddings with an embedding model
Create a vector index over the embeddings
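
The chunking step can be sketched as follows. This is only an illustration under simple assumptions: it splits on whitespace and counts words, whereas production pipelines often chunk by tokens, sentences, or document structure. The function name and the default sizes are hypothetical.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into word-based chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    words = text.split()  # also collapses repeated whitespace
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks

sample = "one two three four five six seven eight nine ten"
print(chunk_text(sample, chunk_size=4, overlap=1))
# ['one two three four', 'four five six seven', 'seven eight nine ten']
```

Each chunk is then embedded separately, and the resulting vector is stored next to the chunk text.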

How to use Azure Cosmos DB for MongoDB vCore as a vector database for RAG?

To use Azure Cosmos DB for MongoDB vCore as a vector database for RAG, follow these steps:

Create an Azure Cosmos DB account with API for MongoDB and select the vCore model. You can choose the number of regions, availability zones, and throughput depending on your needs. You can also enable features such as geo-fencing, encryption at rest, and role-based access control.
Create a database and a collection in your Azure Cosmos DB account. You can use any schema that suits your application, but the field that stores the embeddings needs a vector index (a cosmosSearch index, created later in this article). You can use the MongoDB shell or any MongoDB driver to connect to your Azure Cosmos DB account and create the database and collection.
Populate your collection with documents that contain the embeddings and any other metadata that you need for your RAG application. You can use any method to generate the embeddings, such as a pre-trained model or a custom model. Store each embedding as an array of numbers so that the vector index can search it. You can use the MongoDB shell or any MongoDB driver to insert the documents into your collection. Alternatively, you can use Azure Data Factory or Azure Databricks to import data from various sources into your collection.
Query your collection using the $search aggregation stage with the cosmosSearch operator to perform a similarity search on the embeddings. You can use the MongoDB shell or any MongoDB driver to query your collection. The cosmosSearch operator takes an array of numbers as input and returns the documents whose arrays in the specified field are most similar to it. You can also use other aggregation stages to filter, sort, and project the results. Finally, pass the query results to your LM to generate natural language responses that are relevant, diverse, and informative.

Here is some sample code that shows how to use Azure Cosmos DB for MongoDB vCore as a vector database for RAG:

Preliminaries

First, let's start by installing the packages that we'll need later.

! pip install numpy

! pip install openai 
! pip install pymongo
! pip install python-dotenv
! pip install azure-core
! pip install azure-cosmos
! pip install tenacity

Next, let's import the libraries that we'll use.

import json

import datetime
import time

from azure.core.exceptions import AzureError
from azure.core.credentials import AzureKeyCredential
import pymongo

import openai
from dotenv import load_dotenv
from tenacity import retry, wait_random_exponential, stop_after_attempt
from openai import AzureOpenAI

Please use the example.env as a template to provide the necessary keys and endpoints in your own .env file. Make sure to modify the env_name accordingly.

print(openai.VERSION)  # should be 1.0 or later

Loading the endpoint and secret API keys

from dotenv import dotenv_values

# specify the name of the .env file name 
env_name = "example.env" # following example.env template change to your own .env file name
config = dotenv_values(env_name)

cosmosdb_endpoint = config['cosmos_db_api_endpoint']
cosmosdb_key = config['cosmos_db_api_key']
cosmosdb_connection_str = config['cosmos_db_connection_string']

COSMOS_MONGO_USER = config['cosmos_db_mongo_user']
COSMOS_MONGO_PWD = config['cosmos_db_mongo_pwd']
COSMOS_MONGO_SERVER = config['cosmos_db_mongo_server']

openai.api_type = config['openai_api_type']
openai.api_key = config['openai_api_key']
openai.api_base = config['openai_api_endpoint']
openai.api_version = config['openai_api_version']
embeddings_deployment = config['openai_embeddings_deployment']
completions_deployment = config['openai_completions_deployment']
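
Because the loaded configuration behaves like a plain dictionary, a missing entry in the .env file only surfaces as a KeyError at the first lookup. A small check, run right after loading, can list every missing or empty entry at once. The sketch below is a hypothetical helper, not part of the original notebook; the key names are the ones read by the code above.

```python
# Keys read from the .env file by the code in this article
REQUIRED_KEYS = [
    "cosmos_db_api_endpoint",
    "cosmos_db_api_key",
    "cosmos_db_connection_string",
    "cosmos_db_mongo_user",
    "cosmos_db_mongo_pwd",
    "cosmos_db_mongo_server",
    "openai_api_type",
    "openai_api_key",
    "openai_api_endpoint",
    "openai_api_version",
    "openai_embeddings_deployment",
    "openai_completions_deployment",
]

def missing_config_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the config mapping."""
    return [key for key in required if not config.get(key)]

# Example with a deliberately incomplete, made-up configuration
example_config = {"openai_api_key": "placeholder", "openai_api_endpoint": ""}
print(missing_config_keys(example_config))
```

With the real configuration you would call missing_config_keys(config) and stop early if the returned list is not empty.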

Let's start by creating an Azure Cosmos DB for MongoDB vCore Resource following this quick start guide: https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/quickstart-portal

Then copy the connection details (server, user, pwd) into your .env file.

Azure OpenAI

Create an Azure OpenAI resource following this quickstart: https://learn.microsoft.com/azure/ai-services/openai/how-to/create-resource?pivots=web-portal

Deploy a completions model and an embeddings model

For more information on completions, go here: https://learn.microsoft.com/azure/ai-services/openai/how-to/completions

For more information on embeddings, go here: https://learn.microsoft.com/azure/ai-services/openai/how-to/embeddings

Copy the endpoint, key, and deployment names (for the embeddings model and the completions model) into your .env file.

Finally, let's set up our Azure OpenAI resource. Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access. Once you have access, complete the following steps:

Load data and create embeddings

Here we load a sample dataset containing descriptions of Azure services, and then we use Azure OpenAI to create vector embeddings from this data.

# Load text-sample.json data file. Embeddings will need to be generated using the function below.
#data_file = open(file="../../DataSet/AzureServices/text-sample.json", mode="r")

# OR Load text-sample_w_embeddings.json which has embeddings pre-computed
with open(file="./text-sample_w_embeddings.json", mode="r") as data_file:
    data = json.load(data_file)

# Take a peek at one data item

print(json.dumps(data[0], indent=2))

{ "id": "1", "title": "Azure App Service", "content": "Azure App Service is a fully managed platform for building, deploying, and scaling web apps. You can host web apps, mobile app backends, and RESTful APIs. It supports a variety of programming languages and frameworks, such as .NET, Java, Node.js, Python, and PHP. The service offers built-in auto-scaling and load balancing capabilities. It also provides integration with other Azure services, such as Azure DevOps, GitHub, and Bitbucket.", "category": "Web", "titleVector": [ -0.0017071267357096076, -0.01391641329973936, 0.0017036213539540768, -0.018410328775644302,

Creating the Azure OpenAI client (openai library 1.x)

client = AzureOpenAI(
  api_key =config['openai_api_key'],  
  api_version = "2023-05-15",
  azure_endpoint =config['openai_api_endpoint']
)

@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(10))
def generate_embeddings(text):
    '''
    Generate embeddings from a string of text.
    This will be used to vectorize data and user input for interactions with Azure OpenAI.
    '''
    # Embed the text that was passed in, using the embeddings deployment from the .env file
    response = client.embeddings.create(input=text, model=embeddings_deployment)
    embeddings = response.data[0].embedding
    time.sleep(0.2)  # rest period to avoid rate limiting on AOAI for free tier
    return embeddings

# Generate embeddings for title and content fields
# (skip this step if you loaded the file with pre-computed embeddings)
for n, item in enumerate(data, start=1):
    item['titleVector'] = generate_embeddings(item['title'])
    item['contentVector'] = generate_embeddings(item['content'])
    item['@search.action'] = 'upload'
    print("Creating embeddings for item:", n, "/", len(data), end='\r')

# Save embeddings to the text-sample_w_embeddings.json file
with open("./text-sample_w_embeddings.json", "w") as f:
    json.dump(data, f)

Connect and set up Cosmos DB for MongoDB vCore

Set up the connection

# Replace user, pass, and xxxx with your own cluster's connection details
mongo_client = pymongo.MongoClient(
    "mongodb+srv://user:pass@xxxx.mongocluster.cosmos.azure.com/"
    "?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000"
)

Set up the DB and collection

# Create a database called ExampleDB
db = mongo_client['ExampleDB']

# Create collection if it doesn't exist
COLLECTION_NAME = "ExampleCollection"

collection = db[COLLECTION_NAME]

if COLLECTION_NAME not in db.list_collection_names():
    # Creates an unsharded collection that uses the DB's shared throughput
    db.create_collection(COLLECTION_NAME)
    print("Created collection '{}'.\n".format(COLLECTION_NAME))
else:
    print("Using collection: '{}'.\n".format(COLLECTION_NAME))

## Use only if re-running the code and you want to reset the DB and collection
# collection.drop_index("VectorSearchIndex")
# mongo_client.drop_database("ExampleDB")

Create the vector index

IMPORTANT: You can only create one index per vector property. That is, you cannot create more than one index that points to the same vector property. If you want to change the index type (e.g., from IVF to HNSW) you must drop the index first before creating a new index.
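
The rule above implies a drop-then-create sequence whenever you switch index types. Here is a minimal sketch of that sequence. It assumes db is the PyMongo database object used in this article and that your cluster reports its indexes through PyMongo's index_information(), which is worth verifying on your own cluster. recreate_vector_index is a hypothetical helper name.

```python
def recreate_vector_index(db, collection_name, index_definition):
    """Drop the index named in index_definition if it exists, then create it.

    index_definition is one entry of the 'indexes' list accepted by the
    createIndexes command, for example the IVF definition shown below.
    """
    index_name = index_definition["name"]
    collection = db[collection_name]
    # index_information() returns a dict keyed by index name
    if index_name in collection.index_information():
        collection.drop_index(index_name)
    return db.command({
        "createIndexes": collection_name,
        "indexes": [index_definition],
    })
```

Dropping and rebuilding an index on a large collection can take a while, so this is best done outside of peak query times.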

IVF

IVF is the default vector indexing algorithm, which works on all cluster tiers. It's an approximate nearest neighbors (ANN) approach that uses clustering to speed up the search for similar vectors in a dataset.

db.command({
  'createIndexes': 'ExampleCollection',
  'indexes': [
    {
      'name': 'VectorSearchIndex',
      'key': {
        "contentVector": "cosmosSearch"
      },
      'cosmosSearchOptions': {
        'kind': 'vector-ivf',
        'numLists': 1,
        'similarity': 'COS',
        'dimensions': 1536
      }
    }
  ]
})
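
To build some intuition for what an IVF index does, here is a toy sketch in plain Python. It is not how the service is implemented: the centroids are hand-picked instead of learned by clustering, the data is two-dimensional, and all names are hypothetical. It only illustrates the idea of grouping vectors into lists and searching the closest lists instead of the whole collection.

```python
import math

def _distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf_lists(vectors, centroids):
    """Assign each vector's position to the list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for position, vector in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda i: _distance(vector, centroids[i]))
        lists[nearest].append(position)
    return lists

def ivf_search(query, vectors, centroids, lists, k=1, num_probes=1):
    """Search only the num_probes lists whose centroids are closest to the query."""
    probe_order = sorted(range(len(centroids)),
                         key=lambda i: _distance(query, centroids[i]))
    candidates = [pos for i in probe_order[:num_probes] for pos in lists[i]]
    candidates.sort(key=lambda pos: _distance(query, vectors[pos]))
    return candidates[:k]

# Toy 2-D data with two obvious groups and hand-picked centroids (made-up values)
toy_vectors = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
toy_centroids = [[0.1, 0.05], [5.1, 5.0]]
toy_lists = build_ivf_lists(toy_vectors, toy_centroids)

print(toy_lists)
print(ivf_search([4.9, 5.0], toy_vectors, toy_centroids, toy_lists, k=1))
```

With a single list, every vector lands in the same list, so each query compares against all stored vectors. More lists mean fewer comparisons per query, at the risk of missing a neighbor that sits in a list that was not probed.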

HNSW (preview): you can skip this

HNSW is a graph-based data structure that organizes vectors into clusters and subclusters. It facilitates fast approximate nearest neighbor search, achieving higher speeds with improved accuracy. As a preview feature, you can enable HNSW using Azure Feature Enablement Control (AFEC) by selecting the “mongoHnswIndex” feature. For detailed instructions, refer to the enable preview features documentation.

Keep in mind that HNSW operates on M50 cluster tiers and higher while in preview. 🚀
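
If you do enable HNSW, the index definition could look like the sketch below. The option names ('vector-hnsw', 'm', 'efConstruction') and the default values are written from my recollection of the Azure Cosmos DB documentation and are assumptions to check against the current docs before use. hnsw_index_command is a hypothetical helper that only builds the command dictionary.

```python
def hnsw_index_command(collection_name, vector_path, dimensions,
                       index_name="VectorSearchIndex", m=16, ef_construction=64):
    """Build a createIndexes command for an HNSW vector index (assumed option names)."""
    return {
        "createIndexes": collection_name,
        "indexes": [{
            "name": index_name,
            "key": {vector_path: "cosmosSearch"},
            "cosmosSearchOptions": {
                "kind": "vector-hnsw",              # assumed kind name for HNSW
                "m": m,                             # max connections per graph node
                "efConstruction": ef_construction,  # candidate list size at build time
                "similarity": "COS",
                "dimensions": dimensions,
            },
        }],
    }

command = hnsw_index_command("ExampleCollection", "contentVector", 1536)
print(command["indexes"][0]["cosmosSearchOptions"])
# db.command(command)  # run against your cluster once HNSW is enabled
```

Remember that only one index can point at contentVector, so an existing IVF index on that field has to be dropped first.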

Upload data to the collection

A simple insert_many() to insert our data in JSON format into the newly created DB and collection.

collection.insert_many(data)

Vector Search in Cosmos DB for MongoDB vCore

# Simple function to assist with vector search
def vector_search(query, num_results=3):
    # Embed the query with the same model that was used for the documents
    query_embedding = generate_embeddings(query)
    pipeline = [
        {
            '$search': {
                "cosmosSearch": {
                    "vector": query_embedding,
                    "path": "contentVector",
                    "k": num_results  # , "efsearch": 40  # optional, for HNSW only
                },
                "returnStoredSource": True
            }
        },
        # Return the similarity score alongside the full document
        {'$project': {'similarityScore': {'$meta': 'searchScore'}, 'document': '$$ROOT'}}
    ]
    return collection.aggregate(pipeline)

Let's run a test query below.

query = "What are the services for running ML models?"
results = vector_search(query)
for result in results: 
#     print(result)
    print(f"Similarity Score: {result['similarityScore']}")  
    print(f"Title: {result['document']['title']}")  
    print(f"Content: {result['document']['content']}")  
    print(f"Category: {result['document']['category']}\n")  

Q&A over the data with GPT-3.5

Finally, we'll create a helper function to feed prompts into the Completions model. Then we'll create an interactive loop where you can pose questions to the model and receive information grounded in your data.

# This function grounds the model with system instructions and the retrieved documents.

def generate_completion(user_input, search_results=()):
    system_prompt = '''
    You are an intelligent assistant for Microsoft Azure services.
    You are designed to provide helpful answers to user questions about Azure services given the information about to be provided.
        - Only answer questions related to the information provided below, provide 3 clear suggestions in a list format.
        - Write two lines of whitespace between each answer in the list.
        - Only provide answers that have products that are part of Microsoft Azure.
        - If you're unsure of an answer, you can say "I don't know" or "I'm not sure" and recommend users search themselves.
    '''

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]

    # Add the content of each retrieved document as grounding context
    for item in search_results:
        messages.append({"role": "system", "content": item['document']['content']})

    response = client.chat.completions.create(
        model=completions_deployment,  # name of your chat model deployment
        messages=messages
    )

    return response.choices[0].message.content

generate_completion("Where can I host containers in Azure?")  # test question without RAG

# Create a loop of user input and model output. You can now perform Q&A over the sample data!

print("*** Please ask your model questions about Azure services. Type 'end' to end the session.\n")
user_input = input("Prompt: ")
while user_input.lower() != "end":
    # Retrieve the most similar documents, then answer with them as context
    results_for_prompt = vector_search(user_input)
    completions_results = generate_completion(user_input, results_for_prompt)
    print("\n")
    print(completions_results)
    user_input = input("Prompt: ")
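
Retrieved documents can be long, and the model's context window is limited. One option is to merge the search results into a single context block with a size budget before sending it to the model. The sketch below is a hypothetical helper that assumes each result has the shape returned by vector_search (a 'document' field with 'title' and 'content'). The character budget is a crude stand-in for real token counting.

```python
def build_context(search_results, max_chars=4000):
    """Join retrieved documents into one context string of at most max_chars characters."""
    separator = "\n\n"
    context = ""
    for result in search_results:
        doc = result["document"]
        entry = f"Title: {doc['title']}\nContent: {doc['content']}"
        candidate = entry if not context else context + separator + entry
        if len(candidate) > max_chars:
            break  # adding this document would exceed the budget
        context = candidate
    return context

# Made-up results in the same shape that vector_search returns
toy_results = [
    {"document": {"title": "A", "content": "x"}},
    {"document": {"title": "B", "content": "y"}},
]
print(build_context(toy_results))
```

Because the results arrive ordered by similarity, the budget cuts off the least relevant documents first.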

Source code on GitHub: https://github.com/Usamawahabkhan/Usamawahabkhan-RAG-Azure-CosmosDb-MongoDb-vcore/blob/main/AzureOpenAI-CosmosDB-MongoDB-vCore_Tutorial.ipynb

Conclusion

In this article, we have seen how to use RAG, a powerful technique for generating natural language answers from large-scale text corpora, with Cosmos DB for MongoDB vCore, a fully managed and scalable database service that supports the MongoDB API. We have shown how to upload data to the collection, create a vector index, and perform vector search using PyMongo. We have also discussed some of the benefits and challenges of using RAG and Cosmos DB for MongoDB vCore for building GenAI applications based on LLMs.

RAG is becoming an essential part of any modern GenAI app based on LLMs, as it enables the generation of high-quality and diverse responses from multiple sources of knowledge. Storing the underlying data is crucial and requires security, availability, and scalability, which makes Azure Cosmos DB a very valuable choice. However, there are other options available in Azure and in open source, depending on the application's requirements and compliance needs. We encourage you to explore the alternatives and find the one that best suits your needs.


How to use Azure Cosmos DB for Mongo DB v-Core as vector database for RAG (retrieval augmented generation ) Part 2

Sunday, March 3, 2024