Designing High Availability AI Architectures: Router Models, Circuit Breakers, and Hybrid Agents
Introduction
As AI agents evolve into mission-critical systems, their reliability becomes as important as their intelligence. In volatile regions—such as the Middle East, where war and instability disrupt data centers—high availability (HA) is essential. Large Language Models (LLMs) like GPT‑4, Claude, and Gemini are powerful but fragile when connectivity or GPU capacity is compromised. To mitigate this, enterprises are adopting router models and circuit breaker patterns that integrate Small Language Models (SLMs) for resilience, cost efficiency, and disaster recovery.
The Problem with LLM-Only Architectures
• Resource Intensive: LLMs require massive GPU clusters, memory, and energy.
• Single Point of Failure: Cloud outages or regional instability can cut off access.
• High Cost: Continuous reliance on LLMs drives up operational expenses.
• Latency: Routing all tasks through hyperscale providers slows response times.
Router Models: The Traffic Controllers of AI
Router models act as intelligent gateways that decide whether a task should be handled by:
• A large model (e.g., GPT‑4, Gemini, Claude) for complex reasoning.
• A small model (e.g., Phi‑4, Gemma, Mistral) for lightweight tasks like routing, summarization, or basic Q&A.
Example Workflow
1. User Request → Router evaluates complexity.
2. Simple Task → Routed to SLM (Phi‑4 or Gemma).
3. Complex Task → Routed to LLM (GPT‑4 or Gemini).
4. Fallback Mode → If LLM unavailable, SLM executes basic version of task.
This ensures continuity even when cloud services fail.
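The workflow above can be sketched in a few lines of Python. The complexity heuristic and the 0.3 threshold are assumptions for this example, not a production policy:

```python
# Illustrative router sketch: decides which model tier serves a request.

def estimate_complexity(request: str) -> float:
    """Naive proxy for task complexity: longer, question-dense
    requests are assumed to need stronger reasoning."""
    score = len(request.split()) / 100 + request.count("?") * 0.1
    return min(score, 1.0)

def route(request: str, llm_available: bool = True) -> str:
    """Return which tier should serve the request: 'llm' or 'slm'."""
    if not llm_available:
        return "slm"  # fallback mode: degrade gracefully to the SLM
    return "llm" if estimate_complexity(request) > 0.3 else "slm"
```

In practice the scorer would typically be a learned classifier or a routing-tuned SLM rather than a word count, but the dispatch shape is the same.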
Circuit Breaker Pattern
The circuit breaker prevents cascading failures when LLMs are unavailable:
• Closed State: Normal operation, requests routed to LLM.
• Open State: After repeated failures, requests rerouted to SLM.
• Half-Open State: Periodic retries to check if LLM is back online.
This pattern ensures agents don’t waste resources retrying unavailable services.
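The three states map naturally onto a small state machine. A minimal sketch, with illustrative thresholds and cooldowns (not a production library):

```python
import time

class CircuitBreaker:
    """Closed: requests go to the LLM. Open: requests go straight to
    the SLM. Half-open: after a cooldown, a trial request probes the LLM."""

    def __init__(self, failure_threshold=3, cooldown=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # timestamp when the breaker tripped

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if time.monotonic() - self.opened_at >= self.cooldown:
            return "half-open"  # allow a trial request through
        return "open"

    def call(self, llm_fn, slm_fn):
        if self.state == "open":
            return slm_fn()  # skip the LLM entirely while open
        try:
            result = llm_fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            return slm_fn()  # per-request fallback
        self.failures = 0
        self.opened_at = None  # a success closes the breaker
        return result
```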
Retry Pattern
• Exponential Backoff: Retry failed LLM calls with increasing wait times.
• Fallback Execution: If retries fail, SLM executes a simplified workflow.
• Logging & Monitoring: Track failures for disaster recovery planning.
Cost Considerations
• LLMs (GPT‑4, Gemini Ultra, Claude Opus)
  • High GPU cost, energy-intensive.
  • Best for reasoning-heavy tasks.
  • Cloud-only deployment increases dependency risk.
• SLMs (Phi‑4, Gemma, Mistral, LLaMA variants)
  • Lightweight, edge-deployable.
  • Lower operational cost, faster response.
  • Ideal for routing, summarization, and disaster-recovery fallback.
Architecture Examples
• Gemma + GPT‑4 Hybrid
  • Gemma handles routing and basic Q&A locally.
  • GPT‑4 executes complex reasoning tasks.
  • Circuit breaker ensures Gemma takes over during outages.
• Phi‑4 Edge + Claude Cloud
  • Phi‑4 runs on enterprise servers for summarization and workflow orchestration.
  • Claude handles advanced reasoning when connectivity is stable.
  • Retry pattern ensures tasks are reattempted if Claude fails.
• Mistral + Gemini
  • Mistral deployed on edge for disaster recovery.
  • Gemini used for large-scale automation in the cloud.
  • Hybrid orchestration dynamically balances workloads.
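One of the hybrid pairings above can be expressed as a declarative routing table. The model names and task labels here are illustrative for this sketch, not real endpoint identifiers:

```python
# Illustrative routing table for the Phi-4 edge + Claude cloud pairing.
HYBRID_CONFIG = {
    "edge": {
        "model": "phi-4",
        "tasks": ["routing", "summarization", "workflow-orchestration"],
    },
    "cloud": {
        "model": "claude",
        "tasks": ["advanced-reasoning"],
    },
    "failover": "edge",  # circuit-breaker target when the cloud tier is down
}

def tier_for(task: str) -> str:
    """Return the tier that owns a task, defaulting to the failover tier."""
    for tier, spec in HYBRID_CONFIG.items():
        if isinstance(spec, dict) and task in spec["tasks"]:
            return tier
    return HYBRID_CONFIG["failover"]
```

Keeping the mapping declarative lets operators change which tasks run on the edge without touching orchestration code.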
Strategic Implications
• Resilience in Conflict Zones: Edge-deployed SLMs guarantee continuity when cloud services are disrupted.
• Operational Efficiency: Offloading simple tasks to SLMs reduces cloud costs.
• Market Advantage: Hybrid architectures deliver agility, reliability, and trust in volatile environments.
Running local Small Language Models (SLMs) with Microsoft Foundry and AI Foundry brings distinct advantages, especially in a multi‑cloud architecture.
Why Local SLMs with Microsoft Foundry?
1. Compliance & Regulatory Control
• Running SLMs locally ensures data residency and compliance with regional regulations (GDPR, NDMO in Saudi Arabia, UAE’s data laws).
• Sensitive workloads (government, defense, healthcare, finance) can remain on‑premises, reducing risk of data leakage to external clouds.
2. High Availability & Disaster Recovery
• Local SLMs act as fallback models when cloud LLMs (GPT‑4, Gemini, Claude) are unavailable due to outages, war, or connectivity issues.
• Microsoft Foundry provides orchestration tools to integrate circuit breaker and retry patterns, ensuring continuity of service.
3. Cost Optimization
• Offloading simple tasks (routing, summarization, classification) to SLMs reduces cloud consumption costs.
• Enterprises avoid paying for expensive GPU cycles for tasks that don’t require advanced reasoning.
4. Performance & Latency
• Local execution ensures low‑latency responses, critical for real‑time compliance checks, routing, and automation.
• Edge deployment reduces dependency on global network routes.
5. Multi‑Cloud Flexibility
• Microsoft Foundry supports multi‑cloud orchestration, allowing enterprises to:
  • Use Azure for primary workloads.
  • Fail over to AWS, Google Cloud, or Anthropic when needed.
  • Maintain vendor neutrality while still leveraging hyperscalers.
Example Architecture
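The multi‑cloud failover at the heart of this architecture can be sketched as follows. The provider callables are placeholders, not real Azure/AWS/Google Cloud SDK clients:

```python
def failover_call(prompt, providers):
    """Try providers in priority order and return the first success.

    `providers` is an ordered list of (name, callable) pairs, e.g. the
    primary Azure endpoint first, then AWS/Google Cloud backups."""
    errors = {}
    for name, client in providers:
        try:
            return name, client(prompt)
        except Exception as exc:
            errors[name] = exc  # record per-provider failures for monitoring
    raise RuntimeError(f"all providers failed: {list(errors)}")
```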
Sample Use Cases
• Compliance Agencies
  • Local SLMs (Phi‑4, Gemma) handle regulatory checks, document classification, and summarization.
  • Cloud LLMs (GPT‑4, Claude) handle advanced reasoning when permitted.
• Financial Institutions
  • Local SLMs ensure sensitive transaction data never leaves the premises.
  • Cloud LLMs provide advanced analytics when compliance allows.
• Government & Defense
  • Local SLMs guarantee continuity during war or outages.
  • Multi‑cloud architecture ensures redundancy across Azure, AWS, and Google Cloud.
Strategic Advantages
• Resilience: Local fallback ensures continuity in unstable regions.
• Compliance: Sensitive workloads remain within jurisdiction.
• Efficiency: Cost savings by routing simple tasks to SLMs.
• Flexibility: Multi‑cloud orchestration prevents vendor lock‑in.
• Scalability: Foundry enables seamless scaling across edge, local, and cloud deployments.
Adding Microsoft Agent Service and SDK frameworks strengthens this architecture: they provide the orchestration layer that ties together LLMs, SLMs, and multi‑cloud deployments.
Microsoft Agent Service & SDK Frameworks
Microsoft’s AI Foundry and Agent Service SDKs are designed to help enterprises build, deploy, and manage AI agents that can:
• Integrate multiple models (LLMs + SLMs).
• Use tool calling and workflow orchestration.
• Run across edge, on‑premises, and cloud environments.
• Enforce compliance, monitoring, and governance.
How They Fit Into Router + Circuit Breaker Architecture
1. Router Models with Agent SDK
• The SDK provides APIs to evaluate task complexity and route requests.
• Example:
  • Simple task → Local SLM (Phi‑4, Gemma, Mistral).
  • Complex task → Cloud LLM (GPT‑4, Gemini, Claude).
  • Fallback mode → Circuit breaker reroutes to SLM if LLM unavailable.
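The decision flow above, condensed into a single dispatch function. The 0.5 threshold is an assumption for the example; the Agent SDK's actual routing API is not shown here:

```python
def dispatch(task_complexity, llm_healthy):
    """Condensed routing decision: fallback mode wins, then complexity."""
    if not llm_healthy:
        return "local-slm"  # circuit breaker has tripped: stay local
    return "cloud-llm" if task_complexity > 0.5 else "local-slm"
```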
2. Circuit Breaker Implementation
• Agent Service monitors health checks of cloud LLM endpoints.
• If repeated failures occur, the SDK automatically switches to local SLM.
• Half‑open state allows retry logic to test cloud availability before switching back.
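The health-check bookkeeping described above might look like the following; `HealthMonitor` is a hypothetical helper for this sketch, not part of the Agent Service API:

```python
class HealthMonitor:
    """Track consecutive health-check failures per endpoint and decide
    when traffic should shift to the local SLM."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = {}  # endpoint -> consecutive failure count

    def record(self, endpoint, healthy):
        """Record one health-check result; a success resets the count."""
        self.failures[endpoint] = (
            0 if healthy else self.failures.get(endpoint, 0) + 1
        )

    def use_local_slm(self, endpoint):
        """True once an endpoint fails `max_failures` checks in a row."""
        return self.failures.get(endpoint, 0) >= self.max_failures
```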
3. Multi‑Cloud Orchestration
• Microsoft Foundry integrates with Azure, AWS, Google Cloud, Anthropic.
• Router + SDK ensures tasks can failover across providers.
• Enterprises avoid vendor lock‑in while maintaining resilience.
Advantages of Local SLMs with Microsoft Foundry
• Compliance: Sensitive workloads stay local, meeting regulatory requirements.
• Resilience: Edge SLMs ensure continuity during outages or war‑related disruptions.
• Cost Efficiency: Simple tasks offloaded to SLMs reduce GPU/cloud spend.
• Latency: Local execution delivers faster responses.
• Flexibility: SDK enables hybrid orchestration across multi‑cloud environments.
Sample Use Cases
• Compliance Agencies
  • Local SLMs classify documents and enforce rules.
  • Cloud LLMs provide advanced reasoning when permitted.
• Financial Institutions
  • Local SLMs ensure sensitive transaction data never leaves premises.
  • SDK orchestrates hybrid workflows with cloud LLMs for analytics.
• Government & Defense
  • Local SLMs guarantee continuity during war or outages.
  • Multi‑cloud routing ensures redundancy across Azure, AWS, and Google Cloud.
Strategic Takeaway
By combining Local SLMs with Microsoft Foundry + Agent SDK frameworks, enterprises gain:
• Resilience through circuit breaker + retry patterns.
• Compliance by keeping sensitive workloads local.
• Efficiency by routing tasks intelligently.
• Flexibility with multi‑cloud orchestration.
This hybrid design is the future of AI agent architecture—intelligent, compliant, and survivable in volatile environments.