Retrieval-Augmented Generation (RAG) is a powerful approach that combines the strengths of information retrieval and text generation. In this minimal example, we'll demonstrate how to build a simple RAG pipeline using Cohere (link) for text generation and SerpAPI (link) for retrieving relevant information from the web. Both Cohere and SerpAPI offer free-tier access, making this example accessible to anyone interested in exploring RAG systems.
What is RAG?
RAG enhances language models by allowing them to retrieve relevant documents or data from an external source (like a search engine or database) and use that information to generate more accurate and contextually relevant responses. This approach is particularly useful for tasks like question answering, where up-to-date or domain-specific knowledge is required.
Tools Used in This Example:
1. Cohere: A state-of-the-art language model API for text generation. Cohere's free tier allows you to experiment with its capabilities, including generating text based on prompts.
2. SerpAPI: A search engine results API that retrieves real-time data from Google and other search engines. SerpAPI's free tier provides limited but sufficient access for small-scale projects.
Make sure you have an account with both Cohere and SerpAPI. It's free and easy (though there are some limitations to a free account). Once you're signed up, get the appropriate API keys from both services.
How This Example Works
1. Retrieval. SerpAPI is used to fetch relevant search results based on a user query.
2. Augmentation. The retrieved information is passed to Cohere's language model as context.
3. Generation. Cohere generates a response that incorporates the retrieved data, providing a more informed and accurate answer.
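The three steps above form a short pipeline. Here is a minimal conceptual sketch with the retrieval and generation steps stubbed out; the stub bodies are placeholders, not the real API calls shown later:

```python
# Minimal RAG pipeline sketch; the stubs stand in for real API calls.

def retrieve(query):
    # Retrieval: would call a search API; here we return a canned snippet.
    return f"Snippet about: {query}"

def generate(query, context):
    # Generation: would call a language model; here we just combine inputs.
    return f"Answer to '{query}' based on: {context}"

def rag(query):
    context = retrieve(query)        # 1. Retrieval
    return generate(query, context)  # 2-3. Augmentation + Generation

print(rag("What is RAG?"))
```

The real pipeline below has exactly this shape, with SerpAPI and Cohere filling in the two stubs.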
This example is designed to be simple and easy to follow, requiring only a few lines of code. By the end, you'll have a working RAG pipeline that leverages the power of Cohere and SerpAPI to generate intelligent, context-aware responses.
Let's get started!
Create a Python environment
We use Conda, but any method of creating a Python environment will do. To create a Conda environment with the packages "cohere" and "requests", follow these steps. If you don't have Conda installed, you can download and install it from the Anaconda website or install Miniconda, a lightweight version of Anaconda.
Open your terminal (Linux/Mac) or Command Prompt (Windows). Use the following command to create a new Conda environment. Replace myenv with the name you want to give your environment:
conda create -n myenv python=3.9 pip

This command creates a new environment named myenv with Python 3.9 installed. You can specify a different Python version if needed. Activate the environment using the following command:

conda activate myenv

Now, install the cohere and requests packages using pip (included in the environment):

pip install cohere requests

Alternatively, if the packages are available via conda-forge, you can install them using:

conda install -c conda-forge cohere requests

However, cohere is typically installed via pip.
The code
Run the following code in the newly created environment. Replace the placeholder values with your actual API keys; without valid keys, the code will not work. Feel free to change the question (query) as you like.
import cohere
import requests

# API keys - replace with your actual API keys
# Without valid keys, the code will not work
COHERE_API_KEY = "your Cohere API key"
SERPAPI_KEY = "your SerpAPI API key"

# Cohere initialization
co = cohere.Client(COHERE_API_KEY)

# Retrieval function (using SerpAPI)
def retrieve_information(query):
    url = "https://serpapi.com/search"
    params = {
        "q": query,
        "api_key": SERPAPI_KEY,
        "engine": "google"
    }
    response = requests.get(url, params=params)
    if response.status_code == 200:
        results = response.json()
        # Extract the first 3 snippets from the search results
        snippets = [result.get("snippet", "") for result in results.get("organic_results", [])[:3]]
        return " ".join(snippets)
    else:
        return "No information found."

# Generation function (using Cohere)
def generate_response(query, context):
    prompt = f"Query: {query}\nContext: {context}\nAnswer:"
    response = co.generate(
        model="command",  # Use Cohere's command model
        prompt=prompt,
        max_tokens=500,   # Answer length limit
        temperature=0.7   # Answer "creativity" level
    )
    return response.generations[0].text

# Main RAG function
def rag_model(query):
    # Retrieval step
    context = retrieve_information(query)
    print(f"Context retrieved: {context}")
    # Generation step
    response = generate_response(query, context)
    return response

# Example usage
# Feel free to modify the query as you like.
query = "Who is the 47th president of the United States?"
answer = rag_model(query)
print(f"Answer: {answer}")

How It Works
1. Retrieval Step. The retrieve_information function sends a query to SerpAPI, which performs a Google search. It extracts the first three snippets (short descriptions) from the search results and combines them into a single string (context).
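To make the retrieval step concrete, here is the same snippet extraction applied to a hand-written mock of SerpAPI's JSON response. The field names organic_results and snippet match SerpAPI's schema; the result texts below are invented for illustration:

```python
# Mock of the JSON dict SerpAPI returns; the snippet texts are invented.
mock_results = {
    "organic_results": [
        {"title": "Result 1", "snippet": "First snippet."},
        {"title": "Result 2", "snippet": "Second snippet."},
        {"title": "Result 3", "snippet": "Third snippet."},
        {"title": "Result 4", "snippet": "Fourth snippet (ignored)."},
    ]
}

# Same extraction logic as retrieve_information: first 3 snippets, joined.
snippets = [r.get("snippet", "") for r in mock_results.get("organic_results", [])[:3]]
context = " ".join(snippets)
print(context)  # First snippet. Second snippet. Third snippet.
```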
2. Generation Step. The generate_response function uses Cohere's command model to generate a response based on the query and the retrieved context. The model is configured with a temperature of 0.7, which balances creativity and relevance.
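The prompt that generate_response sends to the model is simply the query and the retrieved context stitched together with a template; you can inspect it directly (the context string here is a placeholder):

```python
query = "Who is the 47th president of the United States?"
context = "Example snippet retrieved from the web."  # placeholder context

# Same prompt template used by generate_response
prompt = f"Query: {query}\nContext: {context}\nAnswer:"
print(prompt)
```

The trailing "Answer:" cues the model to continue with the answer rather than restate the question.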
3. RAG Model. The rag_model function orchestrates the retrieval and generation steps, returning a final response to the user's query.
4. Example Usage. The query `"Who is the 47th president of the United States?"` is passed to the RAG model, which retrieves relevant information and generates an answer.
For brevity, the code assumes the SerpAPI request will always succeed (in practice, you should add error handling for cases where the API fails - e.g., network issues, invalid API key, or rate limits).
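As a sketch of that error handling, here is a variant of the retrieval function that adds a timeout and catches request failures instead of letting them crash the program. The fallback message and the 10-second timeout are arbitrary choices:

```python
def retrieve_information_safe(query, serpapi_key):
    """Like retrieve_information, but returns a fallback string on any failure."""
    import requests  # imported inside the function so the rest of this sketch
                     # can be read/run without requests installed

    url = "https://serpapi.com/search"
    params = {"q": query, "api_key": serpapi_key, "engine": "google"}
    try:
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()  # raises on 4xx/5xx (e.g., invalid API key)
        results = response.json()
    except requests.exceptions.RequestException:
        # Covers network issues, timeouts, and HTTP errors alike
        return "No information found."
    snippets = [r.get("snippet", "") for r in results.get("organic_results", [])[:3]]
    return " ".join(snippets)
```

With this in place, a dead network or a revoked key degrades to the fallback answer instead of an unhandled exception.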
Conclusion
That’s it! You can consider the following improvements:
1. Use a more advanced method to filter and rank search results, such as scoring snippets based on relevance to the query.
2. If no relevant information is found, provide a fallback response (e.g., "I couldn't find any information on that topic.").
3. Allow users to provide feedback on the generated response to improve the model over time.
4. Cache frequently asked queries to reduce API calls and improve response time.
5. Improve your code with basic error handling.
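The caching improvement can be as small as a functools.lru_cache decorator on the retrieval function. Here is a sketch in which a counter shows that repeated queries skip the underlying call; fake search logic stands in for the real SerpAPI request:

```python
from functools import lru_cache

call_count = 0  # tracks how many times the underlying "search" actually runs

@lru_cache(maxsize=128)
def cached_retrieve(query):
    # In the real pipeline this body would call SerpAPI; here it is a stand-in.
    global call_count
    call_count += 1
    return f"context for: {query}"

cached_retrieve("who won the 2022 world cup")
cached_retrieve("who won the 2022 world cup")  # served from cache, no new call
cached_retrieve("capital of france")
print(call_count)  # 2: the repeated query did not trigger a second search
```

Note that lru_cache never expires entries, so for a real deployment you would want a cache with a time-to-live to avoid serving stale search results.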
Useful links
Cohere (link)
SerpAPI (link)
Code - GitHub repo (link)


