Enhancing Retrieval-Augmented Generation (RAG) with Python

Kovendhan Venugopal
Sep 1, 2024


In LLM application development, RAG is like a smart helper that reads lots of books to find the best answers for you. It first picks the most relevant passages from those books and then uses its brain (an AI model) to compose the answer you need. It’s like having a super-powered librarian who always knows where to look!

However, a base RAG pipeline almost always needs tuning before it performs well.

Here are some effective techniques that can take your model to the next level. I’ve included Python snippets for each as a template reference!

1️⃣ Base Case RAG:
Perform Top K retrieval on embedded document chunks, and return those chunks for the LLM’s context window.

from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

# Load a FAISS index previously built over your document chunks
vectorstore = FAISS.load_local('faiss_index', OpenAIEmbeddings())
# Return the 5 most similar chunks for the LLM's context window
results = vectorstore.similarity_search("Your query", k=5)
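
load_local assumes an index you have already built and saved. If you don’t have one yet, here is a quick way to create it from your chunks (the chunk texts below are placeholders):

from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

chunks = ["first document chunk ...", "second document chunk ..."]
vectorstore = FAISS.from_texts(chunks, OpenAIEmbeddings())  # embed and index the chunks
vectorstore.save_local("faiss_index")  # persist for later load_local calls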

2️⃣ Summary Embedding:
Retrieve Top K on embedded document summaries but return the full document for the LLM context window.

from langchain.retrievers import MultiVectorRetriever
from langchain.storage import InMemoryStore

docstore = InMemoryStore()  # full documents, keyed by the same ids as their summaries
docstore.mset([("1", full_doc1), ("2", full_doc2)])  # full_doc1/2: complete Document objects
retriever = MultiVectorRetriever(vectorstore=summary_vectorstore,  # summary index, built below
                                 docstore=docstore, id_key="doc_id")
full_docs = retriever.get_relevant_documents("Your query")  # returns the full documents
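
The snippet above assumes a summary_vectorstore already built over the summaries. One way to construct it (the summary texts and ids are placeholders):

from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import Document

# Embed one summary per document, tagged with its parent's id
summary_vectorstore = FAISS.from_documents(
    [Document(page_content="doc1 summary", metadata={"doc_id": "1"}),
     Document(page_content="doc2 summary", metadata={"doc_id": "2"})],
    OpenAIEmbeddings(),
)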

3️⃣ Windowing:
Retrieve Top K on embedded chunks or sentences, but return an expanded window or full document.

from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain.text_splitter import RecursiveCharacterTextSplitter

retriever = ParentDocumentRetriever(vectorstore=vectorstore,  # index for the small child chunks
                                    docstore=InMemoryStore(),
                                    child_splitter=RecursiveCharacterTextSplitter(chunk_size=400))
retriever.add_documents(docs)  # docs: your full Document objects; children are split and embedded
results = retriever.get_relevant_documents("Your query")  # returns the parent documents
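
If you’d rather return an expanded window than the entire source document, ParentDocumentRetriever also accepts a parent_splitter, so the “parents” become mid-sized sections. A sketch continuing from the snippet above (the chunk sizes are illustrative):

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Children stay small for precise matching; parents are ~2000-character windows
window_retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=InMemoryStore(),
    child_splitter=RecursiveCharacterTextSplitter(chunk_size=400),
    parent_splitter=RecursiveCharacterTextSplitter(chunk_size=2000),
)
window_retriever.add_documents(docs)  # same ingestion step as above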

4️⃣ Metadata Filtering:
Perform Top K retrieval with chunks filtered by metadata.

# Most vector stores accept a metadata filter directly at query time
results = vectorstore.similarity_search("Your query", k=5,
                                        filter={"author": "John Doe"})
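
If you want the filter inferred from the query itself, LangChain’s SelfQueryRetriever uses an LLM to turn natural language into a metadata filter. A sketch, assuming a vector store with built-in self-query support (e.g. Chroma) and a placeholder field description:

from langchain.chat_models import ChatOpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

fields = [AttributeInfo(name="author", description="The document's author", type="string")]
retriever = SelfQueryRetriever.from_llm(
    ChatOpenAI(temperature=0),   # LLM that writes the structured query
    vectorstore,                 # must be a store with self-query support
    "Technical articles",        # description of the documents' contents
    fields,
)
results = retriever.get_relevant_documents("articles written by John Doe")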

5️⃣ Fine-Tune RAG Embeddings:
Fine-tune the embedding model on your own data so retrieval reflects your domain’s notion of similarity. (Hosted OpenAI embeddings can’t be fine-tuned, so this calls for an open-source model.)

# There is no fine-tunable OpenAI embedding class; fine-tune an open model instead
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")  # base model to fine-tune (sketch below)
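
A minimal fine-tuning sketch with sentence-transformers, assuming your data is (query, relevant passage) pairs; the pairs below are placeholders:

from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("all-MiniLM-L6-v2")
train_examples = [
    InputExample(texts=["What is RAG?", "RAG retrieves relevant passages and ..."]),
    InputExample(texts=["How do I filter results?", "Most vector stores accept ..."]),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=16)
loss = losses.MultipleNegativesRankingLoss(model)  # pulls each pair together in embedding space
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
model.save("fine_tuned_embeddings")

The saved model can then be plugged back into your vector store via LangChain’s HuggingFaceEmbeddings(model_name="fine_tuned_embeddings").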

6️⃣ 2-Stage RAG:
Start with a fast keyword search, then apply a semantic reranker to pick the final Top K.

import cohere
from langchain.retrievers import BM25Retriever

bm25 = BM25Retriever.from_texts(chunks)  # stage 1: keyword (BM25) search over raw chunk texts
initial_results = bm25.get_relevant_documents("Your query")
co = cohere.Client("YOUR_COHERE_API_KEY")
final_results = co.rerank(query="Your query", top_n=5, model="rerank-english-v2.0",
                          documents=[d.page_content for d in initial_results])  # stage 2: semantic rerank
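
The same two-stage pattern is also available natively in LangChain as a compression retriever. A sketch, continuing from the bm25 retriever above (with COHERE_API_KEY set in the environment):

from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CohereRerank

reranker = CohereRerank(top_n=5)  # semantic second stage
two_stage = ContextualCompressionRetriever(base_compressor=reranker,
                                           base_retriever=bm25)  # keyword first stage
final_results = two_stage.get_relevant_documents("Your query")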

Let me know in the comments which of these has worked for you; thoughts and suggestions are welcome! 🔍💡

#GenAI #LLMs #RAG #LangChain

