Project Goal:
This work augments an LLM to assist with learning the salient points of research papers. Reading research papers is time intensive, so a chatbot that can summarize a paper and answer questions about it reduces the time needed to keep up to date on AI research developments.
Approach:
I built a retrieval-augmented generation (RAG) system that feeds context from research papers to GPT-4o:
- Collected and preprocessed research papers by chunking their content into manageable text segments
- Embedded these chunks with OpenAI embeddings to capture their semantic meaning
- Stored the resulting vectors in a vector database (Chroma) for efficient similarity search
- At query time, embedded the question, retrieved the most similar chunks, and passed them as context to GPT-4o via LangChain, enabling the model to generate informed, grounded answers based on the papers (see the retrieval sketch after this list)
- Stored the source of each chunk so that the article and page number of the retrieved information can be cited
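A minimal sketch of the query-time retrieval step is shown below, assuming the vectorstore built in the highlighted code further down; the query text and the value of k are illustrative:

# Retrieval sketch: embed the query and fetch the most similar chunks along with their citation metadata
query = "How does query rewriting contribute to the efficiency of RAG?"
top_chunks = vectorstore.similarity_search(query, k=4)
for chunk in top_chunks:
    source = chunk.metadata.get("source", "Unknown")
    page = chunk.metadata.get("page", "?")
    print(f"{source} (page {page}): {chunk.page_content[:200]}")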
Example Results:
Papers Uploaded: Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach (Zhuowan Li et al., 2024) and Searching for Best Practices in Retrieval-Augmented Generation (Wang et al., 2024)
- Query: "How does multimodal retrieval enhance the capabilities of RAG?"
- Response: Multimodal retrieval enhances the capabilities of RAG by incorporating text2image and image2text retrieval, allowing for more efficient and accurate handling of visual inputs and multimodal content generation. This approach ensures groundedness by providing information from verified multimodal materials, reducing the risk of inaccuracies compared to on-the-fly generation. Additionally, it improves efficiency by leveraging existing stored materials, speeding up processes like image generation when relevant descriptions are available. Sources: wang_paper.pdf
- Query: "What were the key failure cases for RAG in handling long context retrievals, as noted by Zhuowan Li?"
- Response: The key failure cases for RAG in handling long context retrievals, as noted by Zhuowan Li, include its performance drop when queries are slightly modified or when dealing with complex queries such as determining which of two passkeys is larger. These cases highlight RAG's limitations in adapting to variations in query phrasing and handling more intricate retrieval tasks. Sources: zhuowan_paper.pdf
- Query: "How does query rewriting contribute to the overall efficiency of RAG according to Wang's findings?"
- Response: Query rewriting contributes to the overall efficiency of RAG by allowing the system to dynamically route queries based on model self-reflection, effectively combining the strengths of RAG and other methods like LC. This approach achieves comparable performance to more computationally intensive methods at a significantly reduced cost. By optimizing the retrieval process, query rewriting helps balance performance and efficiency, making RAG a viable option for handling inputs that exceed the model's context window size. Sources: zhuowan_paper.pdf, wang_paper.pdf
Highlighted Code Sections:
import os

from langchain import hub
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnableMap, RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Load papers from the folder of PDFs (pdf_folder_path points to the uploaded papers)
docs = []
for file in os.listdir(pdf_folder_path):
    if file.endswith('.pdf'):
        pdf_path = os.path.join(pdf_folder_path, file)
        loader = PyPDFLoader(pdf_path)
        loaded_docs = loader.load()
        # Tag each page with its filename so answers can cite their source
        filename = os.path.basename(file)
        for doc in loaded_docs:
            doc.metadata["source"] = filename
        docs.extend(loaded_docs)
# Split the documents into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# Embed the chunks and index them in Chroma for similarity search
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()
# RETRIEVAL and GENERATION
prompt = hub.pull("rlm/rag-prompt")
llm = ChatOpenAI(model_name="gpt-4o", temperature=0)
# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
def format_docs_with_metadata(docs):
    return {
        "context": "\n\n".join(doc.page_content for doc in docs),
        "source_docs": docs,
    }
# Build the chain so the source documents are kept alongside the response
rag_chain = (
    RunnableMap({
        "context": retriever | RunnableLambda(format_docs_with_metadata),
        "question": RunnablePassthrough(),
    })
    | RunnableLambda(lambda inputs: {
        "question": inputs["question"],
        "context": inputs["context"]["context"],
        "source_docs": inputs["context"]["source_docs"],
    })
    | RunnableMap({
        "response": prompt | llm | StrOutputParser(),
        "source_docs": lambda x: x["source_docs"],
    })
)
# Format the result so the sources are printed at the end
def format_response(result):
    response = result["response"]
    sources = {doc.metadata.get("source", "Unknown") for doc in result["source_docs"]}
    response_with_sources = f"{response}\n\nSources: {', '.join(sources)}"
    return response_with_sources
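Putting the pieces together, a minimal usage sketch using the rag_chain and format_response defined above; the query string is illustrative:

# Ask a question and print the grounded answer with its cited sources
query = "What were the key failure cases for RAG in handling long context retrievals?"
result = rag_chain.invoke(query)
print(format_response(result))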