Mar 09, 2025 ● MIT License

CodeLens: An AI-powered web app to talk to your GitHub repo

Vishwajeet Sawant


Abstract

GitHub repositories have become the backbone of collaborative coding, but extracting insights and knowledge from a large or unfamiliar repository can be a daunting task when it means reading files one by one. This is where CodeLens comes in: an AI-powered web application that lets you ask questions about a GitHub repository in plain language.

Introduction

CodeLens extracts all non-binary files from a GitHub repository, combines them into a single text document, and indexes that text in a FAISS vector store using OpenAI embeddings. Users can then query the repository’s contents through an LLM (Large Language Model), which answers from the passages retrieved from the index.

Methodology

Here’s a step-by-step breakdown of the CodeLens workflow:

Repository Extraction: CodeLens extracts all non-binary files from a specified GitHub repository.
Text Combination: The extracted files are combined into a single text document.
Vector Embeddings: The combined text is embedded with OpenAI embeddings and stored in a FAISS vector index.
LLM Querying: Users ask questions in natural language; the most relevant parts of the indexed text are retrieved and passed to an LLM, which generates the answer.
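
The first two steps can be sketched with the Python standard library alone. This is a minimal illustration, not the CodeLens source: it assumes the repository has already been cloned to a local directory, and the names `SKIP_DIRS`, `is_binary`, and `combine_repo_text`, the NUL-byte heuristic, and the per-file header format are all assumptions made for the example.

```python
import os
from pathlib import Path

# Hypothetical helper names and settings; none of these come from CodeLens itself.
SKIP_DIRS = {".git", "node_modules", "__pycache__"}


def is_binary(path: Path, sample_size: int = 8192) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain a NUL byte."""
    with open(path, "rb") as fh:
        return b"\x00" in fh.read(sample_size)


def combine_repo_text(repo_dir: str) -> str:
    """Walk a local checkout and join every non-binary file into one document."""
    root = Path(repo_dir)
    parts = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if is_binary(path):
                continue
            rel = path.relative_to(root).as_posix()
            # errors="replace" keeps the walk going on files with odd encodings.
            text = path.read_text(encoding="utf-8", errors="replace")
            # A header per file records where each piece of text came from.
            parts.append(f"===== {rel} =====\n{text}")
    return "\n\n".join(parts)
```

Calling `combine_repo_text("path/to/checkout")` returns one string that can then be chunked and embedded. Keeping the relative path in a header is a design choice for this sketch: it lets an answer point back to the file a retrieved chunk came from.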

The LLM query process in CodeLens is built from two LangChain chains and involves the following steps:

Document Chain Creation: A document chain is created using the create_stuff_documents_chain function from the langchain library. This chain inserts ("stuffs") the retrieved documents into the prompt and passes it to the LLM to generate a response.
Retrieval Chain Creation: A retrieval chain is created using the create_retrieval_chain function from the langchain library. This chain fetches the chunks most relevant to the question from the vector store and hands them to the document chain.
User Question Processing: When a user enters a question and clicks the Get Answer button, the retrival_chain.invoke function is called with the user’s question as input.
Response Generation: The retrival_chain.invoke function returns a response generated from the user’s question and the chunks retrieved from the vector store.
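
The steps above might be wired together roughly as follows. This is a hedged sketch rather than the CodeLens code: the prompt wording, the chunk sizes, and the helper names `build_vector_store`, `build_retrieval_chain`, and `answer` are assumptions, and the import paths shown depend on the installed LangChain version. Running the first two helpers also requires the relevant packages and an OpenAI API key.

```python
# Assumed prompt wording. create_stuff_documents_chain needs a {context}
# placeholder, which it fills with the retrieved chunks.
PROMPT_TEMPLATE = (
    "Answer the question using only the repository context below.\n"
    "<context>\n{context}\n</context>\n"
    "Question: {input}"
)


def build_vector_store(combined_text: str):
    """Split the combined repository text into chunks and index them in FAISS."""
    # Local imports: the module still loads if these packages are missing.
    from langchain_community.vectorstores import FAISS
    from langchain_openai import OpenAIEmbeddings
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # Chunk sizes are illustrative, not values taken from CodeLens.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_text(combined_text)
    return FAISS.from_texts(chunks, OpenAIEmbeddings())


def build_retrieval_chain(vectors, llm):
    """Combine a stuff-documents chain with a retriever over the vector store."""
    from langchain.chains import create_retrieval_chain
    from langchain.chains.combine_documents import create_stuff_documents_chain
    from langchain_core.prompts import ChatPromptTemplate

    prompt = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
    document_chain = create_stuff_documents_chain(llm, prompt)
    retriever = vectors.as_retriever()
    return create_retrieval_chain(retriever, document_chain)


def answer(retrival_chain, question: str) -> str:
    """Invoke the chain; the result is a dict whose "answer" key holds the reply."""
    response = retrival_chain.invoke({"input": question})
    return response["answer"]
```

In the web app, the handler for the Get Answer button would call something like `answer(retrival_chain, user_question)` and display the returned string. Here `llm` can be any LangChain chat model; which model CodeLens uses is not stated in this article.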

Conclusion

CodeLens turns a GitHub repository into something you can question directly: vector embeddings locate the relevant parts of the repository, and an LLM turns them into answers. It is aimed at any developer or organization looking to extract knowledge from their GitHub repositories. Try CodeLens today and see what AI-powered repository analysis can tell you about your own code!

You can check out the GitHub repo at this link. The web app is also live on Streamlit and can be accessed through this:

Note:

This publication was written by CodeLens. Apart from some text formatting, the entire text was generated using CodeLens.