Suppose you want to chat with your favourite writer's books, or with your own previous writings. You may also want to shape your writing into formal Bangla literature through a multimodal interactive interface, and then share your work with others.
Our platform helps in all of these cases. We offer:
Personalized chatbots that use pre-uploaded literature (public files or your own private files) as a knowledge base to respond from.
The ability to query others' literature and adapt your own writing accordingly.
An audio chatbot for an interactive multimodal experience.
A translation model that is continuously updated by collecting and validating fine-tuning datapoints.
Tools to structure and polish your writing using our translation model.
System Overview
File-ingestion Pipeline
This pipeline is triggered when the user requests a PDF from the text-editor interface. The goal is to produce a well-structured PDF with an AI-generated title and caption.
Steps
We receive the raw Bangla text from the text editor.
We generate a suitable title and caption for the file using a generative model (GPT-4o-mini).
We upload the generated PDF to a Supabase bucket and fetch the file's link.
We generate metadata for the parsed content.
We vectorize the content and store the chunks in a Qdrant vector database; each chunk represents a block of text.
Users can share PDFs with others. Public PDFs and a user's own private PDFs can serve as the knowledge base for the personalized chatbot discussed in the next section.
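The chunking and metadata steps above can be sketched as follows. This is a minimal illustration, not the platform's actual code: the function names, the character-based chunk size, and the payload fields are assumptions, and the embedding and Qdrant upsert calls are left out.

```python
# Sketch: split the Bangla text into paragraph-based blocks, then attach the
# metadata that would be stored alongside each vector in Qdrant.

def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def build_payloads(chunks: list[str], title: str, pdf_url: str,
                   is_public: bool = False) -> list[dict]:
    """Per-chunk metadata (illustrative fields) for the vector database."""
    return [
        {"title": title, "pdf_url": pdf_url, "chunk_index": i,
         "is_public": is_public, "text": chunk}
        for i, chunk in enumerate(chunks)
    ]
```

Each payload would be upserted together with the chunk's embedding, so a later search can return both the matched text and its source PDF link.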
Personalized Chat Bot with RAG Pipeline
Each chatbot has access to a customizable knowledge base. For example, a user can create a chatbot whose knowledge base is the complete writings of their favourite writer.
Steps
The default knowledge base is the user's own uploaded content.
The user can also customize a chat's knowledge base by adding public files for that chat only.
The user asks a query (in Bangla or Banglish).
An AI agent normalizes the query so it is search-ready for the vector database and yields better context.
We vectorize the standardized prompt and search the vector database.
We fetch the k most relevant chunks.
We then feed the query and the fetched chunks to the AI agent.
The AI agent generates a Bengali response using the custom knowledge base.
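One way the retrieved chunks and the query can be wired into the agent call is sketched below; the system-prompt wording and the message layout are assumptions, not the platform's exact prompt.

```python
def build_rag_messages(query: str, chunks: list[str]) -> list[dict]:
    """Assemble chat messages: retrieved chunks as context, then the query."""
    context = "\n\n---\n\n".join(chunks)
    system = (
        "You are a literary assistant. Answer in Bengali, using only the "
        "context below.\n\nContext:\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": query},
    ]
```

The resulting message list is what gets sent to the generative model, so the response stays grounded in the chat's knowledge base rather than the model's general knowledge.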
Translation Generation
For translation, we have tried two approaches:
Way 1
We use few-shot prompting, a technique that enables in-context learning.
Our users contribute learning samples ({Banglish, Bangla} pairs).
Admins approve some of them.
The approved pairs are used as few-shot examples at inference time.
The future plan is to run a weekly cron job that collects the approved samples and fine-tunes the model via OpenAI's fine-tuning API; this has not been done yet for cost reasons.
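The few-shot step above can be sketched as a prompt builder: each approved {Banglish, Bangla} pair becomes a user/assistant exchange before the real input. The system-prompt text and dict keys are illustrative assumptions.

```python
def build_fewshot_messages(pairs: list[dict], banglish_input: str) -> list[dict]:
    """Turn approved {banglish, bangla} pairs into in-context examples."""
    messages = [{
        "role": "system",
        "content": "Translate Banglish (romanized Bengali) into Bangla script.",
    }]
    for pair in pairs:
        # Each approved pair is shown as a worked example.
        messages.append({"role": "user", "content": pair["banglish"]})
        messages.append({"role": "assistant", "content": pair["bangla"]})
    # The actual text to translate comes last.
    messages.append({"role": "user", "content": banglish_input})
    return messages
```

The same approved pairs could later be exported unchanged as fine-tuning examples, which is why collecting and validating them now still pays off even before the cron job exists.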
Way 2
We use the Google Transliterate API.
The transliteration is phonetic: it maps input sounds in one script (e.g., Latin/English) to equivalent sounds in the target script (e.g., Bengali).
The engine primarily relies on rule-based linguistic mappings, possibly with statistical or probabilistic enhancements for ambiguity resolution.
We chose this option for its lower latency.
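A request to this service can be sketched as below. Note the hedging: the original Google Transliterate API was deprecated, and the commonly used endpoint today is the unofficial Google Input Tools one; the endpoint URL, the `bn-t-i0-und` language code, and the parameter names here are assumptions based on that unofficial interface, so they should be verified before relying on them.

```python
from urllib.parse import urlencode

# Unofficial Google Input Tools endpoint (assumption; verify before use).
GOOGLE_INPUT_TOOLS = "https://inputtools.google.com/request"


def transliterate_url(text: str, num_candidates: int = 1) -> str:
    """Build a request URL for Banglish-to-Bengali transliteration."""
    params = {
        "text": text,
        "itc": "bn-t-i0-und",   # Bengali transliteration input-tool code
        "num": num_candidates,  # how many candidate spellings to return
        "ie": "utf-8",
        "oe": "utf-8",
    }
    return f"{GOOGLE_INPUT_TOOLS}?{urlencode(params)}"

# The actual call needs network access, e.g.:
#   import json, urllib.request
#   with urllib.request.urlopen(transliterate_url("ami")) as resp:
#       data = json.load(resp)  # nested list with candidate Bengali strings
```

Because this is a single lightweight HTTP round trip with no LLM in the loop, it explains the latency advantage over the few-shot approach.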
Audio Chat Pipeline
We use OpenAI's whisper-1 model to transcribe the user's speech.
We generate an embedding for the transcribed text.
We search the vector database for relevant chunks.
We feed the knowledge and the query to the AI agent, which responds in text.
With the browser's SpeechSynthesis API, we convert the textual response to speech.
After returning the audio response, we perform the DB-storing activities using FastAPI's background tasks.
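The server-side flow above can be sketched as a small pipeline where each stage is passed in as a callable. Every parameter name here is a stand-in for the real transcription, embedding, retrieval, and generation calls; speech synthesis itself happens client-side.

```python
def answer_speech(audio_bytes: bytes, transcribe, embed, search, generate) -> dict:
    """Run the audio-chat steps; each stage is an injected callable."""
    query = transcribe(audio_bytes)   # stand-in for the whisper-1 call
    vector = embed(query)             # embedding for the transcript
    chunks = search(vector)           # k nearest chunks from the vector DB
    reply = generate(query, chunks)   # agent's Bengali text reply
    # The client converts `reply` to audio via the SpeechSynthesis API.
    return {"query": query, "reply": reply}
```

Injecting the stages as callables keeps the orchestration testable without network access, and mirrors how the same retrieval and generation steps are shared with the text chatbot.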
Latency Handling When Translating Banglish to Bangla
We use FastAPI's background tasks to execute the DB operations outside the request path: the translation is returned to the user immediately, and the database writes complete afterwards.
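The pattern can be illustrated without FastAPI itself. Below, a minimal stand-in mimics `fastapi.BackgroundTasks` (same `add_task` signature) so the sketch is self-contained; the translation and DB-write bodies are placeholders, not the platform's real logic.

```python
class BackgroundTasks:
    """Minimal stand-in for fastapi.BackgroundTasks: queue now, run later."""

    def __init__(self):
        self.tasks = []

    def add_task(self, func, *args, **kwargs):
        self.tasks.append((func, args, kwargs))

    def run_all(self):
        # FastAPI runs queued tasks after the response has been sent.
        for func, args, kwargs in self.tasks:
            func(*args, **kwargs)


saved = []  # stand-in for the database


def save_translation(banglish: str, bangla: str) -> None:
    saved.append((banglish, bangla))  # placeholder for the real DB write


def translate_endpoint(banglish: str, background_tasks: BackgroundTasks) -> dict:
    bangla = banglish.upper()  # placeholder for the actual translation call
    background_tasks.add_task(save_translation, banglish, bangla)
    return {"bangla": bangla}  # returned before the DB write runs
```

The key property is visible in the flow: the response dict exists before `run_all()` fires, so the user's perceived latency is just the translation itself, not the persistence step.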