https://docu-talk.ai-apps.cloud
Docu Talk is an AI-powered platform that allows you to create custom chatbots based on your own documents.
Demo video:
https://github.com/user-attachments/assets/6a03b0a1-a549-4e58-9576-2ee25e0b6ba1
The application relies on a database composed of MongoDB and Cloud Storage.
MongoDB holds the application's core data, such as access data, created chatbots, and consumed usage (see section MongoDB Database). The connection between the back-end and the MongoDB database is established through a VPC/NAT configuration, allowing the container to communicate externally with a static IP whitelisted by MongoDB Atlas, thus preserving the security of the internal network.
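The connection described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the environment variable names, the `docu_talk` database name, and the use of pymongo are assumptions.

```python
from urllib.parse import quote_plus

def atlas_uri(user: str, password: str, host: str) -> str:
    """Build a MongoDB Atlas (mongodb+srv) connection string.

    Credentials are percent-escaped so special characters in
    usernames or passwords do not break the URI.
    """
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/?retryWrites=true&w=majority"
    )

# The back-end would connect through the VPC/NAT egress, whose static
# IP is whitelisted in the Atlas network access list (hypothetical names):
#
#   import os
#   from pymongo import MongoClient
#   client = MongoClient(atlas_uri(os.environ["MONGO_USER"],
#                                  os.environ["MONGO_PASSWORD"],
#                                  os.environ["MONGO_HOST"]))
#   db = client["docu_talk"]
```

Because the NAT gateway presents a single static IP, only that address needs to appear in the Atlas allow list; the cluster remains closed to the public internet.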
Cloud Storage is used to store large files, i.e., PDFs uploaded by users. It returns signed, secure URLs that let users access their documents through the application. Finally, Cloud Storage integrates with Gemini by providing only the URIs of the documents, so the back-end never needs to read them.
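The two access paths above can be sketched like this. The bucket and blob names are purely illustrative; the signed-URL helper assumes the google-cloud-storage package and a service account able to sign URLs, so it is imported lazily.

```python
from datetime import timedelta

def gcs_uri(bucket: str, blob_name: str) -> str:
    """URI form handed to Gemini, so documents never transit the back-end."""
    return f"gs://{bucket}/{blob_name}"

def signed_url(bucket_name: str, blob_name: str, minutes: int = 15) -> str:
    """Return a short-lived V4 signed URL so a user can open their PDF.

    Requires google-cloud-storage and credentials with signing rights.
    """
    from google.cloud import storage  # lazy import: needs GCP credentials
    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=minutes),
        method="GET",
    )
```

Keeping the expiration short limits the window in which a leaked URL grants access to a user's document.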
The back-end and front-end are deployed in a single container, using Python as the programming language.
The back-end uses Gemini as the generation model, which interacts directly with the URIs of the uploaded documents. Amazon Web Services Simple Email Service (SES) is also used to handle email sending to users.
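Passing document URIs straight to Gemini might look like the sketch below, using the Vertex AI SDK. Everything here is an assumption rather than the project's actual code: the model name, the helper names, and the choice of SDK.

```python
from typing import List, Tuple

def assemble_request(question: str, doc_uris: List[str]) -> Tuple[List[str], str]:
    """Pair the user's question with the Cloud Storage URIs of the chatbot's PDFs."""
    pdf_uris = [u for u in doc_uris if u.startswith("gs://")]
    return pdf_uris, question

def ask_gemini(question: str, doc_uris: List[str]) -> str:
    """Send the URIs to Gemini without reading the files on the back-end.

    Assumes google-cloud-aiplatform is installed and the service
    account is allowed to call Vertex AI.
    """
    from vertexai.generative_models import GenerativeModel, Part  # lazy: needs GCP
    uris, q = assemble_request(question, doc_uris)
    parts = [Part.from_uri(u, mime_type="application/pdf") for u in uris]
    model = GenerativeModel("gemini-1.5-flash")  # model name is an assumption
    return model.generate_content(parts + [q]).text
```

Because only URIs cross the wire, the container never buffers user PDFs in memory, which keeps the back-end stateless.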
The front-end is built using the Streamlit framework.
The application is hosted on Cloud Run or Streamlit Cloud and mapped to the domain docu-talk.ai-apps.cloud.
Three authentication methods are available in the application:
The MongoDB database contains the majority of the data stored by the application. It is composed of 10 tables that facilitate the management of user access, chatbots and their documents, and consumed usage.
Passwords are hashed with bcrypt before being stored. Each chatbot's `access` field indicates whether the chatbot is public or private. The AskChatbotTokenCounts, AskChatbotDurations, and CreateChatbotDurations tables log various metrics; these metrics are regularly used to retrain machine learning models that estimate waiting times or credits consumed before a process is executed.