In recent years, AI-powered chat assistants have become increasingly prevalent, but most solutions are controlled by companies that impose paywalls, availability limitations, and usability constraints. Cloud-based AI models are often unreliable due to server downtime, while company-provided interfaces lack flexibility, do not support multi-user interactions, and are frequently designed with artificial limitations to push users toward paid tiers. Additionally, some companies intentionally degrade model performance before releases for marketing purposes, creating an unnecessary cycle of artificial scarcity.
To address these issues, tg-local-llm provides a fully independent, self-hosted AI assistant that runs entirely on local hardware, ensuring maximum privacy and eliminating reliance on external providers. Built with a lightweight and easily modifiable stack using Deno and TypeScript, it allows users to customize and extend functionality with minimal programming experience, or none at all. The bot integrates seamlessly into Telegram Messenger, making it a handy, real-life companion for everyday tasks. Unlike traditional AI UIs, this solution supports long, continuous conversations, multi-user interactions, and a modular toolset that allows for easy integration of additional capabilities.
By giving users full control over their AI assistant, tg-local-llm ensures unrestricted access, adaptability, and long-term reliability, making it an ideal choice for those who prioritize privacy, customization, and independence in their AI interactions.
One of the early problems was the response grammar. The initial format was, roughly:

`<message_start> [any_character] <message_end>`

Given that grammars are not lazy, when the model generates `<message_end>`, it is treated as part of `[any_character]`, so the model is not forced to stop there. Caught between the grammar still demanding output and its own sense that it has already finished, the model reliably produces an insane amount of semi-random text. The simple solution is to pick some barely used character, such as `σ̌`, use it as a wrapper for section tags, and replace `[any_character]` with `[any_character except σ̌]`. This way, whenever the model writes `σ̌`, it is handled as part of a required section tag, since it cannot belong to the "any character" part. Later I changed the wrappers to `≪` (much less than) and `≫` (much greater than), so as not to introduce another script into responses, which can make the model switch languages for no reason.
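To make the trick concrete, here is a minimal sketch of such a grammar as a llama.cpp-style GBNF string; the rule names and section set are illustrative assumptions, not the project's actual grammar.

```ts
// A sketch of the "excluded wrapper character" trick in GBNF form.
// Rule names and tags are illustrative, not the real grammar.
const responseGrammar = String.raw`
root ::= "≪message_start≫" text "≪message_end≫"
# The wrappers ≪ and ≫ are excluded from free text, so the only way the
# model can emit them is as part of a required section tag.
text ::= [^≪≫]*
`;
```

Because `≪` can never match the free-text rule, emitting `≪message_end≫` becomes the only way to satisfy the grammar, and generation actually stops.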
Web access is handled by two tools: `search_web` and `read_article`. The first one uses a locally running SearXNG instance to retrieve a list of links relevant to a given `query`. As a response, the model receives a bullet list of `source_url` and `title` entries.
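As a rough sketch, the search call could look like the following, assuming a local SearXNG instance with its JSON output format enabled (the URL and helper names here are illustrative):

```ts
// Hypothetical sketch: query a local SearXNG instance and format the
// results as the bullet list the model receives. URL and names assumed.
const SEARXNG_URL = "http://localhost:8888/search";

interface SearchResult {
  url: string;
  title: string;
}

async function searchWeb(query: string): Promise<string> {
  const params = new URLSearchParams({ q: query, format: "json" });
  const res = await fetch(`${SEARXNG_URL}?${params}`);
  const { results } = await res.json() as { results: SearchResult[] };
  // Bullet list of source_url and title, as described above.
  return results.map((r) => `- ${r.url} (${r.title})`).join("\n");
}
```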
The second one uses a headless browser to evaluate `document.body.innerText`, essentially extracting all text from the web page. The result is passed to a separate LLM call (a summarizer) with a request to summarize the contents and strip metadata; the summary is then given to the main (chat-context-aware) model to respond. The tool is usually used after `search_web`, or when a user asks to read a specific URL. Bonus point: to avoid (or rather minimise) robot checks and rejections on websites, I add a custom User-Agent and some extra headers, and it works much better (see `src/services/browser.ts`).
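A sketch of the extraction step, assuming a Puppeteer-style API; the project's actual wiring lives in `src/services/browser.ts` and may differ:

```ts
// Hypothetical sketch of the extraction step with a Puppeteer-style API.
import puppeteer from "npm:puppeteer";

async function readArticle(url: string): Promise<string> {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // A realistic User-Agent and extra headers reduce bot-check rejections.
    await page.setUserAgent(
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    );
    await page.setExtraHTTPHeaders({ "Accept-Language": "en-US,en;q=0.9" });
    await page.goto(url, { waitUntil: "networkidle2" });
    // Grab all visible text from the rendered page.
    return await page.evaluate(() => document.body.innerText);
  } finally {
    await browser.close();
  }
}
```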
Next, I added a `category: text|image` parameter to the `search_web` tool. This allows the model to search for images. An additional `image` section is used by the model to provide a direct image URL, which is then used by the client. I also updated the structure so that the model writes the `tool_call` section before the `message` section. This means the model can now describe what it is doing with its tools, and the client can show that to the user before the tool response and the actual answer arrive.
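Illustratively, a tool-using turn now looks something like this (the exact tag spelling here is an assumption):

```
≪tool_call_start≫{"tool": "search_web", "query": "sunset photos", "category": "image"}≪tool_call_end≫
≪message_start≫Let me look for some sunset photos...≪message_end≫
```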
After that, I added a `thoughts` section before the `message` section. To make it meaningful, the model is required to include `User Response`, `Reasoning`, and `Next Steps` subsections within the `thoughts` tag.
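For illustration, a `thoughts` block could look like this (again, the tag spelling is approximate):

```
≪thoughts_start≫
User Response: the user asked for a summary of the linked article.
Reasoning: the page has not been fetched yet, so its text is needed first.
Next Steps: call get_text_content with the URL, then summarize the result.
≪thoughts_end≫
```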
I also added a `tool_guide` section after `tool_response`, with instructions on what to do with a specific tool response. For example, with text search, the guide requires the model to select a source and read it with the `get_text_content` tool; with image search, the guide prohibits extracting text and requires the model to provide one of the images in the response.
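A minimal sketch of how such guides can be attached, with hypothetical guide wording (the real texts live in the bot's prompt code):

```ts
// Hypothetical sketch: choose a guide to inject after a tool_response.
// The guide wording is illustrative, not the project's real prompts.
function buildToolGuide(
  tool: string,
  category?: "text" | "image",
): string | undefined {
  if (tool === "search_web") {
    return category === "image"
      ? "Do not extract text; provide one of the images in the image section."
      : "Select the most relevant source and read it with get_text_content.";
  }
  return undefined; // no guide for other tools in this sketch
}
```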
Users can run `/ai remember` to add a note. A next step here, and a great improvement, would be a simple `memory` tool so that the model can read and write memory notes itself; a possible shape is sketched below.
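No such tool exists yet; under the assumption that notes are kept per chat, it could look roughly like this:

```ts
// Hypothetical sketch of a future memory tool; nothing like this exists
// in the project yet. Assumes a simple in-memory note store per chat.
const notesByChat = new Map<number, string[]>();

type MemoryCall =
  | { action: "read"; chatId: number }
  | { action: "write"; chatId: number; note: string };

function memoryTool(call: MemoryCall): string {
  const notes = notesByChat.get(call.chatId) ?? [];
  if (call.action === "write") {
    notesByChat.set(call.chatId, [...notes, call.note]);
    return "Note saved.";
  }
  return notes.length > 0
    ? notes.map((n) => `- ${n}`).join("\n")
    : "No notes yet.";
}
```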
There is also `/ai extremely lazy`, which makes the model behave like a lazy person: internally, it injects an instruction to behave this way and disables some conflicting instructions. Given this, I recommend generating the system prompt on every request rather than storing it as a message in the database. Finally, `//` can be used for hidden replies.

A `Containerfile` was added for containerization. As a future improvement, there will be containers for the database and the backend, with a single script to manage them.