Converse with large language models using speech.
For text generation, you can either self-host an LLM using Ollama or opt for a third-party provider. This is configured via a `.env` file in the project root.
If you're using Ollama, add the `OLLAMA_MODEL` variable to the `.env` file to specify the model you'd like to use (for example, `OLLAMA_MODEL=deepseek-r1:7b`).
Sage supports the following third-party providers out of the box:
To use a provider, add a `<PROVIDER>_API_KEY` variable to the `.env` file (for example, `OPENAI_API_KEY=xxxxxxxxxxxxxxxxxxxxxxx`).
To choose which model should be used for a given provider, set the `<PROVIDER>_MODEL` variable (for example, `DEEPSEEK_MODEL=deepseek-chat`).
Next, you have two choices: run Sage as a Docker container (the easy way) or natively (the hard way). Note that running it with Docker may carry a performance penalty (inference with Whisper is 4-5x slower than native).
With Docker: Install Docker and start the daemon. Download the following files and place them inside a `models` directory at the project root.
Run `bun docker-build` to build the image, then `bun docker-run` to spin up a container. The UI is exposed at http://localhost:3000.
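For reference, the Docker path boils down to two commands, assuming Docker is running and the `models` directory is populated as described above:

```sh
# Build the Sage image, then start a container exposing the UI
bun docker-build
bun docker-run
# The UI is then available at http://localhost:3000
```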
Without Docker: Install Bun, Rust, OpenSSL, LLVM, Clang, and CMake, and make sure all of them are accessible via `$PATH`. Then run `setup-unix.sh` or `setup-win.bat` depending on your platform. This downloads the required model weights and compiles the binaries Sage needs. Once it finishes, start the project with `bun start`. The first run on macOS is slow (~20 minutes on M1 Pro) because the ANE service compiles the Whisper CoreML model into a device-specific format; subsequent runs are faster.
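On a Unix-like system, the native path looks roughly like this (the Windows equivalent uses `setup-win.bat`):

```sh
# Run from the project root; assumes Bun, Rust, OpenSSL, LLVM, Clang, and CMake are on $PATH
./setup-unix.sh    # downloads model weights and compiles Sage's binaries
bun start          # starts Sage (the first macOS run may take ~20 minutes while CoreML compiles)
```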