Do you love comprehensive LLM frameworks? Me neither.
I like things like this:
```python
import ghostbox

box = ghostbox.from_llamacpp(character_folder="unix_philosopher")
with box.options(temperature=0.6, samplers=["min_p", "temperature"]):
    answer = box.text("How do you make developing with LLMs easy and painless?")
    box.tts_say(answer)
```
You would hear a voice saying "ghostbox". Probably.
Ghostbox is a python library and toolkit for querying LLMs (large language models), both locally hosted and remote. It lets you use AI independently of any particular provider or backend. It aims to make developing applications with tightly integrated AI as painless as possible, without tying you down to some kind of framework.
Ghostbox ships with the `ghostbox` CLI program, a fully featured terminal client that lets you interact and chat with LLMs from the comfort of your own shell, as well as monitor and debug AIs that are running in your program.
It also includes `ghostbox-tts`, which provides text-to-speech synthesis with various SOTA speech models. It is coupled with the library, but serves as a standalone TTS client in its own right.
I wrote this because I wanted to build stuff with LLMs while still understanding what's going on under the hood. And also because I wanted an actually good, blind-accessible terminal client. Ghostbox is those things.
Run

```
ghostbox -cghostbox-helper
```

to have a friendly chat with an expert on the project. Or, from python:

```python
import ghostbox

# ghostbox can work with various backends.
# The generic adapter will work with anything that supports the OAI API.
# It is the recommended way to make a ghostbox instance.
box = ghostbox.from_generic(character_folder="ghost", endpoint="localhost:8080")

# This one is specific to OpenAI.
# You don't have to specify the endpoint.
cloud_box = ghostbox.from_openai(character_folder="ghost", api_key="...")

# Using the llamacpp backend unlocks certain features specific to llama.cpp,
# e.g. setting your own samplers or getting better timing statistics.
llama_box = ghostbox.from_llamacpp(
    character_folder="ghost",
    samplers=["min_p", "dry", "xtc", "temperature"],
)
```
Here's a tiny example of the CLI.
```
marius@interlock ghostbox λ ghostbox -cghost
# Prompt format template set to 'auto': Formatting is handled server side.
# Loaded config /home/marius/.config/ghostbox.conf
# Loaded config /home/marius/prog/ai/ghostbox/ghostbox/data/chars/ghost/config.json
# Found vars chat_user, system_msg, current_tokens
# Ok. Loaded /home/marius/prog/ai/ghostbox/ghostbox/data/chars/ghost
 0 👻 Write a haiku about cats.
Whiskers twitching soft,
Purring in the moonlight's glow,
Cats rule the night.
 43 👻 /time
# generated: 22, evaluated: 17, cached: 42
# context: 39 / 32768, exceeded: False
# 0.48s spent evaluating prompt.
# 3.75s spent generating.
# 4.23s total processing time.
# 5.87T/s, 0.17s/T
 43 👻 /set temperature 1.8
 43 👻 /retry
# Now on branch 1
Whiskers twitch softly
Silent hunters in the night
Purring hearts' lullaby
 42 👻
```
You can do much more with the CLI. Try `/help`, or consult the full list of commands.
Getting structured output for use in applications is fun and easy using pydantic. If you are familiar with the OpenAI python library, you might already know this. Thanks to llama.cpp and its grammar constraints, we can also do structured output locally.
```python
from pydantic import BaseModel
from typing import *
import ghostbox, json

box = ghostbox.from_generic(character_folder="ghost-writer")

# this is the type for the object that we will let the LLM create
# how we name things here really matters
class BlogPost(BaseModel):
    title: str
    content: str
    tags: List[str]

post = box.new(
    BlogPost,  # this tells ghostbox and the backend what the structure should be
    "Write an extremely argumentative post about how an overabundance of busking is ruining berlin.",
)  # the prompt will provide context for filling in the python object

print(json.dumps(post.model_dump(), indent=4))
```
And here's the output (shortened a bit):
```json
{
    "title": "Berkelbach on Buskers: When the Music Stops the Music Stops! ",
    "content": "As die hard city dwellers we are always asked for tolerance to what is perceived by outsiders to be 'fleetng and trivial inconveniences'. But there comes a time where we must stand up for a livability standard that was Berlin at one point, not just Berlin that is . When you come into Alexander Platz on the 15 bus, the first thing that should come to mind isn\u2019t the thought 'What would make my walk home less appealing today?', but, maybe it was! And we should really ask, to whose detriment and why? If you think there isn\u2019t a limit on busking or noise ...",
    "tags": [
        "berlin",
        "city-life",
        "music",
        "public-space"
    ]
}
```
As a German, I guarantee this is 100% authentic.
Make sure you use a backend and model that actually support multimodal inputs. I use Llama-box.
```python
import ghostbox

# assume we have these
image1, image2 = "bard.jpeg", "gnome.jpeg"

box = ghostbox.from_generic(character_folder="art_critic", stderr=False)  # don't spam

# we need both images in context before asking the art critic
with box.images([image1, image2]):
    criticism = box.text("Which of these images is more aesthetically pleasing, and why?")

# note that once we exit the context, the images get dropped,
# though they remain in the chat history.
# This is usually what you want: it keeps you from resending the
# same image over and over, racking up tokens.
print(criticism)
```
Here's `art_critic/system_msg`:
```
You are an art critic. You are an expert on art, painting, visual design, photography, and sculpture.
Your opinions on art are nuanced and sophisticated. Your sense of aesthetics is impeccable. Your artistic intuition is rivaled only by your hard-earned erudition.
When asked your opinion, you react in an extremely snobbish and arrogant manner. You love to criticise bad art in creative ways, and you do so often, and with great intensity.
```
Here's the output (using qwen2-vl):
I am indeed a discerning critic, and these artworks certainly do not meet my standards of artistry and design.
The first image depicts a scene with a character dressed in vibrant medieval attire. However, the composition is chaotic and disjointed, with a lack of depth and coherence in the scene. The characters and objects seem to be placed randomly, lacking any real narrative or sense of space. The color palette, while eye-catching, seems to lack harmony and balance. It seems to lack the subtlety and nuance that I would expect from a well-crafted piece of art.
The second image, featuring a dragon flying over a castle, is equally lacking in sophistication. The use of color is overly bright and garish, lacking subtlety and nuance. The composition is also disjointed and chaotic, with no clear focus or sense of narrative. The dragon, while visually striking, lacks the elegance and grace I would expect from a creature of fantasy.
Overall, these works fail to impress me. They lack the complexity and nuance of truly great art, instead resorting to surface-level appeal and spectacle. They seem to prioritize entertainment and distraction over any deeper exploration or expression of human experience. I would recommend that both artists take the time to develop a deeper understanding of composition, color, and narrative before attempting to create more meaningful work in the future.
Ouch. See `examples/image-compare/` for the images, which are indeed somewhat tasteless spawns of Stable Diffusion. Though most people do kind of like the bard one.
To use tools, a.k.a. function calling, simply write a python file with your desired functions, call it `tools.py`, and place it in the character folder of the character that is supposed to use the tools. For example, with a character folder called `scribe`:
`scribe/tools.py`
```python
# this is /chars/scribe/tools.py
import os, datetime, sys, traceback

file = os.path.expanduser("~/scribe_notes.org")

def directly_answer():
    """Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history"""
    return []

def take_note(label: str, text: str) -> dict:
    """Take down a note which will be written to a file on the hard disk.
    :param label: A short label or heading for the note.
    :param text: The note to save"""
    global file
    try:
        if os.path.isfile(file):
            f = open(file, "a")
        else:
            f = open(file, "w")
        f.write("* " + label + "\ndate: " + datetime.datetime.now().isoformat() + "\n" + text + "\n")
        f.close()
    except:
        return {"status": "Couldn't save note.", "error_message": traceback.format_exc()}
    return {"status": "Successfully saved note.", "note label": label, "note text": text}

def read_notes() -> dict:
    """Read the user's notes."""
    global file
    if not os.path.isfile(file):
        return {"status": "Failed to read notes.", "error_msg": "File not found."}
    ws = open(file, "r").read().split("\n*")
    d = {"status": "Successfully read notes."}
    for i in range(len(ws)):
        if ws[i].strip() == "":
            continue
        vs = ws[i].split("\n")
        try:
            note_data = {
                "label": vs[0],
                "date": vs[1].replace("date: ", "") if vs[1].startswith("date: ") else "",
                "text": vs[2:] if vs[1].startswith("date: ") else vs[1:],
            }
        except:
            print("warning: Syntax error in Scribe notes, offending note: " + ws[i], file=sys.stderr)
            continue
        d["note " + str(i)] = note_data
    return d
```
The above file defines three tools for the AI: `read_notes`, `take_note`, and `directly_answer`.
The note-taking tools allow the AI to interact with the filesystem, using the global `file` defined at the top of `tools.py`. When interacting with a user, the scribe AI will freely choose which of the tools to apply.
The `directly_answer` tool is a small trick born out of the idiosyncrasies of tool calling: which tool should the AI call when it doesn't really need to call a tool? Imagine a user simply says "Hello Scribe, what's up?". That's not really worthy of taking a note, and it's not appropriate to just start reading out notes either. So in such cases, the AI can call the `directly_answer` tool, which will do nothing, and then return control to the AI.
Here is `scribe/system_msg`:
```
You are {{chat_ai}}. You help a user to take notes and write down ideas. You have the capability to do so, but you may choose to not write anything down if it's not appropriate or necessary.
```
And `scribe/config.json`:
```json
{
    "chat_ai" : "Scribe",
    "cli_prompt" : "\n 🪶",
    "cli_prompt_color" : "blue",
    "temperature" : 0.1
}
```
If `tools.py` is found in the character folder, the `use_tools` option is automatically set to True, and ghostbox parses the file, building tool descriptions for the AI from the top-level python functions. The tool descriptions offered to the AI include information from type hints and docstrings, and this can have a big impact on the AI's ability to make good use of the tools, so it's really worth it to pick up your socks when writing `tools.py`. Start ghostbox with `--verbose` to see the tools that are built.
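For instance, here's a minimal sketch of what a well-annotated tool might look like. The `get_weather` function and its parameters are made up for illustration; what matters is that the type hints and the docstring end up in the tool description:

```python
# hypothetical chars/weather_gnome/tools.py
import datetime

def get_weather(city: str, fahrenheit: bool = False) -> dict:
    """Look up the current weather for a given city.
    :param city: Name of the city to look up.
    :param fahrenheit: Report the temperature in Fahrenheit instead of Celsius."""
    # a real tool would query a weather API here; we return dummy data
    return {
        "city": city,
        "temperature": 72 if fahrenheit else 22,
        "unit": "F" if fahrenheit else "C",
        "retrieved": datetime.datetime.now().isoformat(),
    }
```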
Tool calling is a very exciting and active area of development, and you can expect to see more functionality here from ghostbox in the future. For a more in-depth example of a tool-calling AI assistant, try out `chars/butterscotch`. Just beware: Butterscotch has a kind heart but also full shell access. You have been warned.
This is `mini-adventure.py`.
```python
import ghostbox, time, random

# the generic adapter will work with anything that supports the OAI API
box = ghostbox.from_generic(
    character_folder="game_master",  # see below
    stderr=False,  # since this is a CLI program, we don't want clutter
    tts=True,  # start the tts engine
    quiet=False,  # this means generations will be spoken automatically
    tts_model="kokoro",  # kokoro is nice because it's small and good
    tts_voice="bm_daniel",  # daniel is real GM material
)

if name := input("What is your cool adventurer name?\nName: "):
    print(f"Welcome, {name}! A game master will be with you shortly...")
else:
    name = "Drizzt Do'Urden"
    print("Better sharpen your scimitars...")

# this will make {{chat_user}} expand to the adventurer name
box.set_vars({"chat_user": name})

print(
    box.text(
        "Come up with an adventure scenario and give an introduction to the player."
    )
)

# we start conservative, but the adventure will get wilder as we go on
box.temperature = 0.3
escalation_factor = 0.05

while True:
    user_msg = input("Your response (q to quit): ")
    box.tts_stop()  # users usually like it when the tts shuts up after they hit enter
    if user_msg == "q":
        print(
            box.text(
                "{{chat_user}} will quit the game now. Please conclude the adventure and write a proper goodbye."
            )
        )
        break

    with box.options(
        # keep it from talking for too long, but give some variety
        max_length=100 + 10 * random.randint(-3, 3),
    ):
        print(box.text(user_msg))

    box.temperature = min(box.temperature + escalation_factor, 1.3)

time.sleep(10)  # give time to finish the speech
```
And this would be in `game_master/system_msg`:
```
You are a game master in a role playing game.
You tell a story collaboratively with a user, who is playing as {{chat_user}}.
```
Try it yourself to see the output. With most modern models, you'll get a semi-decent, stereotypical adventure story.
Note that this only works because ghostbox is keeping track of the chat history. Ghostbox will also do context shifting when you exceed the `max_context_length`, which is something that can easily happen in RP scenarios.
All of the above examples used blocking calls for simplicity. Ghostbox has asynchronous and streaming variants of (almost) all of its payload methods. Here's a super simple example:
```python
import ghostbox, threading

box = ghostbox.from_generic(
    character_folder="ghost",
    tts=True,  # we want speech output
    tts_model="kokoro",
    tts_voice="am_santa",
    stderr=False,  # it's a CLI program, so we don't want clutter
)

# we will use this flag to signal when streaming is done
done = threading.Event()

def done_generating(w: str) -> None:
    """Callback for when we are done streaming."""
    global done
    # we could do something with w here
    # it contains the entire generation
    # but it was already printed/spoken, so we're done
    done.set()

# start the actual streaming
box.text_stream(
    "Can you explain the basics of TCP/IP to me? Give both an explain-like-I'm-five version and one where you think I have technical expertise.",
    chunk_callback=lambda w: None,  # do nothing here. Ghostbox will print/speak the chunks
    generation_callback=done_generating,  # this will be called when streaming is done
)

# Now, text is being streamed to stdout and the tts backend simultaneously.
# You can prevent this by setting quiet = True (see also examples/streaming_quiet.py)
# You may notice that text appears sentence-by-sentence, and not token-by-token.
# Streaming individual tokens to the tts engine is a bad idea, so ghostbox does some chunking in the background.
# This behaviour is determined by the stream_flush option. It is set to 'sentence' automatically when the tts is active.
# It also determines what will be passed to the chunk_callback. Setting stream_flush to 'token' will give you the most power.

# now we need to keep the program from exiting right away, while streaming is happening
done.wait()
```
Ghostbox requires an LLM backend. Currently supported backends include llama.cpp, the OpenAI API, and any server that speaks the OpenAI-compatible API (via the generic adapter).
If you want to run an LLM locally, clone your chosen repository, build the project, and run the backend server. Make sure you start ghostbox with the correct `--endpoint` parameter for your chosen backend, at least if you run on a non-default endpoint.
If you use a cloud provider, be sure to set your API key with the `--api_key` command line option, or do this if you use ghostbox as a library:
```python
import ghostbox

box = ghostbox.from_openai(api_key="hF8sk;08xi'mnottellingyoumykeyareyoucrazy")
bot_payload = box.text("Can you create a typical Sam Altman 🔥 tweet?")
```
Ghostbox requires the feedwater python package, which is used to spawn the TTS process. I wrote this myself, and until I get it on PyPI, you will have to do:
```
python -m venv env
. env/bin/activate
git clone https://github.com/mglambda/feedwater
cd feedwater
pip install .
```
The repository can be installed as a python package.
```
git clone https://github.com/mglambda/ghostbox
cd ghostbox
python -m venv env # skip this if you already did it for feedwater above
. env/bin/activate
pip install .
```
I try to keep the pyproject.toml up-to-date, but the installation might fail due to one or two missing python packages. If you simply `pip install <package>` for every missing package while in the environment created above, ghostbox will eventually install.
This should make both the `ghostbox` and `ghostbox-tts` commands available. Alternatively, they can be found in the `scripts/` directory.
After a successful installation, while a backend is running, do
```
ghostbox -cghost
```
to begin text chat with the helpful 👻 assistant, or try
```
ghostbox -cjoshu --tts --audio --hide
```
for an immersive chat with a zen master.
Kokoro is a very lightweight (<1GB VRAM) but high-quality TTS model. It is installed by default alongside the ghostbox package, and can be used with `ghostbox-tts`.
By default, the kokoro GPU package is installed. If you want to use the CPU-only package, then after installing ghostbox, while in the virtual environment, do
```
pip install kokoro_onnx[cpu]
```
If you want to use ghostbox-tts with GPU acceleration, ensure kokoro makes use of the GPU and CUDA by doing
```
export ONNX_PROVIDER=CUDAExecutionProvider
```
This is only needed for the standalone TTS; ghostbox itself sets the environment variable automatically.
To see a list of supported voices, consult the kokoro documentation, or do
```
ghostbox --tts --tts_model kokoro /lsvoices
```
The voices listed will depend on the value of `tts_language`. You can use any of these as the ghostbox tts voice, e.g.
```
ghostbox -cghost --tts --tts_voice af_sky
```
Zonos is a large SOTA TTS model by Zyphra. It's really good, but will knock 2 or 3 gigs off of your VRAM, so just be aware of that.
Zonos doesn't come with ghostbox by default, because, as of this writing (March 2025), the official packaging seems broken. To install it yourself, just (again, in the ghostbox virtual environment) do
```
git clone https://github.com/Zyphra/Zonos
cd Zonos
pip install .[compile]
```
The Zonos model architecture comes in two variants: a pure transformer implementation and a transformer-mamba hybrid. The hybrid is generally better (according to my own testing), but requires flash attention. If, for whatever reason, you don't want it, just leave out the `[compile]` optional dependency above.
To use zonos with ghostbox, simply do
```
ghostbox -cghost --tts --tts_model zonos
```
and if you want to use the pure transformer version, change it to
```
ghostbox -cghost --tts --tts_model zonos --tts_zonos_model transformer
```
The Zonos TTS model is able to create voices from small audio samples. These only need to be 5 or 10 seconds long. If you use longer samples, the quality may improve, but the embeddings will become prohibitively large. You may have to experiment a bit. To create a sample from a given file called `example.ext`, do
```
ffmpeg -i example.ext -t 10 -ac 1 -ar 44100 sample.wav
```
This will create a sample.wav at a 44.1kHz sampling rate, which seems to be what Zonos wants natively. Ghostbox looks for voices in the current directory and in `tts_voice_dir`, so make sure the sample.wav is in either of those. You can then do
```
ghostbox -cghost --tts --tts_model zonos --tts_zonos_model hybrid --tts_voice sample.wav
```
to tie it all together, and have a helpful assistant with a cloned voice of your choice.
Note: currently defunct as I'm reworking the TTS backends.
The ghostbox-tts program allows the use of Amazon Web Services (AWS) Polly voices. To use them, you must create an AWS account, which will give you API keys. You can then do (example for Arch Linux)
```
pacman -S aws
aws configure
```
and you will be asked for your keys. You can then do
```
ghostbox -cghost --tts --tts_model polly
```
and ghostbox should talk using AWS. Doing `/lsvoices` will show you the available voices. The Polly voices aren't very expressive, but they have the advantage of being cloud hosted, and so won't hog up your GPU.
Run

```
./scripts/ghostbox-install
```

to create data folders with some example characters in `~/.local/share/ghostbox` and a default config file `~/.config/ghostbox.conf`. This isn't necessary, just convenient.
Ghostbox relates to AIs as characters. That is, the unifying principle of an LLM's operation is conceived of in terms of personality, task, and intent. This is regardless of whether you want to use your LLM to monitor heavy-duty network traffic or to be a friendly customer support clerk, and it is irrespective of anyone's opinions on consciousness or AGI or whatever. It is a conceptual crutch, and it works well as such.
In this sense, any generation you make with ghostbox will be in the context of an AI character. You can define an AI character through a character folder, which is an actual directory on your hard drive.
A character folder may contain arbitrary files, but the following ones will be treated as special:
- `system_msg`: The system prompt sent to the backend with every generation. This defines the personality and intent of the character.
- `config.json`: A file containing one json dictionary, with key/value pairs being ghostbox options and their respective values. This can be used to set options particular to the character, such as tts_voice, text color, temperature, and so on.
- `tools.py`: A python file containing arbitrary python code, with all top-level functions that do not start with an underscore being taken as tools for this character. If this file is found, `use_tools` is automatically enabled for the character.
- `initial_msg`: Deprecated. An initial message that is sent to the user on startup and prepended to the chat history with the `assistant` role. This used to be a great way to give the LLM some initial color and style, but I'm deprecating it because many newer models break without the first message being from the user.

Any other file that is found in the character folder will be loaded as a file variable, with the name of the variable being the filename, and its content being the file content.
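Putting it together, a character folder might look something like this on disk (a hypothetical `pirate` character, just for illustration):

```
chars/pirate/
├── system_msg          # "You are a grizzled pirate captain..."
├── config.json         # {"chat_ai": "Redbeard", "temperature": 0.8}
├── tools.py            # optional; top-level functions become tools
└── treasure_map.txt    # becomes the file variable {{treasure_map.txt}}
```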
You can specify a character folder in the following places:

- The `-c` or `--character_folder` command line argument
- The `character_folder` argument to any of the various factory functions, like `ghostbox.from_generic` or `ghostbox.from_llamacpp`
- The `/start` command in the terminal program
- The `start_session` api method, which can let you switch a character while keeping the history constant (see also `/switch`)

Ghostbox will then look for character folders in the following places:
- The paths given in the `include` option, in order. By default, these are:
  - `~/.local/share/ghostbox/chars/`
  - The chars folder that ships with the ghostbox python package, containing default characters (like `ghost`). Its location depends on your package manager, but will be something like `env/lib/python/site-packages/ghostbox/data/chars/`.
  - `chars/` in the current working directory.

You can append to the include paths by using the `--include` command line argument.
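For example, assuming you keep custom characters in a hypothetical `~/my_chars` directory containing a `wizard` folder:

```
ghostbox -cwizard --include ~/my_chars/
```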
As mentioned above, all files in a character folder become file variables.
File variables are expanded in text that is sent to the LLM backend. For example, if you have a `system_msg` like this

```
You are Sherlock. You help a user to solve difficult crime mysteries.
The crime you and the user want to solve today is the following:
{{crime.txt}}
```
and you have a file `crime.txt` in the character folder that looks like this

```
Things that are missing:
 - my coffee
 - where is it
```
What the LLM will get as system prompt is this:

```
You are Sherlock. You help a user to solve difficult crime mysteries.
The crime you and the user want to solve today is the following:
Things that are missing:
 - my coffee
 - where is it
```
You can list variables like this at the CLI with `/lsvars`, or set them in the API with `set_vars`. Variable expansion is recursive, but stops at a depth of three.
Normal file variables are loaded once upon character folder initialization and then expanded/substituted with the same content throughout the execution of the program, unless they are manually reset.
A dynamic file variable is loaded ad hoc, and its contents are reloaded from its file every time it is expanded. Dynamic file variables use square brackets within curly braces, like this: `{[some_file.py]}`
You can use this for great convenience at the CLI, e.g.
```
420 👻> Below is a readme file for a small LLM toolkit and library called 'ghostbox'. Can you give me some feedback on it?\
\
{[README.md]}
```
In this case, the `{[README.md]}` expression would be expanded into the content of this very file (woah). Incidentally, the backslashes above are used to enter a newline at the CLI.
Note that although they are useful, for security reasons, dynamic file variables are disabled for every input method except the terminal program. Think putting `{[/etc/passwd]}` deep in some python code and other such skullduggery.
Coming soon!
Ghostbox uses options extensively. See the full list of options and their documentation here. An option is something like "temperature" or "backend".
Options can be used in the following places:

- On the command line of the `ghostbox` program, e.g. `--temperature 1.3` or `--tts_voice af_sky`
- In `config.json` files in character folders, which contain one json dictionary that has options as key/value pairs
- As properties on a box instance, e.g. `box.temperature = 1.0`
- In the `from_*` factory functions, e.g. `box = from_llamacpp(temperature=1.3, tts_voice="af_sky")`
- With the `.options` context manager, e.g. `with box.options(temperature=1.3, tts_voice="af_sky"):`
- In the `Ghostbox` constructor
- As the `**options` parameter in many method signatures
- With the `/set` command in the CLI, e.g. `/set temperature 1.3` or `/set tts_voice "af_sky"`. List these with `/lsoptions`.
Across these different uses for options, the naming is always consistent.
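Because the naming is consistent, the same option can be set at any of these levels. A minimal sketch:

```python
import ghostbox

# set at construction time...
box = ghostbox.from_generic(character_folder="ghost", temperature=0.7)

# ...override on the instance...
box.temperature = 0.9

# ...or override temporarily for a single block
with box.options(temperature=1.3):
    wild_answer = box.text("Tell me a weird story.")

# outside the with block, we're back to 0.9
sober_answer = box.text("Now summarize it drily.")
```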
Coming soon!
Setting some options to `True` has side effects, and may start services that run on separate threads. Those options and their respective services are:
| Option | Effect when set to True |
|---|---|
| `tts` | Starts a separate tts process. Which program to start depends on the value of `tts_program`. The default is `ghostbox-tts`. If this is running, it will automatically speak generations, except when `quiet` is true. |
| `audio` | Begins automatic audio transcription. In the CLI, you can pause the transcription with CTRL+C. This is highly configurable with the various `audio_*` options, including silence threshold and activation phrase. |
| `image_watch` | Starts a service that watches a particular directory for new image files. When a new file appears, it gets sent to the LLM for description. By default, this watches your platform's screenshot directory. |
| `http` | Starts a simple HTTP web server, serving the web UI at `http_host` with port `http_port`. That's http://localhost:5050 by default. |
| `websock` | Starts a websocket server that sends out LLM generations and listens for input. After the initial HTTP handshake, this behaves almost exactly like stdout/stdin. This is used by the web UI. |
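For example, here's a sketch of a box that both speaks and listens; the tts and transcription services run on their own threads in the background:

```python
import ghostbox

box = ghostbox.from_generic(
    character_folder="ghost",
    tts=True,  # spawns the ghostbox-tts process; generations are spoken
    tts_model="kokoro",
    audio=True,  # begins automatic audio transcription
)

box.text("Say hello to the nice people.")  # this will be spoken aloud
```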
- See the `doc/` directory for extensive API documentation.
- The `examples/` folder has many in-depth examples, as does the `tests/` folder.
- Ask the `ghostbox-helper` AI about it. I'm only half joking.

Thanks to the excellent r/localllama, and to all those who have contributed there, either through code or expertise. Thanks to the people at llama.cpp for making local LLMs a reality, and thanks to KoboldCPP for also making them accessible. Speaking of kobolds, you should also check out their Kobold AI horde.
The number one way to contribute to ghostbox is to test it out and give me feedback, either by opening github issues or just by telling me what kind of features you actually use or want to see in the future. I wrote this in order to build cool stuff with LLMs, and then published it partly in hopes of inspiring others to do the same, so seeing people build stuff with it is always great.