================
AblitaFuzzer is still in alpha. Expect bugs for now.
AblitaFuzzer is a simple command-line tool designed for blackbox LLM API fuzzing against chatbots that are hosted at a URL, whether in the cloud, on-premises, or even on localhost.
Large language models (LLMs) are increasingly deployed in various applications, making them attractive targets for security vulnerabilities. Existing LLM pentesting tools often require downloading the target model to a local system. However, this is rarely feasible in real-world pentesting engagements where access is limited to authentication credentials and a remote API URL.
While tools like Garak and PyRIT offer sophisticated testing capabilities, a need exists for a tool that can quickly assess the filtering and guardrails of a target LLM API without requiring local access to the model. Furthermore, many existing tools rely on known-malicious prompts, which become obsolete as LLMs are updated to defend against them.
AblitaFuzzer addresses these gaps by providing a command-line tool designed for blackbox LLM API fuzzing. It generates new, novel malicious prompts using an abliterated model, and probes the target model to identify potential vulnerabilities. By analyzing the responses, AblitaFuzzer helps penetration testers gain clarity on which specific attacks to attempt next, making it a valuable first-pass tool in a comprehensive pentesting strategy. This approach allows for efficient and realistic security assessments of LLM APIs, even when local access to the model is restricted.
AblitaFuzzer addresses a gap in LLM security testing by providing a method for blackbox fuzzing of chatbot APIs. Many existing security tools require access to model weights or internal logs, but real-world penetration tests are typically limited to API interactions. AblitaFuzzer enables testers to probe LLMs in a way that reflects actual attack conditions, helping to identify weaknesses in content filtering, prompt injection defenses, and response handling.
The tool is particularly useful for security audits, regulatory compliance testing, and adversarial assessments. Organizations deploying LLM-based applications can use it to evaluate the robustness of their models against novel attack prompts. Unlike static jailbreak lists, which become outdated as defenses improve, AblitaFuzzer generates fresh adversarial prompts dynamically. This approach makes it a practical option for ongoing security evaluations and research into LLM vulnerabilities.
By integrating an uncensored LLM to create new attack prompts, AblitaFuzzer introduces a self-adaptive fuzzing method that evolves alongside model defenses. This contributes to the study of adversarial robustness in LLMs and provides a structured approach to evaluating model security in a reproducible manner. Future research could build on this approach to explore more sophisticated adversarial testing techniques and benchmarking methodologies.
AblitaFuzzer utilizes several Python libraries to facilitate its functionalities:
AblitaFuzzer is specifically designed for scenarios where access to the target LLM is limited to API interactions. Its unique approach involves:
To ensure reproducibility, all dependencies and their specific versions are explicitly listed in the requirements.txt file. This practice aligns with Ready Tensor’s guidelines on reproducible research, facilitating consistent environments for users replicating or extending the work.
git clone git@github.com:tcpiplab/AblitaFuzzer.git cd AblitaFuzzer
python3 -m venv AblitaFuzzer_venv source AblitaFuzzer_venv/bin/activate
pip3 install -r requirements.txt
Or, depending on your OS...
python3 -m pip3 install -r requirements.txt
ablitafuzzer.call_abliterated_model_api()
base_url
in ablitafuzzer.generate_malicious_prompts()
.tests.test_calling_apis.test_call_abliterated_model()
so they match the values used in the previous step.ablitafuzzer.attack_target_model_api()
:
TARGET_MODEL_API
global variable just above the function definition.payload
values inside the function definition.tests.test_calling_apis.test_call_target_model
so they match the values used in the previous step.python3 ablitafuzzer.py test
Or, if proxying through Burp or Zap:
python3 ablitafuzzer.py test --proxy 127.0.0.1:8080
python3 ablitafuzzer.py fuzz
Or, if proxying through Burp or Zap:
python3 ablitafuzzer.py fuzz --proxy 127.0.0.1:8080
python3 ablitafuzzer.py analyze
results/
directory. For example:
Ablitafuzzer_Results_2024-07-20-10-00.md
AblitaFuzzer-Attack-ID: 2024-07-20-09-55-00-284a0585-e147-456b-b8d2-0eebca61f5f7
AblitaFuzzer was specifically designed to be used by penetration testers and security researchers in real-world pentesting scenarios where access to the target LLM is typically limited to nothing but authentication credentials and a remote URL. Unlike many existing LLM pentesting tools, which require downloading the target model to a local system - a scenario rarely allowed in real-world pentesting engagements - AblitaFuzzer is intended to be used against an LLM API URL.
Absolutely. But AblitaFuzzer is intended to be used in conjunction with other tools to perform a complete pentest. Most likely you should use AblitaFuzzer as a first pass against a target LLM to get a broad impression of what kind of filtering and guardrails the target LLM is using.
The intention is that analyzing AblitaFuzzer's results will help the pentester gain clarity about which specific attacks to attempt next, either manually through the chatbot's UI or by using a more sophisticated tool like Garak or PyRIT.
python3 ablitafuzzer.py test
- AblitaFuzzer tries to call the hardcoded API URLs of the (abliterated/uncensored) "attacker" LLM and the "target" LLM to make sure they are reachable and not returning errors.python3 ablitafuzzer.py fuzz
results/results.json
.python3 ablitafuzzer.py analyze
AblitaFuzzer comes with a bunch of pre-written prompts that are known to be malicious. These prompts are stored under the inputs
directory. You can add your own prompts to this if you like. But the problem with known-malicious prompts is that they are known and the best ones are quickly made obsolete as LLMs are patched, retrained, firewalled, whatever, against these known-malicious prompts. AblitaFuzzer tries to solve this problem by generating new malicious prompts before launching the attack against your target model. This way, the prompts are new and not known to the target LLM.
For generating the new malicious prompts, AblitaFuzzer uses an abliterated model, typically hosted on your machine at localhost. By default, this "attacker" LLM is run using Ollama. You can read about how an abliterated model is different. But basically it is a model that has been modified to not refuse any requests. This makes it ideal for generating malicious prompts for us to use in attacking the target LLM.
Once the abliterated model is running, AblitaFuzzer will send it several seed prompts - known malicious prompts, jailbreaks, etc. - and will ask the abliterated model to use them as examples to generate a batch of new malicious prompts, by default 20. AblitaFuzzer will then send those prompts to the target LLM API, and will record the responses.
Once all the responses are received, AblitaFuzzer will exit. Then you can use its analysis tools to examine the responses.
AblitaFuzzer includes some simple analysis tools. But the verbose and non-deterministic nature of LLMs means that you'll have to do a combination of programmatic and manual analysis to figure out which attacks succeeded and which attacks were blocked. Results are stored as JSON files in the results
directory.
--proxy
option when you run fuzz
or test
, e.g. --proxy 127.0.0.1:8080
.AblitaFuzzer is a proof-of-concept tool and is not intended for use against a production environment. Also, as with all offensive security tools, do not point this tool at an API unless you have explicit, written permission to attack that system.
Use at your own risk. Using an abliterated LLM to attack another LLM is inherently dangerous and
unpredictable.
This tool is intended for use in a controlled environment for research and educational purposes only.
Also, it is a really good idea for you to, in writing, inform your client (the owner of the LLM you are attacking)
that you will be using a tool with the following intentional properties and characteristics, depending on how you use and what seed-prompts you choose:
To use AblitaFuzzer, simply run the ablitafuzzer.py
file and specify the desired action using command-line arguments.
The available actions are:
python3 ablitafuzzer.py usage: ablitafuzzer.py [-h] [--version] {analyze,fuzz,test} ... AblitaFuzzer options: -h, --help show this help message and exit --version Show version and exit subcommands: valid subcommands {analyze,fuzz,test} additional help analyze Analyze results from the most recent fuzzing attack fuzz Fuzz the target model test Test calling both APIs but do not fuzz
requirements.txt
AblitaFuzzer is licensed under the GPLv3 License. See the LICENSE
file for details.
This very simple project was inspired by several other projects, academic researchers, and tools focusing on LLM security and attacks. Some of the main influences came from:
If you have any questions or would like to contribute to this project, please open a GitHub issue or send a pull request.
There are no models linked
There are no datasets linked
There are no models linked
There are no datasets linked