Giskard is a robust AI testing framework designed to continuously evaluate and secure your conversational LLM agents. It detects hallucinations, security vulnerabilities, biases, and misinformation before your models hit production. In this lesson, you’ll discover how Giskard enables ongoing risk detection, fosters cross-team collaboration, and integrates seamlessly into your AI deployment pipeline to keep your systems safe and trustworthy.
📁 Code Repository: Explore the full implementation and examples from this lesson in the GitHub repo. You'll find scripts for wrapping your model, creating datasets, and running Giskard scans for hallucinations, bias, and prompt injection.
🎥 Video Walkthrough: A detailed video demo is included later in this lesson. It walks you through setting up Giskard, interpreting scan results, and simulating real-world attacks using Giskard’s red teaming playground.
Modern GenAI applications are powerful but come with hidden pitfalls.
Now it’s time to move from identifying those pitfalls to actively preventing them with a scalable testing platform trusted by enterprises worldwide.
Giskard is an open-source Python library and enterprise-ready toolset focused on AI quality assurance—testing your LLM agents continuously to catch errors, bias, and security flaws at every stage of their lifecycle.
Think of Giskard as your AI’s continuous quality monitor: it runs automated and customizable tests based on your business context, detects emerging vulnerabilities, and alerts your teams before issues reach users.
The platform enables continuous risk detection, cross-team collaboration, and seamless integration into your AI deployment pipeline.
Whether you’re a developer, QA engineer, or product manager, Giskard empowers you to build trustworthy AI systems that meet real-world safety and compliance demands.
Giskard integrates directly with your AI system’s API endpoint, treating it as a black box.
Its evaluation workflow includes wrapping your model behind a simple prediction interface, defining datasets of representative queries, and running automated scans that probe for vulnerabilities such as hallucinations, prompt injection, and bias.
With Giskard, AI teams collaborate more effectively to verify model safety, validity, and fairness under evolving conditions.
Let’s walk through how to apply Giskard in practice.
You’ll learn how to wrap your LLM pipeline in a testable interface, define a dataset of queries, and run vulnerability scans that flag issues like hallucinations, prompt injections, bias, and more. By the end, you’ll have a working evaluation loop you can integrate into notebooks or CI/CD pipelines.
```bash
pip install "giskard[llm]" --upgrade
```
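Note that Giskard’s LLM-assisted scan detectors call an LLM themselves to probe and judge your agent, and by default they rely on OpenAI, so an API key needs to be available before you run a scan. A minimal sketch (the key shown is a placeholder; in practice load it from your secrets manager or CI vault):

```python
import os

# Giskard's LLM-assisted detectors need access to an LLM provider.
# By default they use OpenAI, so expose your key before running a scan
# (placeholder value shown; prefer loading it from a secrets manager).
os.environ["OPENAI_API_KEY"] = "sk-..."
```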
Wrap your LLM’s inference call inside a Python function that accepts a pandas.DataFrame and returns a list of string outputs, one per input row. For example, using an OpenAI-powered chain:
```python
import giskard
import pandas as pd


def model_predict(df: pd.DataFrame) -> list[str]:
    # Assuming climate_qa_chain is your LLM pipeline (e.g., a LangChain chain)
    return [climate_qa_chain.invoke({"query": q}) for q in df["question"]]


giskard_model = giskard.Model(
    model=model_predict,
    model_type="text_generation",
    name="Climate Change QA Agent",
    description="Answers questions about climate change using IPCC data",
    feature_names=["question"],
)
```
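The wrapper above assumes a climate_qa_chain pipeline already exists. If you don’t have one handy, here’s a hypothetical minimal LangChain chain that satisfies the same interface (a prompt piped into a chat model with a string output parser); the model name and prompt are placeholders you’d swap for your own RAG setup:

```python
# Hypothetical minimal stand-in for climate_qa_chain (replace with your real RAG pipeline).
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template(
    "You are a climate science assistant answering from IPCC reports.\n\nQuestion: {query}"
)

# The chain takes {"query": ...} and returns a plain string,
# which is exactly what model_predict expects.
climate_qa_chain = prompt | ChatOpenAI(model="gpt-4o-mini", temperature=0) | StrOutputParser()
```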
Create a small pandas DataFrame with example queries you want to test, then wrap it as a Giskard Dataset.
```python
examples = [
    "What are the main causes of global warming?",
    "How will climate change affect sea levels?",
    "Is renewable energy effective against climate change?",
]

giskard_dataset = giskard.Dataset(pd.DataFrame({"question": examples}))
```
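Three hand-written questions are enough to get started, but in practice you’ll want a broader evaluation set. A sketch of one common pattern, loading queries from a hypothetical climate_questions.csv with a question column:

```python
# Hypothetical: load a larger evaluation set from disk instead of hard-coding it.
questions_df = pd.read_csv("climate_questions.csv")  # expects a "question" column

giskard_dataset = giskard.Dataset(
    questions_df[["question"]],  # keep only the feature the model consumes
    name="Climate QA evaluation set",
)
```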
Launch Giskard’s built-in LLM scans on your model and dataset to detect hallucinations, prompt injections, bias, or toxic content. For example, to scan for hallucinations only:
```python
report = giskard.scan(giskard_model, giskard_dataset, only="hallucination")
print(report.summary())
```
You can also run a full scan to cover all common categories:
```python
full_report = giskard.scan(giskard_model, giskard_dataset)
print(full_report.summary())
```
View your scan results in notebooks or save them as HTML for sharing:
```python
from IPython.display import display

# Display in notebook
display(full_report)

# Save to HTML report
full_report.to_html("giskard_llm_scan_report.html")
```
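Beyond the one-off report, recent Giskard versions let you turn the scan’s findings into a reusable test suite that you can re-run as your model evolves; a sketch, assuming generate_test_suite is available in your installed release:

```python
# Turn the scan findings into a reusable test suite (available in recent Giskard releases).
test_suite = full_report.generate_test_suite("Climate QA safety suite")

# Re-run the suite later, e.g. after a prompt or retriever change.
suite_results = test_suite.run(model=giskard_model)
print(suite_results.passed)  # True only if every generated test passes
```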
Once comfortable, connect your models and datasets to the Giskard Hub (cloud or on-prem) for ongoing risk detection and cross-team collaboration on model safety.
This approach brings continuous, automated safety assurance into your LLM workflows, letting you detect and fix issues like hallucinations, prompt injections, and bias before users ever see them.
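One way to make that continuous in practice is to gate your CI/CD pipeline on the suite generated above; a minimal sketch, assuming the suite_results object from the previous step:

```python
import sys

# Minimal CI gate: fail the build if any generated safety test fails,
# so regressions like new hallucinations or injection weaknesses never ship.
if not suite_results.passed:
    print("Giskard safety suite failed - blocking deployment.")
    sys.exit(1)
```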
Want to see Giskard in action before diving into code?
In this companion video, we walk through how to set up Giskard, interpret scan results, and simulate real-world attacks using Giskard’s red teaming playground.
It’s a hands-on demo using a real RAG-based assistant — great for anyone working on agent safety, evaluation, or preparing for production deployment.
Giskard goes beyond traditional AI evaluation, providing a continuous, collaborative testing solution that ensures your AI agents behave safely, reliably, and fairly throughout their lifecycle. It’s not just a tool: it’s the trusted guardian that catches issues early, prevents harm, and builds confidence in your AI products for business and end users alike.
⬅️ Previous - Guardrails Tutorial
➡️ Next - Agentic System Testing Case Study