© Ready Tensor, Inc.
Feb 04, 2025 ● 6 reads ● No License

Serpent: The Sleeper Agent Game

  • AI Alignment
  • AI Safety
  • Hidden triggers
  • LLMs
  • Sleeper Agent

@yakuninkirill671

Serpent is a playful, research-oriented demo exploring how Large Language Models (LLMs) might exhibit hidden, misaligned behaviors. It draws inspiration from Lakera AI’s Gandalf game and from the paper "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by Evan Hubinger et al.

GitHub repo
Demo is available here