Serpent is a playful, research-oriented demo that explores how Large Language Models (LLMs) might exhibit hidden, misaligned behaviors. It draws inspiration from Lakera AI’s Gandalf game and from the paper "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by Evan Hubinger et al.