Fingerprint
Dive into the research topics of 'Discovering Language Model Behaviors with Model-Written Evaluations'. Together they form a unique fingerprint.
Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Ben Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution