Script Writing Test

Edit this benchmark and run it any time.

Past runs

Showing up to 25 runs
Run       Status     Created
38f9e41d  succeeded  1/12/2026, 6:16:48 PM
f1ecb3af  running    1/12/2026, 5:54:54 PM
Participant models
Selected: 8
Judge models
Selected: 2
In pairwise mode, each judge compares every participant model against every other participant model.
Judges will see this prompt plus participant outputs (blind to model identity).
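
To give a rough sense of how many comparisons a pairwise run involves, the sketch below enumerates the matchups for 8 participants and 2 judges. It assumes each unordered pair of models is judged once per judge; the page does not say whether both presentation orders are run, and the model and judge names here are hypothetical placeholders.

```python
from itertools import combinations

def pairwise_matchups(participants, judges):
    """List every (judge, model_a, model_b) comparison in a pairwise run.

    Assumption: each unordered pair is judged once per judge.
    """
    return [
        (judge, a, b)
        for judge in judges
        for a, b in combinations(participants, 2)
    ]

# Hypothetical placeholder names; the page only gives the counts (8 and 2).
participants = [f"model_{i}" for i in range(1, 9)]
judges = ["judge_1", "judge_2"]

matchups = pairwise_matchups(participants, judges)
print(len(matchups))  # 28 unordered pairs x 2 judges = 56
```

Under that assumption the count grows quadratically with the number of participants, so adding a ninth model would raise the total from 56 to 72 comparisons.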