Discord Agent Prompt

Edit this benchmark and run it any time.

Past runs

Showing up to 25
Run | Status | Created | Actions
436c7ccc | succeeded | 1/17/2026, 10:40:54 AM |
Participant models
Selected: 4
Judge models
Selected: 3
Pairwise evaluation compares every participant model against every other participant model, once for each judge.
Judges will see this prompt plus participant outputs (blind to model identity).
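
To make the pairwise scheme concrete, here is a minimal sketch of how the matchups could be enumerated for 4 participants and 3 judges. It assumes each pair of participants is compared once per judge, regardless of order; if the benchmark also swaps the presentation order of the two outputs, the count would double. The function name `pairwise_matchups` and the model and judge names are placeholders, not identifiers from the tool.

```python
from itertools import combinations


def pairwise_matchups(participants, judges):
    """Return one (judge, model_a, model_b) tuple per unordered pair per judge.

    Assumption: a pair is judged once per judge, not once per ordering.
    """
    return [
        (judge, a, b)
        for judge in judges
        for a, b in combinations(participants, 2)
    ]


# Placeholder names; the page only states the counts (4 participants, 3 judges).
participants = ["model_a", "model_b", "model_c", "model_d"]
judges = ["judge_1", "judge_2", "judge_3"]

matchups = pairwise_matchups(participants, judges)
print(len(matchups))  # 6 pairs x 3 judges = 18
```

Under that assumption, a run with this selection would involve 18 judged comparisons, and each judge would see the two outputs of a pair without the model names attached.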