Discord Agent Prompt

Edit this benchmark and run it any time.

View latest run (succeeded)

Past runs

Showing up to 25

Run	Status	Created	Actions
436c7ccc	succeeded	1/17/2026, 10:40:54 AM

Name

Participant models

openai/gpt-5.2
openai/gpt-5-mini
anthropic/claude-3.5-sonnet
google/gemini-3-pro-preview
google/gemini-3-flash-preview
x-ai/grok-4.1-fast
deepseek/deepseek-v3.2
moonshotai/kimi-k2-thinking
anthropic/claude-sonnet-4.5
xiaomi/mimo-v2-flash:free
minimax/minimax-m2.1

Selected: 4

Judge models

openai/gpt-5.2
openai/gpt-5-mini
anthropic/claude-3.5-sonnet
google/gemini-3-pro-preview
google/gemini-3-flash-preview
x-ai/grok-4.1-fast
deepseek/deepseek-v3.2
moonshotai/kimi-k2-thinking
anthropic/claude-sonnet-4.5
xiaomi/mimo-v2-flash:free
minimax/minimax-m2.1

Selected: 3

Grading mode

Pairwise mode runs every participant model against every other participant model, once per judge.
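
As a rough illustration of how the number of comparisons grows in pairwise grading, here is a minimal sketch. It assumes each pair of participants is compared once per judge, regardless of order. The page does not say whether both orderings are run. The function name `pairwise_matchups` and the model names are hypothetical placeholders, since the page does not show which models were selected.

```python
from itertools import combinations

def pairwise_matchups(participants, judges):
    """Return (judge, model_a, model_b) for every unordered pair of participants.

    Assumption: each pair is judged once per judge, in one order only.
    """
    return [
        (judge, a, b)
        for judge in judges
        for a, b in combinations(participants, 2)
    ]

# Placeholder names standing in for the 4 selected participants and 3 selected judges.
participants = ["model-a", "model-b", "model-c", "model-d"]
judges = ["judge-x", "judge-y", "judge-z"]

matchups = pairwise_matchups(participants, judges)
print(len(matchups))  # 4 participants give 6 pairs; times 3 judges = 18
```

Under this assumption the count is n(n-1)/2 pairs times the number of judges, so adding participants increases the cost quadratically.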

Participant system prompt (optional)

Judge system prompt (optional)

Participant user prompt (required)

Judges will see this prompt plus participant outputs (blind to model identity).