Benchmark App

Run	Status	Created
38f9e41d	succeeded	1/12/2026, 6:16:48 PM
f1ecb3af	running	1/12/2026, 5:54:54 PM

Name

Grading mode

Pairwise runs every model against every other model for each judge.
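
As a rough illustration of what pairwise grading implies for run size, the sketch below enumerates one comparison per unordered pair of models for each judge. The function name `pairwise_matchups` and the model and judge labels are hypothetical. Whether the app also repeats each pair with the order swapped is not stated here, so the sketch assumes it does not.

```python
from itertools import combinations


def pairwise_matchups(models, judges):
    """Return one (judge, model_a, model_b) tuple per unordered model pair per judge.

    Assumption: each pair is judged once per judge, without swapping order.
    """
    return [
        (judge, a, b)
        for judge in judges
        for a, b in combinations(models, 2)
    ]


# Hypothetical labels, for illustration only.
models = ["model-a", "model-b", "model-c"]
judges = ["judge-1", "judge-2"]

matchups = pairwise_matchups(models, judges)
print(len(matchups))  # 3 pairs x 2 judges = 6
```

Under this assumption the number of comparisons grows as judges × n(n − 1)/2 for n models, so adding a model costs more than adding a judge once n exceeds a handful.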

Participant system prompt (optional)

Judge system prompt (optional)

Participant user prompt (required)

Generate a VIRAL YouTube Shorts script based on this video concept:

<VIDEO_CONCEPT>
Title: Babies Sleeping at −10°C? Only in Finland!
Concept: The video reveals a surprising Finnish parenting tradition where parents let their babies nap outdoors in freezing temperatures, sometimes as cold as −10°C, while they relax inside with hot coffee. Instead of relying on indoor heaters, Finnish parents believe the crisp Arctic air helps babies sleep more deeply, boosts their immunity, and supports better overall health. What looks shocking to outsiders is actually a trusted cultural practice rooted in generations of experience and a strong belief in the benefits of fresh, cold air.
</VIDEO_CONCEPT>

CRITICAL RULES FOR VIRALITY:
1. Hook in 3 seconds with something shocking, controversial, or unbelievable
2. Use short, punchy sentences (5-12 words max per segment)
3. Create escalating intensity - each segment must be MORE intense than the last
4. Include "pattern interrupts" - unexpected facts that make viewers go "WAIT, WHAT?!"
5. Build emotional peaks using conflict, revelation, or transformation
6. End with a mic-drop moment that demands a rewatch or comment

TONE: Aggressive, fast-paced, almost breathless. Like you're revealing a conspiracy.
PACING: Relentless. No filler. Every word earns its place.
EMOTION: Shock → Intrigue → Escalation → Mind-blown

Each segment must contain:
- voiceover_script: ONE punchy sentence (5-12 words). Use fragments. Be dramatic.
- image_prompt: Detailed 2D visual that amplifies the emotional intensity

BANNED PHRASES: "Ever wondered", "Let's explore", "Interestingly", "As it turns out", "So next time"
USE INSTEAD: "THIS is insane", "Nobody tells you", "And it gets WORSE"

### Script Structure
- Total length: 120-160 words (optimized for 45-55 seconds).
- 12-15 segments max.

## OUTPUT FORMAT

Return ONLY valid JSON matching the ViralShortsScript schema:
{
  "title": "string",
  "concept": "string",
  "script_segments": [
    {
      "voiceover_script": "string",
      "image_prompt": "string"
    }
  ]
}

Judges will see this prompt plus participant outputs (blind to model identity).
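
A participant output can be checked against the requested JSON shape and the length rules in the prompt before it reaches a judge. The following is a minimal sketch, not part of the app: the function name `validate_script` and its parameters are hypothetical, "12-15 Segments max" is read as "at most 15 segments", and the 120-160 word total is assumed to count voiceover words only.

```python
import json


def validate_script(raw, max_segments=15, min_words=5, max_words=12,
                    total_range=(120, 160)):
    """Return a list of problems found in a participant's output (empty if none).

    The limits default to the values stated in the prompt; how they should be
    interpreted is an assumption of this sketch.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for key in ("title", "concept"):
        if not isinstance(data.get(key), str):
            problems.append(f"'{key}' is missing or not a string")

    segments = data.get("script_segments")
    if not isinstance(segments, list) or not segments:
        problems.append("'script_segments' is missing or empty")
        return problems
    if len(segments) > max_segments:
        problems.append(f"{len(segments)} segments exceeds {max_segments}")

    total_words = 0
    for i, seg in enumerate(segments):
        if not isinstance(seg, dict):
            problems.append(f"segment {i} is not an object")
            continue
        voiceover = seg.get("voiceover_script")
        image = seg.get("image_prompt")
        if not isinstance(voiceover, str) or not isinstance(image, str):
            problems.append(f"segment {i} lacks a string field")
            continue
        # Whitespace-separated tokens are used as a simple word count.
        words = len(voiceover.split())
        total_words += words
        if not min_words <= words <= max_words:
            problems.append(f"segment {i} has {words} voiceover words")

    low, high = total_range
    if not low <= total_words <= high:
        problems.append(f"total voiceover length is {total_words} words")
    return problems
```

Calling `validate_script(output_text)` on a raw model response returns an empty list when every check passes. Outputs wrapped in Markdown fences or preceded by commentary fail at the JSON parsing step, which matches the "Return ONLY valid JSON" instruction.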

Script Writing Test

Past runs