Benchmarks

Create a benchmark template, then run it to compare participant models.

Script Writing Test
1/12/2026, 5:54:52 PM
8 participants • 2 judges