Today's prices
Same tasks for every model. Each price is its real invoice divided by the work it finished.
bar length & colour = price
quality = work finished
▲▼rank vs price per token
Quality is the share of work a model finished. Prices are real OpenRouter invoices, every token and retry counted.
How we appraise
Four kinds of work, graded by machine. Nothing is judged by vibes.
From task to price
Taskworth a fixed number of work units
Modelsolves it; we keep the real invoice
Graderscores it from 0 to 1, mechanically
Pricedollars spent per work unit earned
Sample tasks one from each kind of work
Price your own agent
Those prices are for our tasks. Describe yours and we'll build a bespoke benchmark, run it across the frontier, and email you a shareable price list.
Starting the appraisal…
0%
Run log
Out of runs for today
Leave your email. We'll hold a slot and invite you tomorrow.