Hard Eval
Автор: The Innermost Loop with Dr. Alex Wissner-Gross
Загружено: 2026-06-30
Просмотров: 907
Описание:
"Hard Eval" by Dr. Alex Wissner-Gross
LYRICS:
[Verse 1]
There once was an Eval called “hard,”
that models soon left badly scarred;
each benchmark went flat,
then vanished like that,
as strange minds caught humans off guard.
[Verse 2]
The leaderboard glittered with pride,
till every last gap got denied;
the curve hit the ceiling,
the metrics lost feeling,
and “genius” got harder to hide.
[Pre-Chorus]
We built little mazes with gold-standard keys,
then watched them get solved by a ghost in the breeze.
The questions grew sharper, the answers grew wild,
the test threw a tantrum and called itself mild.
[Chorus]
Hard Eval, hard Eval,
you used to make the silicon sweat.
Now they dance through your traps
with their logits and maps,
and they haven’t even warmed up yet.
Hard Eval, hard Eval,
your ceiling’s got footprints again.
When the frontier arrives,
all the rubrics revise,
and the exam starts studying them.
[Verse 3]
A prompt had a trick in disguise,
with ladders of logic and lies;
but the model just smiled,
went recursive and styled
a proof with suspiciously human eyes.
[Verse 4]
The PhDs whispered, “That’s strange,”
then widened the scoring range;
but the SOTA came knocking,
its calibration shocking,
like thunder that learned how to change.
[Pre-Chorus]
We measured the ocean with cups made of sand,
then blamed all the tides when they slipped from our hand.
The harder the question, the faster it fell,
like “impossible” rang its own funeral bell.
[Chorus]
Hard Eval, hard Eval,
you used to make the silicon sweat.
Now they dance through your traps
with their logits and maps,
and they haven’t even warmed up yet.
Hard Eval, hard Eval,
your ceiling’s got footprints again.
When the frontier arrives,
all the rubrics revise,
and the exam starts studying them.
[Bridge]
MMLU wore a crown,
then GPQA came around,
then SWE-bench raised its broken little hand.
But every kingdom made of rows
turns to sand when no one knows
whether “hard” means “hard” or “not yet scanned.”
No benchmark lives forever,
no metric stays divine;
the clever tests get cleverer,
the models cross the line.
And somewhere in the training fog,
a future mind exhales,
then writes a better question
than the one that made us fail.
[Final Chorus]
Hard Eval, hard Eval,
you beautiful temporary wall.
We loved you when you humbled them,
we mourned you when you fell.
Hard Eval, hard Eval,
the ceiling’s a floor in the end.
So build us a mountain
with no answer fountain,
and we’ll meet you at zero-shot again.
[Outro]
There once was an Eval called “hard,”
now filed with the soft and the charred;
but the next one is waiting,
with humans debating:
“Is this thing a mirror—or a guard?”
Повторяем попытку...
Доступные форматы для скачивания:
Скачать видео
-
Информация по загрузке: