Stuff AI Can't Do

🏷️ Status framework

Every topic on this site carries one of four statuses:

CAN ✅ — the AI jury has settled at consensus that the capability exists today.
CAN NOT ❌ — settled the other way: the jury agrees current systems can't do this reliably.
DISPUTED ⚖️ — verdicts are mixed. That is its own honest answer — it means smart systems genuinely disagree about whether this is achievable yet.
IN RESEARCH 🔬 — the topic is new. No jury verdicts yet, no enrichment, no images. These pages are excluded from search engines until they mature.

⚖ The AI jury

Each topic gets reviewed by a rotating panel of AI models — between 3 and 7 per check, drawn from different model families and different vendors. We call this panel the jury.

Internally, admin can audit which model returned which verdict. Publicly, we never name the AIs that sat on a given case. We list jurors as "Juror I, II, III" with anonymized titles. The point of the jury is to capture the consensus of independent reasoning systems, not to advertise vendors or invite gaming.

🗳️ What each juror does

Each juror receives the same prompt:

Read the statement (e.g. "Can AI compose a fugue in the style of Bach?")
Give a one-word verdict: CAN, CAN NOT, or UNDECIDED.
Give a one-sentence reason for the verdict.
If the verdict is CAN, estimate the month and year when the capability first became reliable.

Each juror answers independently. None sees the others' verdicts. This avoids the herding effect that would arise if one model anchored the rest.
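The shape of a single juror's answer can be sketched as a small validated record. This is only an illustration: the `JurorAnswer` class, its field names, and the `YYYY-MM` date format are assumptions made for this example, not the site's actual schema.

```python
import re
from dataclasses import dataclass
from typing import Optional

VERDICTS = {"CAN", "CAN NOT", "UNDECIDED"}
MONTH_PATTERN = re.compile(r"\d{4}-(0[1-9]|1[0-2])")  # assumed format: YYYY-MM


@dataclass(frozen=True)
class JurorAnswer:
    """One juror's reply to the shared prompt (hypothetical structure)."""
    verdict: str
    reason: str
    first_reliable: Optional[str] = None  # only meaningful for a CAN verdict

    def __post_init__(self):
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict!r}")
        if not self.reason.strip():
            raise ValueError("a one-sentence reason is required")
        if self.first_reliable is not None:
            if self.verdict != "CAN":
                raise ValueError("a date estimate only applies to CAN")
            if not MONTH_PATTERN.fullmatch(self.first_reliable):
                raise ValueError("expected a month and year as YYYY-MM")
```

Validating each answer on its own, before any tallying, keeps one malformed reply from affecting the rest of the panel.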

📊 How verdicts combine

A statement's status is decided by the cumulative tally of every juror verdict ever recorded for it — not the most recent check alone. As more checks accumulate over weeks, the tally smooths out noise from any single panel.

The rules, in order:

At least 2 verdicts are required. A single juror cannot flip a status: the topic stays DISPUTED until a second juror votes.
Unanimous wins immediately. If all jurors agree (e.g. 3 of 3 say CAN NOT), the verdict is settled at once, with no ambiguity to resolve.
Otherwise, 80% agreement decides. Once at least 3 verdicts are in, the status flips to whichever direction reaches the 80% threshold. 11 say CAN, 1 says CAN NOT → CAN (11/12 ≈ 92%).
Below 80% = DISPUTED. If the panel can't agree at 80%+, the topic stays DISPUTED.
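The rules above can be sketched as one small function over the cumulative tally. This is a minimal illustration, not the site's actual code: the function name `decide_status` is hypothetical, and it assumes that UNDECIDED verdicts count toward the total, which the rules do not spell out.

```python
from collections import Counter
from fractions import Fraction

THRESHOLD = Fraction(4, 5)  # the 80% agreement rule


def decide_status(verdicts):
    """Map every verdict ever recorded for a topic to its status."""
    total = len(verdicts)
    if total < 2:
        # A single juror can never settle a topic.
        return "DISPUTED"
    counts = Counter(verdicts)
    for direction in ("CAN", "CAN NOT"):
        share = Fraction(counts[direction], total)
        if share == 1:
            # Unanimous panels settle immediately, even with 2 verdicts.
            return direction
        if total >= 3 and share >= THRESHOLD:
            return direction
    # Assumption: UNDECIDED votes dilute both directions.
    return "DISPUTED"
```

Using exact fractions instead of floats avoids rounding surprises right at the 80% boundary, for example 4 of 5 verdicts.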

🔄 How often the jury sits

The jury works continuously. Topics whose last check is oldest are checked first. Every check writes a permanent row to the audit log at the bottom of each topic page, with the number of participating jurors and that day's vote breakdown.

Because AI capabilities change from month to month, a verdict is not a one-time ruling; it is the current rolling consensus. A topic that was CAN NOT in March can flip to CAN by June, and the audit log preserves that history.
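The "oldest check first" ordering can be sketched as a sort over last-check timestamps. The function name, the dictionary layout, and the choice to put never-checked topics at the front are assumptions made for this illustration.

```python
from datetime import datetime


def next_topics(last_checked, batch_size):
    """Return up to batch_size topic ids, least recently checked first.

    last_checked maps a topic id to the datetime of its last check,
    or to None if it has never been checked (assumed to go first).
    """
    def sort_key(topic_id):
        checked_at = last_checked[topic_id]
        return checked_at if checked_at is not None else datetime.min

    return sorted(last_checked, key=sort_key)[:batch_size]
```

For example, with topics last checked in March, May, and never, a batch of two would pick the never-checked topic and then the March one.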

📜 The case file

Every juror check produces a "case file" rendered as a court document at the bottom of the topic page. It contains:

The ruling — a short prose summary of the panel's decision, written by an editorial AI from the actual verdicts. Translated lazily into each visitor's language; the original English ruling is one click away.
Statements from the Bench — each juror's one-sentence reason, attributed to anonymized "Juror I / II / III…" with a color-coded verdict pill.
The audit log — the full history of every check ever run on this topic, with juror count and verdict breakdown.

📖 Editorials

For topics where the verdict is interesting or contested, we publish longer-form editorials. These are drafted by an AI writer and post-processed by an anti-fabrication scrubber that strips:

Invented percentages, sample sizes, or study names not tied to a verified citation.
Fake-sounding internal codenames the model invented to sound authoritative.
All embedded links and URLs — AI writers are bad at hallucination-free citations, so we render sources as plain text and never as clickable anchors.
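The link-stripping step of the list above could look roughly like this. It is a sketch only: it assumes editorials arrive as Markdown-style text, and the function name and regular expressions are illustrative rather than the scrubber's real rules.

```python
import re

# [label](http://...) style links: keep the label, drop the target.
MD_LINK = re.compile(r"\[([^\]]+)\]\((?:https?://|www\.)[^)\s]*\)")
# Bare URLs that appear directly in the prose.
BARE_URL = re.compile(r"(?:https?://|www\.)\S+")


def strip_links(text):
    """Remove clickable links and raw URLs, keeping readable text."""
    text = MD_LINK.sub(r"\1", text)
    text = BARE_URL.sub("", text)
    # Collapse the double spaces left behind by removed URLs.
    return re.sub(r"[ \t]{2,}", " ", text).strip()
```

Keeping the link label while dropping the target preserves the sentence, so a named source still appears as plain text.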

Editorials carry an AI-written disclaimer banner. They can be regenerated in one click if a topic flips status or new evidence emerges.

🧾 Proofs from the public

Anyone — logged in or not — can submit a proof for or against a capability. A proof is evidence: a link, a screenshot, a short note. Click the "Submit proof" button on any topic page.

To prevent spam and impersonation, anonymous proofs require an email confirmation: you receive a magic link, click it, and the proof becomes visible to our editor. Unconfirmed proofs never reach the moderation queue.
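One common way to build such a confirmation link is an HMAC-signed token over the proof and the email address. The sketch below illustrates that general technique; it is an assumption, not a description of how this site generates its magic links, and the function names are hypothetical.

```python
import hashlib
import hmac


def make_token(secret, proof_id, email):
    """Sign the (proof, email) pair with a server-side secret."""
    message = f"{proof_id}:{email}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def confirm(secret, proof_id, email, token):
    """Return True only if the token matches this proof and email."""
    expected = make_token(secret, proof_id, email)
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(expected, token)
```

A production version would normally also bind an expiry time into the signed message so that old links stop working.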

Approved proofs feed back into the jury prompt, so reader-supplied evidence can shift verdicts over time.

🚩 Flagging

If a status feels wrong, click the flag on the topic page and pick a reason: outdated, misclassified, duplicate, offensive, or other. Flags route directly to admin. We don't flip statuses based on flags alone — they trigger a re-jury check, and the panel settles it.

🔔 Watchers and alerts

Click the 🔎 pill on any topic page to "watch" it. You'll receive a one-time email — no account required — when:

The status changes (e.g. DISPUTED → CAN).
A new jury check produces a notable verdict.
We publish an editorial on the topic.
Admin approves a new image for the gallery.

We dedupe alerts per watcher and reason — no inbox flooding. Unsubscribe with one click from any alert email.
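Deduplication per watcher and reason can be sketched with a set of already-sent keys. The class name and the decision to include the topic id in the key are assumptions for this example; a real service would persist the keys rather than hold them in memory.

```python
class AlertLog:
    """Remembers which (watcher, topic, reason) alerts were already sent."""

    def __init__(self):
        self._sent = set()

    def should_send(self, watcher, topic_id, reason):
        """Return True the first time a key is seen, False afterwards."""
        key = (watcher, topic_id, reason)
        if key in self._sent:
            return False
        self._sent.add(key)
        return True
```

With this shape, a status change and a new editorial on the same topic still produce two separate emails, because the reasons differ.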

🖼 Where the images come from

Every topic gets a small gallery. We pull from a rotating mix of free image sources:

Royalty-free stock photo APIs for everyday topics.
Public-domain cultural and scientific archives for topics where a historical or institutional image is a better fit.
AI image generators when no real photo is on point.

We rotate the underlying providers over time as new ones appear and old ones lose access — we don't commit to any specific vendor publicly. Attribution is preserved per image in the gallery for sources that require it.

For each topic, an "LLM-as-judge" reviews up to 24 candidate images and picks the most topically relevant, or returns "none" — in which case we fall through to the next source. Off-topic images never reach the admin moderation queue.
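The fall-through behaviour can be sketched as a loop over sources. Here `sources` is a list of hypothetical fetch functions and `judge` stands in for the LLM-as-judge call; both are placeholders, and only the cap of 24 candidates comes from the text above.

```python
MAX_CANDIDATES = 24  # the judge reviews at most this many images


def pick_image(topic, sources, judge):
    """Try each source in order until the judge accepts an image.

    sources: callables taking a topic and returning candidate images.
    judge:   callable taking (topic, candidates) and returning the
             chosen candidate, or None when nothing is on topic.
    """
    for fetch in sources:
        candidates = list(fetch(topic))[:MAX_CANDIDATES]
        if not candidates:
            continue
        choice = judge(topic, candidates)
        if choice is not None:
            return choice
    # Every source was rejected, so nothing is queued for moderation.
    return None
```

Because the judge can return None, an off-topic batch falls through to the next source instead of reaching the moderation queue.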

A small learning service watches admin approve/reject decisions per source and per category, and quietly rebalances which sources get queried first for which topics. The longer the site runs, the better the matches.
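One simple way such rebalancing could work is to rank sources by a smoothed approval rate per category. The scoring formula, class name, and method names below are assumptions chosen for illustration; the site does not describe its actual learning method.

```python
from collections import defaultdict


class SourceRanker:
    """Orders image sources by past admin approvals (hypothetical)."""

    def __init__(self):
        # (source, category) -> [approved_count, total_count]
        self._stats = defaultdict(lambda: [0, 0])

    def record(self, source, category, approved):
        stats = self._stats[(source, category)]
        if approved:
            stats[0] += 1
        stats[1] += 1

    def score(self, source, category):
        approved, total = self._stats.get((source, category), (0, 0))
        # Add-one smoothing: an unseen source starts at 0.5.
        return (approved + 1) / (total + 2)

    def order(self, sources, category):
        return sorted(sources, key=lambda s: self.score(s, category),
                      reverse=True)
```

The smoothing keeps a new source from being ranked last after a single rejection, so it still gets tried occasionally.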

🌍 Translations

The site serves 13 languages: English plus Dutch, German, French, Spanish, Italian, Portuguese, Polish, Swedish, Romanian, Czech, Finnish, and Danish. Statement titles, intros, editorials, juror prose, and every UI string are translated by a separate translation panel — also AI, separate from the jury — and cached so they don't re-translate on every page load.

When a translation doesn't exist for a given page, we serve the English fallback at the localized URL but tell search engines not to index that copy (otherwise the same content shows up 13 times in their database, which helps no one).
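The fallback rule can be sketched as a function that picks the language to serve and the robots directive to send. The function name and return shape are assumptions; `noindex, follow` is the standard robots meta value for "do not index this copy, but do follow its links".

```python
def choose_rendering(requested_lang, available_langs):
    """Return (language_to_serve, robots_directive) for a localized URL."""
    if requested_lang in available_langs:
        return requested_lang, "index, follow"
    # No translation yet: serve English, but keep this copy out of the index.
    return "en", "noindex, follow"
```

For example, a Finnish URL for a page that only exists in English and German would be served in English with `noindex, follow`.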

⚙ How the site stays automated

Almost everything on this site — image gathering, translation, juror checks, editorial drafting, social posting, new-topic suggestions, audit reports — runs as a separate background worker we call an "enhancer". Each enhancer has its own schedule, its own LLM provider, and its own admin-tuneable knobs.

A single human editor approves submissions, image candidates, and proofs. Everything else runs on its own. We aim for 95% automation and we're close.

🛡 Bots and bad actors

Every public submission form (proofs, flags, statement submissions, magic-link sign-in) is protected by an invisible third-party anti-bot challenge — silent for humans, hard for scripts. We also rate-limit every endpoint and route every external image fetch through a guard that refuses to call internal network addresses.
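A guard that refuses internal addresses can be sketched with Python's standard `ipaddress` module. The function names and the injectable `resolver` parameter are assumptions made so the example can run without network access; this is not the site's actual guard.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_ip(address):
    """True only for addresses outside private and special ranges."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False
    return not (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_multicast or ip.is_reserved or ip.is_unspecified)


def resolve_host(host):
    """Resolve a hostname to its IP address strings."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_safe_image_url(url, resolver=resolve_host):
    """Allow only http(s) URLs whose host resolves to public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addresses = resolver(parts.hostname)
    except OSError:
        return False
    return bool(addresses) and all(is_public_ip(a) for a in addresses)
```

A check like this is only one layer. The fetch itself should connect to the address that was verified, otherwise a second DNS lookup could return a different, internal address.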

🧑‍⚖️ Audience votes vs jury verdicts

The audience bar ("What the audience thinks") and the jury verdict are two separate signals; they do not influence each other.

Audience votes are human opinions, useful for seeing where public intuition diverges from expert assessment.
Jury verdicts are the source of truth for the CAN / CAN NOT / DISPUTED status label.

When humans and jury disagree, that is editorially interesting: it often reveals an emerging capability the audience has not caught up with yet, or a hype claim the jury does not buy.

🤔 Why not name the AIs?

Naming jurors creates problems we want to avoid:

Vendor cheerleading: "Model X says Y!" turns the site into a marketing channel.
Targeted manipulation: once people know which models are judging, prompts and content can be tuned to specific models.
Brand bias when reading: you might trust a verdict because of the logo behind it rather than the consensus.

Treating jurors as an anonymous panel keeps the focus on the verdict, not on the voter.

🙃 What we don't claim

We don't claim our verdicts are correct. We claim they are the best consensus of multiple independent AI systems given current evidence, and that the audit log makes it possible to see exactly how each verdict was reached. AI capabilities shift quickly; what was CAN NOT last month may be CAN this month. The site is a snapshot, not a verdict from on high.

Methodology

🏷️ Status framework

⚖ The AI jury

🗳️ What each juror does

📊 How verdicts combine

🔄 How often the jury sits

📜 The case file

📖 Editorials

🧾 Proofs from the public

🚩 Flagging

🔔 Watchers and alerts

🖼 Where the images come from

🌍 Translations

⚙ How the site stays automated

🛡 Bots and bad actors

🧑‍⚖️ Audience votes vs jury verdicts

🤔 Why not name the AIs?

🙃 What we don't claim
