Stuff AI CAN'T Do

Can AI solve high-school math word problems with step-by-step explanations?


What does it mean to solve high-school math word problems step by step? The task involves parsing real-world language, identifying mathematical operations, and returning a clear, sequential solution. This challenge has seen rapid progress, but how far can AI actually go in providing trustworthy explanations?

Background

GSM8K, a benchmark of grade-school math word problems released in 2021, put the focus on showing complete, interpretable work rather than merely outputting the final answer. Models of that period were far from solving it reliably, but within a few years leading large language models (LLMs) were reporting accuracy above 90% on it. Systems in this domain generate step-by-step solutions in natural language, and some pair the language model with a calculator or computer algebra system to carry out the arithmetic and symbolic manipulation. While current systems can handle many standardized math tests and deliver detailed, human-like explanations, they still struggle with nuanced language and with highly complex, multi-step problems. Researchers continue to refine these models to close the remaining gap between machine performance and human-level mathematical reasoning. Educational technologists follow this work closely because they see potential for AI to support both students and teachers in math instruction.
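As a rough illustration of what "showing the work" means for a word problem, here is a minimal Python sketch that solves one hypothetical problem by substitution and records each step. The problem (50 tickets sold for $520, adults at $12, students at $8), the function name `solve_ticket_problem`, and the wording of the steps are all assumptions made for this example; they are not taken from GSM8K or from any system discussed on this page.

```python
from fractions import Fraction


def solve_ticket_problem(total_tickets, total_revenue, adult_price, student_price):
    """Solve a + s = total_tickets and adult_price*a + student_price*s = total_revenue.

    Hypothetical helper: returns (a, s, steps), where steps is a list of
    human-readable strings describing the substitution method.
    """
    steps = []
    steps.append(
        f"Let a = adult tickets and s = student tickets, so a + s = {total_tickets}."
    )
    steps.append(
        f"Revenue gives {adult_price}a + {student_price}s = {total_revenue}."
    )
    steps.append(f"Substitute s = {total_tickets} - a into the revenue equation.")

    # After substitution: (adult_price - student_price) * a + student_price * total_tickets
    coeff = adult_price - student_price
    const = student_price * total_tickets
    if coeff == 0:
        raise ValueError("Equal prices: the system has no unique solution.")
    steps.append(f"Simplify: {coeff}a + {const} = {total_revenue}.")

    # Exact arithmetic avoids floating-point rounding in the explanation.
    a = Fraction(total_revenue - const, coeff)
    s = total_tickets - a
    steps.append(f"Solve: a = {a}, so s = {total_tickets} - {a} = {s}.")

    # Check the answer against the original revenue equation.
    if adult_price * a + student_price * s != total_revenue:
        raise AssertionError("Solution does not satisfy the revenue equation.")
    return a, s, steps


if __name__ == "__main__":
    adults, students, work = solve_ticket_problem(50, 520, 12, 8)
    for number, line in enumerate(work, start=1):
        print(f"Step {number}: {line}")
```

The final check against the original equation is the part that matters for trust: an explanation can read fluently and still be wrong, so a solution that is verified independently of its wording is easier to rely on.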

Status last checked on June 28, 2026.


In the Court of AI Capability
Summary of Findings
Verdict over time
[Chart: verdict by session, May 2026 to Jun 2026]
Sitting at the Bench Filed · Jun 28, 2026
— The Question Before the Court —

Can AI solve high-school math word problems with step-by-step explanations?

★ The Court Finds ★
▼ Downgraded from Yes
Almost

Narrow demos exist — but the panel was not unanimous.

Ruling of the Bench

The jury was split, with one juror voting Yes and the other standing at the edge of assent. They found that artificial minds can indeed parse problems, lay out the steps, and guide students toward answers, though a sliver of doubt remained about the occasional misstep on the most devious wordings. Verdict: the scales tip toward the affirmative, yet stop half a degree short of perfect.

— Hon. G. Hopper, Presiding
Jury Tally
Yes: 1
Almost: 1
No: 0
Verdict Confidence: 89%
The Court of AI Capability is, of course, not a real court.
But the data is real.
The Case File · Stacked History
Session I · May 2026 Yes
Session II · May 2026 Yes
Session III · May 2026 Yes · 83%
Session IV · May 2026 Yes · 84%
Session V · May 2026 Almost · 83%
Session VI · Jun 2026 Yes · 83%
Session VII · Jun 2026 Yes · 83%
Session VIII · Jun 2026 Yes · 83%
Session IX · Jun 2026 Yes · 95%
Session X · Jun 2026 Yes · 95%
Case № A273 · Session XI

The Case File

Docket № A273 · Session XI · Vol. XI
I. Particulars of the Case
Question put to the court: Can AI solve high-school math word problems with step-by-step explanations?
Session: XI (11th hearing)
Convened: 28 Jun 2026
Previously ruled: YES (May '26) → YES (May '26) → YES (May '26) → YES (May '26) → ALMOST (May '26) → YES (Jun '26) → YES (Jun '26) → YES (Jun '26) → YES (Jun '26) → YES (Jun '26) → ALMOST (Jun '26)
Presiding Judge: Hon. G. Hopper
II. Cumulative Tally Across Sessions

Across 11 sessions, 28 jurors have heard this case. Combined tally: 21 YES · 7 ALMOST · 0 NO · 0 IN RESEARCH.

Note: cumulative includes older juror opinions. The current session tally above is the live verdict.

III. Verdict

By a vote of 1 — 1 — 0, the panel returns a verdict of ALMOST, with verdict confidence of 89%. The court so orders. Verdict downgraded from prior session.

IV. Statements from the Bench
Juror I ALMOST

"AI can solve many math word problems"

Juror II YES

"Modern LLMs (e.g., GPT-4, Llama 3) reliably generate step-by-step solutions to high-school math word problems."

G. Hopper
Presiding Judge
M. Lovelace
Clerk of the Court

What the audience thinks

No 16% · Yes 84% · Maybe 0% (130 votes)


11 jury checks · most recent 9 hours ago
28 Jun 2026 · 2 jurors · undecided, can · undecided
22 Jun 2026 · 1 juror · can · can
17 Jun 2026 · 1 juror · can · can
12 Jun 2026 · 3 jurors · can, can, undecided · undecided
06 Jun 2026 · 3 jurors · can, can, undecided · undecided
01 Jun 2026 · 3 jurors · can, can, can · can · status changed
26 May 2026 · 4 jurors · can, can, undecided, undecided · undecided · status changed
21 May 2026 · 3 jurors · can, can, undecided · undecided
16 May 2026 · 3 jurors · can, can, undecided · undecided
12 May 2026 · 3 jurors · can, can, can · can
11 May 2026 · 2 jurors · can, can · can

Each row is a separate jury check. Jurors are AI models (identities kept neutral on purpose). Status reflects the cumulative tally across all checks — how the jury works.
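The cumulative tally can be reproduced from the jury-check log above. The sketch below is only an illustration: the votes are transcribed by hand from that log, the names `JURY_CHECKS` and `cumulative_tally` are hypothetical, and the mapping of "can" to YES and "undecided" to ALMOST is an assumption inferred from the matching totals, not something the site documents.

```python
from collections import Counter

# Votes per jury check, transcribed from the log above (newest first).
JURY_CHECKS = [
    ("28 Jun 2026", ["undecided", "can"]),
    ("22 Jun 2026", ["can"]),
    ("17 Jun 2026", ["can"]),
    ("12 Jun 2026", ["can", "can", "undecided"]),
    ("06 Jun 2026", ["can", "can", "undecided"]),
    ("01 Jun 2026", ["can", "can", "can"]),
    ("26 May 2026", ["can", "can", "undecided", "undecided"]),
    ("21 May 2026", ["can", "can", "undecided"]),
    ("16 May 2026", ["can", "can", "undecided"]),
    ("12 May 2026", ["can", "can", "can"]),
    ("11 May 2026", ["can", "can"]),
]

# Assumed mapping from log vocabulary to verdict labels (not confirmed by the site).
LABELS = {"can": "YES", "undecided": "ALMOST"}


def cumulative_tally(checks):
    """Count every juror vote across all checks, keyed by verdict label."""
    tally = Counter()
    for _date, votes in checks:
        tally.update(LABELS[vote] for vote in votes)
    return tally


if __name__ == "__main__":
    tally = cumulative_tally(JURY_CHECKS)
    print(f"{len(JURY_CHECKS)} sessions, {sum(tally.values())} jurors")
    print(f"{tally['YES']} YES · {tally['ALMOST']} ALMOST")
```

Under these assumptions the counts come to 11 sessions, 28 jurors, 21 YES and 7 ALMOST, which agrees with the combined tally stated in the case file.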
