Datacurve releases DeepSWE coding benchmark with GPT-5.5 as leader at 70%

rank 0 · 0 points · 1 source · primary Techmeme

Summary

Datacurve released DeepSWE, a coding benchmark of 113 tasks spanning 91 open-source repositories and five languages. GPT-5.5 leads at 70%, ahead of GPT-5.4 at 56% and Opus 4.7 at 54%. The 14-point gap between first and second place challenges the narrative that top AI models are roughly equal.

Why it matters

High

Topics

evals models

Related coverage

Techmeme

Datacurve releases the DeepSWE coding benchmark, a 113-task test across 91 open-source repositories: GPT-5.5 leads at 70%, GPT-5.4 got 56%, and Opus 4.7 got 54%

5/27/2026, 2:00:43 PM
