Datacurve releases DeepSWE coding benchmark; GPT-5.5 leads at 70%
rank 0 · 0 points · 1 source · primary: Techmeme
Summary
Datacurve released DeepSWE, a coding benchmark of 113 tasks drawn from 91 open-source repositories in five languages. GPT-5.5 leads with 70%, ahead of GPT-5.4 (56%) and Opus 4.7 (54%). A 14-point gap between the top two models challenges the earlier narrative that leading AI models perform roughly equally.
Why it matters
High
Related coverage
| Source | Headline | Published |
|---|---|---|
| Techmeme | Datacurve releases the DeepSWE coding benchmark, a 113-task test across 91 open-source repositories: GPT-5.5 leads at 70%, GPT-5.4 got 56%, and Opus 4.7 got 54% | 5/27/2026, 2:00:43 PM |