Pelicans for Opus 4.6 and Codex 5.3 - to be honest, I don't have much interesting to say about these models yet. They're both incremental improvements on their predecessors and very capable https://simonwillison.net/2026/Feb/5/two-new-models/
@simon "I've had a bit of preview access to both of these models and to be honest I'm finding it hard to find a good angle to write about them"
How about rating their own and each other's completed code with different instances?
In my experience, Claude was still much worse at overall planning. Web search on Claude also seemed much worse than on GPT 5.2.
@simon haven't tried Opus 4.6 yet, but 4.5 couldn't generate emails in the Beefree Simple Schema JSON with a design that actually looked good
https://docs.beefree.io/beefree-sdk/data-structures/simple-schema
@simon "I've been having trouble finding tasks that those previous models couldn't handle but the new ones are able to ace." Ask it to write assembly. Gemini 3 and Opus 4.5 were the first I could get to write non-trivial assembly programs, though they both failed to write "life" with sixel graphics.
@simon it could be that the guys at #Anthropic know you and your "pelican on a bicycle" test, since you are a well-known AI blogger