Today I experimented with OpenAI’s new reasoning models, o3 and o4‑mini. They can “think” with images, which is both slick and genuinely impressive. But are they dramatically smarter than the earlier o‑family models? Some researchers claim these versions spark more novel ideas; I noticed flashes of that in older models, yet the effect does feel stronger now.

Is artificial general intelligence (AGI) finally coming into view? Let’s define AGI, loosely, as an AI capable of taking over almost any intellectual job humans perform today. While generative AI often looks brilliant, it can still act remarkably clueless.

  • o3 can design a slide deck, then blithely cram in titles so long they spill off the edge of the slide.

  • The models still miss serious security flaws in the code they write; a toy example of the kind of flaw that slips through follows this list.
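
To make that second point concrete, here is a toy Python sketch — my own illustration, not output from any of these models — of a classic flaw that code review catches and generated code often ships: a SQL query built by string interpolation, open to injection, next to the parameterized version.

```python
import sqlite3

# Vulnerable pattern a model might emit without comment: the user-supplied
# value is interpolated straight into the SQL string, so input like
# "alice' OR '1'='1" changes the meaning of the query (SQL injection).
def find_user_unsafe(conn: sqlite3.Connection, username: str):
    query = f"SELECT id, email FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

# Safer version: bind the value as a parameter so the driver treats it
# as data, never as SQL.
def find_user_safe(conn: sqlite3.Connection, username: str):
    return conn.execute(
        "SELECT id, email FROM users WHERE name = ?", (username,)
    ).fetchall()
```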

So yes, o3 and o4‑mini are real advances—but do they advance toward AGI? Maybe… or maybe incremental scaling of the current generative‑AI paradigm isn’t the path. Reaching AGI could demand a fundamentally different approach, not just bigger nets and longer context windows.

A system on the road to AGI shouldn’t just obey poor instructions—it should notice they’re poor. The fact that o3 happily spews slide titles that don’t fit hints at a missing layer of meta‑cognition. Real AGI will need built‑in guardrails that detect when the task spec is underspecified or self‑defeating and then either fix it autonomously or flag the problem for its human partner.
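
To make the guardrail idea concrete, here is a minimal Python sketch of the kind of check a meta-cognitive layer might run before rendering a slide. The function name, the ~60-character title budget, and the bullet limit are my own assumptions for illustration; nothing here reflects how o3 actually works internally.

```python
from dataclasses import dataclass

# Assumed constraint: roughly how many characters fit on one title line at a
# typical title font size. The exact number is a guess for illustration only.
MAX_TITLE_CHARS = 60

@dataclass
class SpecIssue:
    field: str
    message: str

def check_slide_spec(title: str, bullet_points: list[str]) -> list[SpecIssue]:
    """Flag parts of a slide spec that cannot be rendered as requested,
    instead of silently producing an overflowing slide."""
    issues: list[SpecIssue] = []
    if len(title) > MAX_TITLE_CHARS:
        issues.append(SpecIssue(
            "title",
            f"Title is {len(title)} characters; roughly {MAX_TITLE_CHARS} fit "
            "on one line. Shorten it or allow a two-line title.",
        ))
    if len(bullet_points) > 6:
        issues.append(SpecIssue(
            "bullets",
            f"{len(bullet_points)} bullets won't fit legibly; split the slide.",
        ))
    return issues

# Usage: a meta-cognitive layer would run checks like this and either repair
# the spec itself or hand the issues back to its human partner.
if __name__ == "__main__":
    for issue in check_slide_spec(
        "A Comprehensive, Exhaustive, and Entirely Too Long Overview of "
        "Everything We Did This Quarter",
        ["point one", "point two"],
    ):
        print(f"[{issue.field}] {issue.message}")
```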

Until the bots can nag us about our sloppy prompts the way a copy‑editor nags us about our sloppy prose, AGI remains over the horizon.