What benchmark did OpenAI GPT-5.4 beat in 2026?

GPT-5.4 scored 75% on OSWorld-V, surpassing the human baseline of 72.4%. It is the first time an AI system has outperformed the average knowledge worker on this standardized multi-step task benchmark.

What is the context window size of GPT-5.4?

GPT-5.4 has a one-million-token context window, enough to hold approximately 750,000 words or ten full-length novels in a single inference call.

Does ChatGPT now show ads following the GPT-5.4 launch?

Yes. OpenAI launched a self-serve Ads Manager platform alongside GPT-5.4, allowing advertisers to run clearly labelled sponsored content inside ChatGPT and marking OpenAI's entry into advertising revenue.

Can GPT-5.4 perform tasks autonomously without human input?

Yes. GPT-5.4 can autonomously execute multi-step workflows — booking calendars, pulling API data, creating presentations, sending emails, and filing project tickets — all from a single natural language instruction.

What is OSWorld-V and why is it important for AI?

OSWorld-V is a benchmark that evaluates AI systems on autonomous task completion across real software environments. Scoring above the 72.4% human baseline marks a significant milestone for autonomous AI capability.

OpenAI GPT-5.4: Beats Human Benchmark, 1M Context Window, and Ads Platform

The Benchmark That Changes Everything

For years, the question in AI research circles has been: when will AI match human performance on the open-ended, multi-step knowledge work that constitutes the bulk of white-collar employment? That question got a definitive answer this week when OpenAI unveiled GPT-5.4 and disclosed its performance on OSWorld-V, a benchmark designed to evaluate autonomous task completion across real software environments.

GPT-5.4 scored 75% on OSWorld-V. The human baseline is 72.4%, a margin of 2.6 percentage points. For the first time, an AI system has demonstrably outperformed the average knowledge worker on the benchmark's standardized task suite, which includes navigating operating systems, writing and debugging code, analyzing spreadsheets, composing multi-part documents, and coordinating across multiple software applications simultaneously.

The 1-Million-Token Context Window

GPT-5.4 launches with a one-million-token context window — enough to hold approximately 750,000 words, or roughly ten full-length novels, in a single inference call. For enterprise users, this means GPT-5.4 can ingest an entire codebase, a complete multi-year email archive, a full legal discovery document set, or a comprehensive financial dataset and reason across all of it without losing context.
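To make the word-count figure concrete, here is a rough back-of-the-envelope sketch in Python. It assumes about 0.75 English words per token, which is simply the ratio implied by the figures above (750,000 words in 1,000,000 tokens). Real token counts vary with language and content, and the function names are illustrative, not part of any OpenAI API.

```python
# Assumption: ~0.75 English words per token, the ratio implied by the
# article's own figures (750,000 words / 1,000,000 tokens).
WORDS_PER_TOKEN = 0.75
CONTEXT_WINDOW_TOKENS = 1_000_000


def estimated_tokens(word_count: int) -> int:
    """Rough token estimate for a text of `word_count` English words."""
    return round(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_count: int, reserved_tokens: int = 0) -> bool:
    """True if the text, plus any tokens reserved for the reply, fits."""
    return estimated_tokens(word_count) + reserved_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    max_words = int(CONTEXT_WINDOW_TOKENS * WORDS_PER_TOKEN)
    print(max_words)        # 750000 words in a full window
    print(max_words // 10)  # 75000 words per novel, if ten novels fill it
    print(fits_in_context(750_000))                      # True
    print(fits_in_context(750_000, reserved_tokens=1))   # False
```

By this estimate, "ten full-length novels" works out to roughly 75,000 words each, and a document that exactly fills the window leaves no room for the model's reply, which is why the sketch includes a `reserved_tokens` parameter.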

Multi-Step Workflow Execution

GPT-5.4's most commercially significant capability is its ability to autonomously execute multi-step workflows across software environments. In demonstrations, the model was shown booking calendar appointments, pulling data from a live API, generating a formatted slide presentation, emailing it to specified recipients, and filing the relevant ticket in a project management system — all from a single natural language instruction.

The Ads Manager: ChatGPT Enters the Advertising Economy

OpenAI unveiled a self-serve Ads Manager platform that allows advertisers to create, manage, and optimize campaigns that run inside ChatGPT. The launch represents OpenAI's clearest signal yet that advertising will be part of its long-term revenue model. ChatGPT has over 600 million monthly active users — a uniquely valuable audience.

The Workforce Question Nobody Wants to Ask Out Loud

When a system scores above human average on a knowledge work benchmark and can execute autonomous multi-step workflows, the conversation about AI's impact on employment shifts from theoretical to urgent. OpenAI has been careful to frame GPT-5.4 as a tool that augments human workers, and studies of GPT-4 deployment showed productivity gains rather than headcount reductions in the short term.
