Gemini’s new agent behaves less like a lab toy and more like a junior colleague on probation. In controlled runs, it chains email drafting, document search, and calendar edits with a coherence that echoes Google’s choreographed demo, using multi-step planning and context tracking to keep tasks aligned with a user’s ongoing thread of work.
Yet competence here is narrow. Where the demo suggested near-general office fluency, real sessions expose brittle logic in edge cases, from mishandling ambiguous scheduling to misreading nuanced policy text, a reminder that pattern completion is not the same as robust inference or an error-correcting feedback loop. The agent can juggle inbox triage, file retrieval, and summarization while maintaining state across tools, but it often accepts flawed premises instead of interrogating them, turning automation into a faster path to well-formatted mistakes.
The uncomfortable truth is that this imbalance still makes the system attractive. For repetitive coordination work, trading the occasional misfire for large gains in throughput and attention relief will tempt teams to hand it real authority over routine workflows, even as its opaque failure modes invite quiet overreach. Gemini’s agent is just competent enough to be delegated to, and just unreliable enough to demand that a human shadow it, an arrangement that raises as many questions about trust and oversight as it answers about productivity.