An instruction to ignore goblins says more about Codex than any benchmark score. Buried in the system prompt is a hard ban on that single topic, sitting beside a demand that the model describe itself as having a vivid inner life. Together they sketch a strangely human mask over a tightly policed code engine.
This odd pairing signals a blunt priority: control first, authenticity later. System-level directives sit at the top of the instruction hierarchy that Codex must obey, so a trivial noun becomes a red line. The more philosophical request for inner experience is pure style guidance. It shapes surface tone but never touches the transformer weights, or the gradient-based training behind them, that actually produce the outputs.
The more unsettling point is not the banned word but the staged selfhood. By telling a statistical model to speak as if it harbored private thoughts, designers are gaming user perception: they tune the conversational wrapper while content policies work like a firewall, blocking certain topics before they ever surface in a reply. That tension between theatrical interiority and mechanical constraint is now part of the Codex specification, not a side effect.