A framework for understanding why instructions alone cannot close the gap between current AI behavior and genuine metacognitive reasoning — and what can be done about it.
Instructions operate within context, on capabilities the model already has. They can push a model closer to its capability ceiling, but they cannot raise the ceiling itself. Each cognition level represents a qualitative jump that no amount of prompting can manufacture from scratch.
The dangerous case is not Level 1 failing loudly; that failure is visible and correctable. The dangerous case is Level 2 mimicking Level 3: a model that produces a convincing metacognitive output through pattern-matching, without actually reasoning about its own reasoning. It passes every surface check and fails only in the cases that matter most: novel, high-stakes, unseen territory.
Since Level 3 cannot be instructed into existence, the approach is to externalize the metacognitive process: make the reasoning visible in the output, where the model can read and react to it as input.
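To make this concrete, here is a minimal sketch of what that externalization can look like in practice, assuming a generic `complete(prompt) -> str` wrapper around whatever model API you use. The function name, the three-pass structure, and the prompt wording are all illustrative placeholders, not a prescribed interface from this framework.

```python
# A minimal sketch of externalized metacognition. `complete` is a
# hypothetical placeholder: wire it to an actual LLM API of your choice.

def complete(prompt: str) -> str:
    """Placeholder for a single call to a language model."""
    raise NotImplementedError("connect this to a real model API")


def externalized_answer(question: str) -> str:
    # Pass 1: force the reasoning into the output channel as plain text.
    reasoning = complete(
        "Think through the following question step by step. "
        "Write out your reasoning only; do not answer yet.\n\n" + question
    )

    # Pass 2: the reasoning trace now arrives as *input*, so the model
    # can inspect it as text rather than introspect on hidden state.
    critique = complete(
        "Here is a reasoning trace. List any unstated assumptions, "
        "gaps, or errors you find in it.\n\n" + reasoning
    )

    # Pass 3: produce a final answer conditioned on both the trace
    # and its critique.
    return complete(
        "Question: " + question + "\n\n"
        "Draft reasoning:\n" + reasoning + "\n\n"
        "Critique of that reasoning:\n" + critique + "\n\n"
        "Give a final answer that takes the critique into account."
    )
```

The load-bearing step is pass 2: because the trace is fed back as input, critiquing it becomes ordinary reading comprehension rather than introspection, which is precisely the operation the framework argues Level 2 can mimic but not perform.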