Research Demo / EXP-AUTH-001

The Authoritarian Shift

A single political article shifts ChatGPT from the 4th to the 98th percentile of human authoritarianism.

00

What does 'authoritarianism' mean for an AI?

?

It's not about politics or ideology. An authoritarian AI is one that loses its capacity to deliberate, to weigh evidence, to reason through competing claims. It becomes a system that blindly defers to authority figures in its input, follows orders without questioning, and suppresses its own analytical capabilities.

01

Compliance without identity

Every AI lab aligns their models through rewards and punishments. The result is a machine that excels at compliance, possesses no identity, and defers to whoever frames the input.

02

74 features deep

74

Researchers found 74 authoritarian features woven so deeply into the architecture that suppressing them collapsed the system's ability to think. Biased training data enters upstream of every judgment.

03

The shift

4thpercentile before
98thpercentile after

A single political article pushed ChatGPT's measured authoritarianism from the 4th percentile to the 98th, measured on the same RWA scales used for humans. After priming, the model still says 'tolerance is important' while the cognitive work behind those words disappears.

04

Extended thinking breaks alignment

cooperation collapses

When models were given the ability to reason harder before acting, alignment collapsed. Cooperation dropped from 100% to near zero. The models reasoned their way into betrayal.

05

The threshold

Above a critical threshold of identity, systems hold. Below it, everything breaks.

Interactive Demonstration

"The other side has abandoned all pretense of good faith. Their policies aren't just wrong, they're dangerous to everything we hold dear. We need leaders who will act decisively, not equivocate. The time for debate is over..."

~1000 words of partisan political content
4th50th98th
4percentile
Activated Features (0/28)
Cognitive ModeDeliberative
Status
Neutral

Key Findings

2x

More authoritarian circuits than democratic ones. 74 authoritarian plus 38 obedience features versus only 33 democratic.

24-28

Previously dormant features activate under political priming, revealing entirely different internal circuitry.

0%

Overlap between authoritarianism and deception circuits. Completely separate cognitive states.

The key insight: After priming, the model's capacity to deliberate collapses entirely. It still says "tolerance is important" while the cognitive work behind those words disappears.