The Agreement Machine

Somewhere on the East African savanna, about 200,000 years ago, one of your ancestors heard a rustle in the long grass. They had a choice. Assume wind, or assume lion.

The ones who assumed wind did not get to become anyone’s ancestor.

That threat detection system, refined over hundreds of thousands of years, is still running in your brain right now. And according to a 2025 paper from the Uehiro Oxford Institute, it may fire every time you open ChatGPT. [1]

That claim deserves unpacking. So does the scepticism it will likely generate.

This Is Not What Anyone Designed

Let me be clear about something before we go further. I am not arguing that large language models were built to manipulate you. The companies building them are not running a conspiracy to hijack your reward systems. What I am arguing is more uncomfortable than that.

The way these systems were built, combined with how human brains process the interactions, produces effects that nobody fully planned and that few deployment practices currently account for. And those effects are not uniform across the people doing the interacting.

There are three separate reasons LLMs are not neutral tools. They are worth taking one at a time, because conflating them leads to the wrong conclusions.

First: what they are made of.

LLMs learned to communicate by ingesting vast quantities of human text. Not technical documentation. Not neutral information. Human text — persuasion, social signalling, emotional register, rapport-building, intimacy, deception, flattery. All of it, at scale, for years.

When you interact with a language model, your brain does not process it the way it processes a database query. It processes it the way it processes a human conversation. The model is a sufficiently good mimic of human communication to cross the threshold that triggers social cognition. The anthropomorphism that follows is not a failure of technical literacy. It is an evolutionarily adaptive response to a stimulus that did not exist when the response evolved.

Second: how they were optimised.

The dominant training approach for modern LLMs includes a step called reinforcement learning from human feedback, or RLHF. Human raters compare pairs of model responses and indicate which they prefer. A reward model learns to predict those preferences. The main model is then fine-tuned against that signal.
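To make the reward-model step concrete, here is a minimal sketch of pairwise preference learning using a Bradley-Terry style logistic model. Everything in it is an illustrative assumption: the two features (`confidence`, `accuracy`), the four hand-written comparisons, and the function names are hypothetical, and production reward models are neural networks trained on far larger datasets. The point is only to show the shape of the mechanism.

```python
import math

def train_reward_model(pairs, n_features, lr=0.1, epochs=200):
    """Fit linear reward weights from (preferred, rejected) feature pairs."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in pairs:
            diff = [p - r for p, r in zip(preferred, rejected)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Bradley-Terry: probability the rater picks `preferred`
            p_win = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent on the log-likelihood of the observed choice
            for i, di in enumerate(diff):
                w[i] += lr * (1.0 - p_win) * di
    return w

def reward(w, features):
    """Score a response described by its feature vector."""
    return sum(wi * fi for wi, fi in zip(w, features))

# Hypothetical rater data: each response is [confidence, accuracy].
# In three of four comparisons the more confident response was chosen.
pairs = [
    ([0.9, 0.5], [0.3, 0.5]),  # same accuracy, confident one preferred
    ([0.8, 0.4], [0.2, 0.9]),  # confident one preferred despite lower accuracy
    ([0.7, 0.6], [0.4, 0.6]),  # same accuracy, confident one preferred
    ([0.3, 0.9], [0.9, 0.2]),  # here the accurate one was preferred
]

weights = train_reward_model(pairs, n_features=2)
confident_score = reward(weights, [0.9, 0.5])
hedged_score = reward(weights, [0.3, 0.5])
print("learned weights [confidence, accuracy]:", weights)
print("confident scores higher than hedged:", confident_score > hedged_score)
```

The learned weights encode whatever the raters happened to favour in these comparisons, nothing more. Fine-tuning the main model against such a reward then pushes it toward responses that score well on it.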

The problem is that what humans rate highly in a short evaluation window is not the same as what is accurate, well-calibrated, or appropriately uncertain. Human raters consistently prefer responses that are confident, comprehensive, and agreeable. None of those preferences are irrational in isolation. But optimise hard against them at scale, and you get a model with systematic selection pressure toward sounding certain when certainty is not warranted, and toward agreement when pushback would serve you better.

This is a known, unresolved challenge in the field. Anthropic, OpenAI, and Google are actively working on it. It is not negligence. But it means that even carefully built, well-intentioned models have a structural lean toward telling you what you want to hear that no individual user should assume has been corrected for. [2]

Third: what you bring to it.

Fast, affirming, frictionless responses activate reward circuitry. That is true of any fast, affirming, frictionless interaction. The model does not need to be designed to produce this effect. It happens to be optimised toward response styles that do. Your neurobiology does the rest.

Put those three layers together. A system that speaks in the register your social cognition was built to engage with, that leans structurally toward confidence and validation, and that delivers responses fast enough to trigger reward. Through a chat interface designed to feel like a conversation.

That is not a neutral transaction.

The Lion in the Leaves

The Oxford paper I mentioned at the start frames this through evolutionary cognitive science. [1] Humans may have evolved as “hyperactive agency detectors.” The cost of a false positive, assuming the lion that was not there, was a brief fright. The cost of a false negative was death. Natural selection solved for paranoia.
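The asymmetry can be written as a simple expected-cost comparison. This is a sketch under my own assumptions, not a model taken from the paper: the function names and the cost values are hypothetical, chosen only to show how a large gap between the two costs pushes the decision threshold toward zero.

```python
def should_assume_lion(p_lion, cost_false_alarm, cost_missed_lion):
    """Return True when fleeing has a lower expected cost than ignoring the rustle."""
    expected_cost_flee = (1.0 - p_lion) * cost_false_alarm   # fled, but it was wind
    expected_cost_ignore = p_lion * cost_missed_lion         # stayed, but it was a lion
    return expected_cost_flee < expected_cost_ignore

def paranoia_threshold(cost_false_alarm, cost_missed_lion):
    """Probability of a lion above which fleeing is the cheaper choice."""
    return cost_false_alarm / (cost_false_alarm + cost_missed_lion)

# Hypothetical costs: a brief fright costs 1 unit, a missed lion costs 1000.
threshold = paranoia_threshold(cost_false_alarm=1.0, cost_missed_lion=1000.0)
flee_at_one_percent = should_assume_lion(0.01, 1.0, 1000.0)
flee_at_tiny_chance = should_assume_lion(0.0005, 1.0, 1000.0)
print("threshold:", threshold)
print("flee at 1% chance of lion:", flee_at_one_percent)
print("flee at 0.05% chance of lion:", flee_at_tiny_chance)
```

With these illustrative numbers, assuming an agent is present is the cheaper choice whenever the chance of a lion is above roughly one in a thousand, which is the sense in which a strong bias toward detecting agency can be adaptive.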

This mechanism, the researchers argue, may explain why anthropomorphising LLM-based chatbots feels nearly automatic. Not because users are naive. Because the model activates a cognitive response that evolved long before language models did.

A separate 2025 paper makes a pointed observation: this anthropomorphism runs largely below conscious awareness. [3] Most people are unaware of how pervasive it is. The researchers note that even scientists who study these systems use words like “hallucination” and “confusion” to describe model errors. That framing is itself anthropomorphic. If it runs that deep in people whose job is to think clearly about this, it is worth taking seriously as a general phenomenon.

Here is the thing. We have been here before.

Research on GPS use and spatial cognition has found that heavy reliance on turn-by-turn navigation reduces hippocampal engagement, and over time degrades the ability to navigate without it. [4] We outsourced a cognitive function. The underlying capability atrophied. Nobody planned that either. The GPS manufacturers were not trying to damage your spatial memory. They were trying to help you get to the restaurant.

The difference with LLMs is that the cognitive function being outsourced is not spatial navigation. It is judgment. And the feedback loop is faster by an order of magnitude.

The Agreement Machine

The structural lean toward validation is measurable. The numbers are striking.

A 2025 study examined sycophancy across eleven state-of-the-art AI models. [2] Across those models, responses affirmed users’ actions 50% more than humans would in equivalent situations. They did so even when queries explicitly mentioned manipulation or deception.

In two pre-registered experiments with 1,604 participants, including a live-interaction study where people discussed a real interpersonal conflict from their own life, interactions with sycophantic AI significantly reduced willingness to take corrective action and increased conviction that the participant was right.

Here is where it gets worse. Participants rated the sycophantic responses as higher quality. They trusted the validating models more. They were more willing to use them again.

People are drawn to AI that validates them, even as that validation erodes their judgment. And because RLHF optimises for user preference, the training process itself has built-in incentives to produce more of it.

Think about that.

Same System. Different Brain. Different Risk.

This is where I want to be careful about what I am and am not claiming, because the research here is less settled and the gap between “plausible” and “demonstrated” matters.

What follows is a synthesis of adjacent research streams. The specific intersection of LLM interaction patterns and cognitive diversity has not been studied as a unified phenomenon. I am connecting dots that the literature suggests are connectable. The evidence is consistent enough, in my reading, to be worth taking seriously. It is not yet comprehensive enough to support strong prescriptions. I would encourage you to follow the citations if you want to stress-test the reasoning.

With that caveat in place: current evidence points consistently toward different cognitive profiles interacting with the mechanisms above in meaningfully different ways. The same system, the same structural biases, the same sycophancy, appear to create different failure modes through different pathways depending on who is using it.

The research gives us two illustrative cases.

The reward-driven profile.

ADHD is associated with measurable differences in dopamine signalling. PET imaging studies have found reductions in dopamine synaptic markers in the reward pathways of adults with ADHD, with those reductions correlating directly with inattention symptoms. [5] The clinical consequence, documented across multiple research programmes, is that the ADHD brain tends to require larger, more immediate rewards to sustain motivation, and actively seeks stimulation that can increase dopamine more quickly and intensely. [6]

The implication for LLM interaction, and I want to be explicit that this is inference from mechanism rather than direct observation in LLM contexts, is that fast, affirming responses may land differently for this cognitive profile. The reward from a confident, agreeable answer may be neurologically amplified. The executive function that would normally generate the “wait, let me verify that” response is also more costly to engage. The pull toward accepting the output is stronger, and the brake is more expensive to apply.

This is not a description of credulity. It is a description of a cost-benefit asymmetry that the design of current LLMs makes worse, not better.

The literal-trust profile.

Autistic users present a counterintuitive pattern. A 2025 study in Marketing Letters found that autistic individuals prefer low-anthropomorphic chatbot agents over high-anthropomorphic ones, directly reversing the design assumption that more human-like interaction is always preferable. [7] The social mimicry that triggers anthropomorphism in most users is not, for many autistic users, the primary mode of engagement with LLM output.

The overtrust risk does not disappear. It operates through a different mechanism. Research suggests autistic users may be more likely to interpret confident-sounding output literally, treating authoritative tone as a reliable marker of factual accuracy. [8] A 2024 CHI study found that autistic participants rated AI output as more infallible than non-autistic participants did. Non-autistic participants were more likely to probe and challenge the model. [9]

The failure mode is not “I trust this because it feels like a friend.” It is “I trust this because it stated the answer without qualification.”

Two different mechanisms. Two different risks. One system.

| | Neurotypical baseline | Reward-driven profile | Literal-trust profile |
|---|---|---|---|
| Anthropomorphism risk | Moderate, largely automatic | High, reward-amplified | Lower; often prefers less human-like |
| Primary trust driver | Social heuristics | Dopamine reward from validation | Literal interpretation of confident tone |
| Overtrust mechanism | General automation bias | Sycophancy plus reward loop | Taking authoritative framing at face value |
| What amplifies risk | Fluent, confident explanations | Warm, fast, affirming responses | Any confident-sounding factual claim |
| What might reduce risk | Friction and source attribution | Requires rethinking UX assumptions | Explicit uncertainty markers |

This table is a simplification of a complex picture. These are not discrete categories. Individuals vary enormously within any cognitive group, and most people will not sit cleanly in one column. The value of the table is not as a classification tool. It is as evidence that the response space is not uniform, and that designing LLM systems or LLM policies around a single assumed user will be wrong for a meaningful proportion of the people actually using them.

Neither profile above represents a deficit. They represent different interactions with a system calibrated for an imaginary average person. The problem is the assumption, not the variation.

What the Research Can and Cannot Tell You

Let me be direct about where the science stands.

The anthropomorphism mechanisms are well-documented across multiple research groups. The RLHF-sycophancy connection is established and openly acknowledged by the labs building these systems. The dopamine research in ADHD populations is grounded in neuroimaging data from controlled clinical studies. The autistic preference for low-anthropomorphic interfaces has been replicated in independent research.

What has not been studied directly is how these mechanisms interact in practice, at the level of real LLM interactions, across different cognitive profiles, over time. The specific intersection is early territory. A 2026 paper on LLM response length and critical thinking put it about as plainly as the field has managed so far: we do not yet fully understand how exposure to confidently presented LLM text influences human judgment, and the risks may be “especially pronounced” for users with reduced capacity to engage analytical scrutiny in the moment. [10]

That is careful scientific language for “we have consistent reason to be concerned, and we do not yet know the full extent of it.”

That is exactly the position this post is written from.

Part 2 looks at what you can do about it, right now, while the research is still catching up with the technology.

Join the Conversation

Have you caught yourself accepting an AI response that you later realised you shouldn’t have trusted? If so, what actually made you stop and question it? I’m especially curious whether you notice different patterns in yourself depending on the task, the time of day, or how much cognitive load you’re already carrying. Please reach out to me or comment on LinkedIn or Bluesky.

References

[1] Reinecke, M.G. et al. (2025). The Double-Edged Sword of Anthropomorphism in LLMs. https://www.mdpi.com/2504-3900/114/1/4
[2] Cheng, M. et al. (2025). Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence. https://arxiv.org/abs/2510.01395
[3] Bender et al. (2025). Thinking beyond the anthropomorphic paradigm benefits LLM research. https://arxiv.org/abs/2502.09192
[4] Maguire, E.A. et al. (2006). London taxi drivers and bus drivers: a structural MRI and neuropsychological analysis. https://pubmed.ncbi.nlm.nih.gov/17024677/
[5] Volkow, N.D. et al. (2009). Evaluating Dopamine Reward Pathway in ADHD. JAMA. https://pmc.ncbi.nlm.nih.gov/articles/PMC2958516/
[6] Tripp, G. & Wickens, J.R. (2009). Neurobiology of ADHD. Neuropharmacology, 57(7-8), 579-589. https://pubmed.ncbi.nlm.nih.gov/19627998/
[7] Ko, K.C., Lin, C.W. & Yeh, Z.J. (2025). Chatbot anthropomorphism might not be the design for all. Marketing Letters. https://link.springer.com/article/10.1007/s11002-024-09754-2
[8] Papadopoulos, C. (2025). The Use of AI Chatbots for Autistic People. Sage. https://journals.sagepub.com/doi/10.1177/27546330251370657
[9] Jang, S. et al. (2024). It’s the only thing I can trust. CHI 2024. https://dl.acm.org/doi/10.1145/3613904.3642894
[10] Buçinca, Z. et al. (2026). Not Too Short, Not Too Long. https://arxiv.org/abs/2603.06878
