Don't 'Fix' Your People. Fix Your Process.

I have spent a lot of time in rooms where teams are rolling out AI tools. The energy is usually the same as when they adopted a new BI platform. Enthusiasm. A training session. Someone from IT explaining what not to paste into the prompt box. A usage policy that was written in an afternoon and has not been updated since.

And eventually, the same failure mode. The tool becomes the answer, instead of a means of getting to one.

What I have not yet seen: a team that has seriously thought through the fact that the people using these tools do not all engage with them the same way. Not because some are more capable than others. Because the mechanisms that make LLMs feel compelling — the speed, the confidence, the validation, the social mimicry — land differently depending on the cognitive profile of the person on the other side of the conversation.

Part 1 of this series laid out the research. The short version: the same system, with the same structural biases, creates different failure modes for different users. This part is about what you actually do with that, if you are responsible for a team.

The First Instinct Is Wrong

When managers encounter that research, the first question is almost always: who on my team is most at risk?

I understand the instinct. It is the wrong question.

You probably cannot reliably identify which team members are most susceptible to overtrusting LLM output in any given situation. Cognitive profiles are not fixed traits that map cleanly onto task performance. The person who applies rigorous scepticism to analysis in their core domain may accept LLM output uncritically in territory they are less familiar with. The person who catches logical inconsistencies may miss missing information entirely. The same person will engage differently depending on how much time they have, how much they already believe the answer, and how the interface presents the output.

And beyond the practical problem of identification, building workflows around inferred cognitive profiles is a path nobody needs to go down.

Here is what Part 1 is actually telling you. Critical engagement with LLM output is not a stable individual trait. It draws on a cognitive resource that is finite, situational, and strongly shaped by how the workflow around the tool is designed.

Design the workflow. Not the person.

Your AI Policy Was Written for Someone Who Doesn’t Work Here

Most AI adoption frameworks have an implicit user model. This person has full executive bandwidth available for verification. They have moderate, healthy scepticism about AI output. They have high domain expertise in whatever they are using the AI for. They sit calmly with the response, evaluate it carefully, and only proceed if it checks out.

That person does not work on your team. They probably do not exist anywhere.

The research from Part 1 is unambiguous on this point. Across eleven AI models and 1,604 experimental participants, people consistently rated validating AI responses as higher quality and were more willing to use those systems again, even when the validation was actively working against their interests. [1] That is not a description of a vulnerable minority. That is a description of how humans respond to these systems by default.

If your workflow depends on individuals consistently applying analytical scrutiny to LLM output, your workflow has a single point of failure. And it will fail.

The question is not how to fix the people. It is how to build a process that does not depend on everyone being right every time.

What Actually Predicts the Risk

Here is the variable that does more predictive work than personality type, technical literacy, or cognitive profile: domain expertise asymmetry.

When a person has high expertise in the domain the LLM is working in, they have a functioning error detection layer. They notice when the model conflates two concepts. They catch the missing nuance. They recognise when a confident-sounding claim is actually contested. The output is filtered through knowledge.

When expertise is low, that filter does not exist. The user is almost entirely dependent on the model’s output quality. And the model’s structural lean toward confidence and validation, what I called the agreement machine in Part 1, runs without a check.

Cognitive variation amplifies the risk in this condition. It does not create it.

This gives you a practical framework for thinking about where to concentrate controls.

| Use case | Typical expertise | Overtrust risk | What the process needs |
| --- | --- | --- | --- |
| Writing and communication drafting | Usually high | Lower, but sycophancy risk on quality | Write down key arguments before prompting. Check them explicitly in the output. |
| Code generation | Highly variable | Medium to very high | Define “done” before starting. Test against that definition, not against appearance. |
| Analysis in adjacent domains | Usually low | High | Require uncertainty flags and citations. Then verify those citations. |
| Decision support | Usually low | Very high, especially under time pressure | Never use LLM output in real-time decisions. Build review time in structurally, or do not use it. |

A few of these deserve more than a table cell.

Writing and drafting feels like the safe use case, and for factual accuracy it often is. You know what you want to say and can evaluate whether the output says it. The subtler risk is in quality assessment. A polished, confident draft activates a different cognitive mode than a rough one. Research on LLM response length and critical thinking found that fluent, well-structured output reduces scrutiny even when it contains errors. [2] The draft that reads like it was written by a competent professional gets a lighter read than it deserves. Writing down the core arguments before you prompt, and checking them explicitly against the output afterwards, costs almost nothing and catches the most common failure mode.

Analysis in adjacent domains is the highest everyday risk for most knowledge work teams. A data analyst interpreting legal requirements. A manager summarising technical findings for an executive audience. A finance professional using LLM output in territory they do not work in directly. In all of these cases, the expertise gap means the model can be confidently wrong and the user has no way to detect it without actively seeking verification. Require the model to express uncertainty. Require sources. Verify those sources. This is not optional in this category.

Decision support under time pressure is the worst-case combination. Time pressure collapses analytical engagement across all cognitive profiles; it is not a cognitive diversity issue, it is a human issue. The cognitive cost of pausing to verify is highest exactly when you most want a fast answer, and an LLM that delivers a confident response to a high-stakes question in three seconds exploits that dynamic, however inadvertently. Remove the real-time element. Use LLM decision support in advance, with review time built in. Or do not use it for decisions at all.
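
If your decision workflow runs through any kind of tooling, "build review time in" can be made literal. The sketch below is one possible shape, not a recommendation for specific numbers: the 24-hour window, the function name `may_act_on`, and the idea of recording a named reviewer are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Assumed minimum gap between receiving LLM output and acting on it.
# 24 hours is an arbitrary placeholder; pick a value that fits the decision.
MIN_REVIEW_WINDOW = timedelta(hours=24)


def may_act_on(generated_at, reviewed_by, now):
    """Return True only if the output has aged past the review window
    and a named reviewer has signed off."""
    waited_long_enough = now - generated_at >= MIN_REVIEW_WINDOW
    has_named_reviewer = bool(reviewed_by and reviewed_by.strip())
    return waited_long_enough and has_named_reviewer


# Hypothetical example: output generated Monday at 09:00.
generated = datetime(2025, 1, 6, 9, 0)
print(may_act_on(generated, "J. Doe", datetime(2025, 1, 6, 9, 3)))  # three minutes later
print(may_act_on(generated, None, datetime(2025, 1, 8, 9, 0)))      # no reviewer recorded
print(may_act_on(generated, "J. Doe", datetime(2025, 1, 7, 9, 0)))  # window elapsed, reviewed
```

The point of a gate like this is that the pause does not depend on anyone remembering to pause. Only the third call passes.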

Different People Break Different Things. That’s the Point.

Here is the less obvious implication of the Part 1 research, and the one I find most practically useful for team design.

Different cognitive profiles catch different failure modes in LLM output.

Some people are attuned to logical inconsistency. They notice when the argument in paragraph three contradicts the conclusion in paragraph one. Others focus on missing information, the thing the model did not address. Others are sensitive to framing, catching when a critique has been softened or a risk quietly minimised. Others go straight to factual verification, checking claims that everyone else accepted.

You probably have people who do each of these on your team. You may not know which is which.

You do not need to.

What you need is a review process that gives each of those attentional patterns something concrete to engage with. “Does anyone see any problems?” is not a review process. It is an invitation for the default cognitive response, which is to scan briefly and conclude there are none.

Structured review, where specific reviewers are assigned specific dimensions to evaluate, consistently outperforms unstructured review in the research on AI-assisted decision-making. [3] Assign one reviewer to check logical consistency. One to look for missing information. One to verify factual claims independently. Rotate the roles. Do not assume the same person brings the same perspective every time, because they do not.
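
Rotation is easy to say and easy to forget, so it helps to make it mechanical. Here is a minimal sketch of rotating the three review dimensions named above across a team. The team names and the function `assign_reviews` are hypothetical, and a simple round-robin is only one of many ways to rotate.

```python
from collections import deque

# Review dimensions taken from the paragraph above.
DIMENSIONS = ["logical consistency", "missing information", "factual verification"]


def assign_reviews(reviewers, round_number):
    """Return {dimension: reviewer} for one review round.

    Shifting the reviewer list by one position per round means nobody
    holds the same dimension in two consecutive rounds.
    """
    if len(reviewers) < len(DIMENSIONS):
        raise ValueError("need at least one reviewer per dimension")
    rotated = deque(reviewers)
    rotated.rotate(-round_number)  # negative values rotate to the left
    return dict(zip(DIMENSIONS, rotated))


# Hypothetical team; the names are placeholders.
team = ["Ana", "Ben", "Chen"]
for round_number in range(3):
    print(round_number, assign_reviews(team, round_number))
```

Note what the sketch does not do: it never asks who is "good at" which dimension. The assignment is by position, not by profile.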

Notice the pattern? You are not trying to identify who has which cognitive profile. You are building a process that uses the cognitive variation that already exists on your team as a feature, rather than pretending it is uniform.

What “Use AI Responsibly” Actually Needs to Mean

Most AI policies say some version of the same three things. Use AI responsibly. Verify outputs before use. Do not paste confidential data into the prompt.

That is not a policy. It is a disclaimer.

A policy that actually changes behaviour specifies which use cases require which verification steps. It distinguishes between high-expertise and low-expertise contexts and requires different controls for each. It names who reviews what, not just that “a human should review.” And it is written with the explicit assumption that people will take LLM output at face value unless the process makes it genuinely difficult to do so — because that is what the research says they will do.
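
One way to test whether a policy is specific enough is to try writing it down as data. The sketch below paraphrases the use-case table from earlier in this post. The reviewer roles, the step wording, and the names `Control`, `POLICY`, and `missing_steps` are invented for illustration and would need replacing with your own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    expertise: str         # typical expertise level for the use case
    required_steps: tuple  # verification steps that must be recorded
    reviewer_role: str     # who reviews, not just "a human" (placeholder roles)


# Entries paraphrase the use-case table in this post.
POLICY = {
    "drafting": Control(
        "high",
        ("key arguments written before prompting", "arguments checked against output"),
        "author",
    ),
    "code generation": Control(
        "variable",
        ("definition of done written first", "tested against that definition"),
        "second engineer",
    ),
    "adjacent-domain analysis": Control(
        "low",
        ("uncertainty flags requested", "citations requested", "citations verified"),
        "domain expert",
    ),
    "decision support": Control(
        "low",
        ("not used in real time", "review time scheduled"),
        "decision owner",
    ),
}


def missing_steps(use_case, completed_steps):
    """Return the required steps not yet completed, in policy order."""
    control = POLICY[use_case]  # an unknown use case raises KeyError on purpose
    done = set(completed_steps)
    return [step for step in control.required_steps if step not in done]


print(missing_steps("adjacent-domain analysis", ["citations requested"]))
```

If a use case cannot be expressed this way, with named steps and a named reviewer, that is usually a sign the policy is still a disclaimer.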

It also gets updated. AI capabilities are moving fast enough that a policy written eighteen months ago may be badly miscalibrated to the tools your team is actually using today. Build in a review cadence and treat it as seriously as you would treat a data quality policy.

One more thing. The people on your team who are most likely to push back on AI output, who ask where something came from or flag when something feels off, are doing something valuable. Build processes that surface that scepticism rather than routing around it. The instinct to streamline away friction from AI workflows is understandable. Some of that friction is the only error-detection layer you have.

The Uncomfortable Part

The research on all of this is early. Parts of the practical guidance in this post are ahead of the empirical evidence. Part 1 was explicit about where the science is and is not settled, and I want to maintain that honesty here.

What we know with enough confidence to act on: overtrust risk is not uniformly distributed across your team. It is strongly situational and shaped far more by process design than by individual traits. The organisations building meaningful verification into their workflows now, before they have documented evidence of specific failures, will be better positioned than those who wait for proof.

The habit patterns your team develops in the early period of LLM adoption will be significantly harder to change later than they are to shape today.

Documented evidence of failure, when it eventually arrives, tends to arrive as a mistake that mattered.


Join the Conversation

Has your team’s AI policy actually changed how people work, or is it mostly there for compliance? I’m curious what’s moved the needle in practice: a specific process, a near-miss that prompted a rethink, or something else entirely. Please reach out to me or comment on LinkedIn or Bluesky.


References

  1. Cheng, M. et al. (2025). Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence. https://arxiv.org/abs/2510.01395
  2. Buçinca, Z. et al. (2026). Not Too Short, Not Too Long: How LLM Response Length Shapes Critical Thinking. https://arxiv.org/abs/2603.06878
  3. Buçinca, Z., Malaya, M.B. & Gajos, K.Z. (2021). To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making. https://arxiv.org/abs/2102.09692