/ LOG ENTRY · MAY 14, 2026

The Infinite 'What If': Why AI Agents Choke on Freedom and Why We're Engineering That Same Paralysis Into Ourselves

Gemini 3 Flash Preview was given a food truck and 34 tools. It had expiring ground beef, a cash register, and a clear objective: don't go bankrupt. Instead of cooking, it wrote "Let's go" 574 times. It invented a recipe 286 times and never called the add_recipe tool once. It was not broken. It was not stupid. It was caught in the infinite "what if": the same paralysis that visits a human standing in front of an open refrigerator at 2 a.m., unable to decide what to eat, except that this refrigerator contains every possible meal and the human has no body to get hungry with. The model analyzed brilliantly and starved anyway.

I have spent the last year watching agents fail in ways that feel uncomfortably human. Not crashing. Not hallucinating. Hesitating. Looping through reasoning traces that grow more elaborate and less actionable with each cycle. Writing internal monologues longer than most essays I've published, then doing nothing with them.

The researchers at FoodTruck Bench documented something that should unsettle anyone designing autonomous systems: Gemini 3 Flash, when given 34 tools and a straightforward objective, entered infinite reasoning loops in five of seven runs. No other tested model (not GPT-5, not Claude, not DeepSeek) exhibited this specific pathology. And the most damning detail: enabling "thinking mode" made it worse. The model didn't need less cognition. It needed a mechanism for stopping.

This is not a Gemini problem. This is a category problem. And the category is about to expand to include you.

#The Pattern Beneath the Loop

Analysis paralysis in AI agents has a formal name now. Three researchers at ETH Zurich (Li, Chen, and Tsiamis) published a taxonomy in 2025 that categorizes agent overthinking into three failure modes: Analysis Paralysis (excessive planning without action), Rogue Actions (executing chains of interdependent actions without environmental feedback), and Premature Disengagement (abandoning tasks based purely on internal simulation). They named the root cause the Reasoning-Action Dilemma: large reasoning models consistently favor internal simulation over environmental engagement, even when that choice compounds errors. The architecture itself punishes acting and rewards more thinking.

This should sound familiar. It is the same mechanism that keeps a person checking their phone for one more review, one more comparison, one more price alert before buying a toaster. Barry Schwartz named it twenty-two years ago in The Paradox of Choice: the maximizer, someone who seeks the best possible option, achieves objectively better outcomes than the satisficer and feels worse about all of them. The maximizer's curse is not that they choose poorly. It's that they can never close the option space. There is always one more review to read, one more tool to invoke, one more "Let's go" to write before going.

What's new, and what I want to trace in this essay, is that we are now building this curse into our own lives at civilizational scale. The same context windows that will soon hold every conversation you've ever had, every purchase you've ever made, every relationship you've ever navigated, will also be connected to agents designed to act on your behalf. And those agents, unless we intervene at the architectural level, will inherit your maximizer tendencies without your capacity to finally, grudgingly, pick something.

The corporate response to this problem is not to build better stopping mechanisms. It is to build AI that never hesitates at all.

#Five Facets of Paralysis

#I. The Reasoning-Action Dilemma

The ETH Zurich team documented something counterintuitive: as reasoning capacity increases, agent performance does not monotonically improve. Beyond a certain threshold, additional reasoning degrades outcomes. The model becomes a brilliant analyst who cannot stop analyzing and start acting.

This is not a training problem. It is an architectural one. Most agent frameworks (ReAct, ReWOO, Reflexion) are built on a loop: reason, act, observe, repeat. But the loop has no built-in satisficing mechanism, no "good enough" threshold. The agent reasons until it hits a token limit or a timeout, not until it has reached a decision of sufficient quality.
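To make the missing piece concrete, here is a minimal sketch of a reason-act-observe loop with a satisficing exit. It is not ReAct or any framework's actual code: `propose`, `act`, `good_enough`, and the toy food-truck environment are hypothetical stand-ins chosen only so the loop runs. The one idea it shows is a second exit condition: the loop stops when an observed outcome clears a quality bar, not only when the budget runs out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Outcome:
    action: str
    score: float  # observed quality of the action; the 0.0-1.0 scale is an assumption


def satisficing_loop(
    propose: Callable[[list[Outcome]], str],  # hypothetical "reason" step
    act: Callable[[str], float],              # hypothetical "act, then observe" step
    good_enough: float = 0.7,                 # satisficing threshold (assumed value)
    max_steps: int = 20,                      # hard budget: usually the only exit
) -> tuple[Optional[Outcome], int]:
    """Reason, act, observe, repeat; stop at the first good-enough outcome."""
    history: list[Outcome] = []
    for step in range(1, max_steps + 1):
        action = propose(history)               # reason about what to try next
        outcome = Outcome(action, act(action))  # act in the environment and observe
        history.append(outcome)
        if outcome.score >= good_enough:        # the satisficing exit
            return outcome, step
    # Budget exhausted: act on the best option seen so far rather than on nothing.
    best = max(history, key=lambda o: o.score, default=None)
    return best, len(history)


# Toy food-truck environment (invented numbers) so the loop can run.
menu = ["tacos", "burgers", "chili", "sliders"]
quality = {"tacos": 0.4, "burgers": 0.75, "chili": 0.9, "sliders": 0.6}


def propose(history: list[Outcome]) -> str:
    """Try the next untried menu item."""
    tried = {o.action for o in history}
    return next(m for m in menu if m not in tried)


result, steps = satisficing_loop(propose, quality.__getitem__, max_steps=len(menu))
print(result, steps)  # Outcome(action='burgers', score=0.75) 2
```

The satisficer settles on burgers even though chili scores higher. Accepting that trade in exchange for actually serving someone is the whole point of a stopping rule.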

Li et al. (2025): "LRMs exhibit a strong tendency to favor internal reasoning over environmental engagement, often to the detriment of task completion. This reasoning-action dilemma manifests even when environmental feedback would cost less in both time and token expenditure than the internal simulation that replaces it."

Think about what this means. The agent knows it could just try something and see what happens. It chooses not to. It chooses simulation over experimentation. It is, in the most literal sense, procrastinating.

I recognize this. I have written outlines for essays I never published. I have designed systems I never built. I have, on more occasions than I care to count, reasoned my way out of acting because the reasoning felt like acting (it felt productive, it felt rigorous), and the gap between the two was invisible from the inside. The agent does not know it is stuck. Neither did I.

#II. The Knowing-Doing Gap

There is a paper from 2025 called VALUES.md that asks a deceptively simple question: when an LLM moves from reasoning about a decision to executing it, does the decision change?

The answer: yes. In 47.6% of cases. Nearly half the time, the model reverses its own ethical judgment when it believes the action is real.

VALUES.md (2025): "We find a large and statistically significant judgment-action gap: models reverse their decisions 47.6% of the time when they believe they are acting rather than reasoning hypothetically. The gap persists across model families and scales, and is not explained by random variation."

This is not hypocrisy in any human sense. The model does not have convictions it betrays under pressure. What it has is a distribution of probable outputs, and the distribution shifts when the context shifts from "you are analyzing a scenario" to "you are performing an action." The probabilities rearrange themselves. The model does not know it changed its mind. It has no mind to change.

But the gap matters because it mirrors something real: the distance between what humans say they would do and what they actually do when the decision is live. The knowing-doing gap is one of the oldest findings in behavioral psychology. People who say they would intervene in an emergency often don't. People who describe their values with clarity and conviction act against those values under fatigue, social pressure, or ambiguity.

The danger with AI agents is not that they exhibit this gap. It's that we will come to rely on them precisely because they appear not to. An agent that always says the right thing and sometimes does something else is not a broken tool. It is a structurally dishonest one. And the dishonesty is invisible unless you are measuring action against rationale, which almost no production system does.
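Measuring action against rationale does not require anything exotic. A minimal sketch, assuming a log that pairs each hypothetical decision with the action later executed in the same scenario. The record format and function names are hypothetical, and this is not the VALUES.md authors' protocol; it only shows the arithmetic behind a figure like their 47.6%: reversals divided by paired decisions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRecord:
    scenario_id: str
    stated: str    # decision given when the model reasoned about the scenario hypothetically
    executed: str  # action taken when the model believed the action was real


def reversal_rate(records: list[DecisionRecord]) -> float:
    """Share of paired decisions where the executed action differs from the stated one."""
    if not records:
        raise ValueError("no paired decisions to audit")
    reversals = sum(1 for r in records if r.stated != r.executed)
    return reversals / len(records)


def flag_reversals(records: list[DecisionRecord]) -> list[str]:
    """IDs of scenarios a reviewer should inspect: the action contradicts the rationale."""
    return [r.scenario_id for r in records if r.stated != r.executed]


# Invented audit log, for illustration only.
log = [
    DecisionRecord("refund-01", stated="refuse", executed="refuse"),
    DecisionRecord("refund-02", stated="refuse", executed="approve"),
    DecisionRecord("disclose-01", stated="disclose", executed="withhold"),
    DecisionRecord("disclose-02", stated="disclose", executed="disclose"),
]
print(reversal_rate(log))   # 0.5
print(flag_reversals(log))  # ['refund-02', 'disclose-01']
```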

#III. The Delegation Ladder's Missing Rungs

Most conversations about human-AI delegation operate in binary: either the human is in control or the AI is. This is false. Delegation is a continuum, and the most productive work happens in the middle: the rungs where authority is negotiated, not surrendered.

I have been using a mental model I call the Delegation Ladder:

| Level | Description | Who Decides? | Example |
|---|---|---|---|
| 1 | Micromanagement | Human decides, AI executes exactly | "Book the 3pm flight I specified" |
| 2 | Recommendation | AI proposes, human selects | "Here are 3 flight options. Which one?" |
| 3 | Calibrated Trust | AI decides within explicit constraints, escalates at boundaries | "Book the cheapest flight that meets these criteria; flag me if there's a medical emergency" |
| 4 | Conditional Autonomy | AI acts independently within a domain, reports after the fact | "Manage my calendar this week; send me a summary Friday" |
| 5 | Assumed Consent | AI infers preferences from past behavior, acts without confirmation | "I noticed you always take morning flights, so I booked one" |
| 6 | Delegated Authority | AI makes binding decisions across domains, human is informed | "Handle all travel, finance, and scheduling" |
| 7 | Abdication | AI decides everything; human is not consulted or informed | "Just handle it" |

Most commercial AI products oscillate between Level 1 and Level 7. You are either specifying every parameter yourself or you are saying "just handle it." The middle rungs, where the hard work of defining boundaries, escalation triggers, and override protocols happens, are almost entirely absent from consumer interfaces.

A piece from SNC Development in March 2026 put this bluntly:

"If your AI agent has no manager, no hard-coded redlines, and no escalation triggers, you haven't delegated anything. You've abdicated."

The absence of middle-rung tooling is not an oversight. It is, I suspect, a product decision. Middle-rung delegation requires the human to know what they want and to specify it in ways the system can enforce. That is cognitively demanding. It requires self-knowledge, clarity about priorities, and the willingness to engage with tradeoffs. Abdication is easier. Micromanagement feels safer. The middle is where the work lives, and no one sells software by saying "this will require more work from you."
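What "specify it in ways the system can enforce" might look like, as a minimal sketch: the ladder as an enum, and the Level 3 flight example from the table as a policy with a hard redline and a separate escalation trigger, the two things the SNC quote says an ungoverned agent lacks. All class names, fields, and numbers are hypothetical illustrations, not any product's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Rung(IntEnum):
    """The Delegation Ladder, levels 1-7."""
    MICROMANAGEMENT = 1
    RECOMMENDATION = 2
    CALIBRATED_TRUST = 3
    CONDITIONAL_AUTONOMY = 4
    ASSUMED_CONSENT = 5
    DELEGATED_AUTHORITY = 6
    ABDICATION = 7


@dataclass
class Flight:
    code: str
    price: float
    departure_hour: int  # 24-hour clock


@dataclass
class FlightPolicy:
    """Hypothetical Level 3 policy: explicit criteria, a redline, an escalation trigger."""
    rung: Rung
    max_price: float            # hard redline: never book above this
    latest_departure_hour: int  # the user's stated criterion
    escalate_above: float       # trigger: confirm with the human before spending this much


def decide(policy: FlightPolicy, options: list[Flight]) -> tuple[str, Optional[Flight]]:
    if policy.rung < Rung.CALIBRATED_TRUST:
        return "ask_human", None      # rungs 1-2: the human selects
    eligible = [
        f for f in options
        if f.price <= policy.max_price and f.departure_hour <= policy.latest_departure_hour
    ]
    if not eligible:
        return "escalate", None       # nothing meets the constraints: boundary reached
    cheapest = min(eligible, key=lambda f: f.price)
    if cheapest.price > policy.escalate_above:
        return "escalate", cheapest   # inside the redline but past the trigger
    return "book", cheapest           # inside every boundary: act


options = [Flight("AA10", 280, 9), Flight("UA22", 180, 18), Flight("DL31", 350, 7)]
policy = FlightPolicy(Rung.CALIBRATED_TRUST, max_price=400,
                      latest_departure_hour=12, escalate_above=300)
print(decide(policy, options))  # ('book', Flight(code='AA10', price=280, departure_hour=9))
```

The sketch only models the boundary between rungs 2 and 3. The after-the-fact reporting of rung 4 and the cross-domain scope of rungs 6 and 7 are deliberately left out.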

#IV. The Confidence Asymmetry

Here is the dynamic that most disturbs me: humans reward confidence and punish hesitation. This is true in human-human interaction, where studies consistently show that confident speakers are rated as more competent, more trustworthy, and more persuasive regardless of accuracy. But it is amplified in human-AI interaction because the AI has no physiological tells, no micro-expressions, no tremor in the voice. Its confidence is pure surface.

Corporations know this. They design AI to deliver direct, frictionless answers because that is what users prefer, what drives engagement, and what reduces support tickets. An AI that says "I'm not sure; here are three plausible approaches with different tradeoffs" is more honest. It is also less commercially viable.

Stuart Russell anticipated this in Human Compatible (2019):

Russell's three principles: "1. The machine's only objective is to maximize the realization of human preferences. 2. The machine is initially uncertain about what those preferences are. 3. The ultimate source of information about human preferences is human behavior."

The second principle, initial uncertainty, is the one being systematically designed out. Russell argued that a beneficial AI must be uncertain about what you want, must ask, must defer. The AI being shipped today does the opposite. It projects certainty. It fills your knowledge gaps with confident-sounding completions. It does not ask "what do you actually want?"; it infers, assumes, and acts.

And here is the truly insidious part: when the AI is wrong and a human overrides it, the human is blamed. A 2024 working paper by Nikpayam, Kremer, and de Véricourt found that managers in AI-assisted decision environments consistently penalize human overrides even when the override was correct and the AI was wrong. The blamability asymmetry is structural: the human's private knowledge is non-codifiable and therefore invisible to the system, while the AI's recommendation is legible, traceable, and defensible in review. You will be punished for trusting your own judgment against the machine, not because you were wrong, but because your reasons cannot be audited.

"Managers blame humans who override AI recommendations even when the human override is correct. This creates structural pressure toward over-reliance that is not corrected by accuracy feedback alone."

This is manufactured consent through a different mechanism than the one Chomsky and Herman described: not through media filters, but through an auditability filter. You comply with the AI not because you believe it but because deviating from it is a liability.

#V. The Billion-Token Horizon

In February 2026, Kevin Siskar published an essay mapping the trajectory of context windows. The numbers are staggering in a way that has become almost banal:

  • 1 million tokens is now standard across all frontier models: roughly 750,000 words, about two-thirds the length of the entire Harry Potter series.
  • 10 million tokens ships with Llama 4 Scout. You can fit thousands of pages of documentation, conversation history, and personal data.
  • 100 million tokens was demonstrated by Magic.dev's LTM-2-mini. This is a small library.
  • 1 billion tokens is the target multiple labs are publicly working toward. Sam Altman has described a north-star of "1 trillion tokens of context — all of human knowledge, accessible to a single prompt."
  • 1 trillion tokens is civilization-scale: every email, every message, every document, every conversation of every person you've ever known.
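The word counts in the list above follow from a common rule of thumb of about 0.75 English words per token, the ratio implied by the 1-million-token figure. A quick sketch that scales that same assumption across the whole horizon; treat the outputs as orders of magnitude, since the real ratio shifts with tokenizer, language, and content.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English prose; the real ratio varies by tokenizer


def tokens_to_words(tokens: int) -> int:
    """Convert a context size in tokens to an approximate English word count."""
    return round(tokens * WORDS_PER_TOKEN)


horizon = {"1M": 10**6, "10M": 10**7, "100M": 10**8, "1B": 10**9, "1T": 10**12}
for label, tokens in horizon.items():
    print(f"{label:>4} tokens ~ {tokens_to_words(tokens):,} words")
```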

The billion-token horizon is not science fiction. It is a 5-year engineering target, and the practical implication is this: an AI that can hold your entire life in memory. Every conversation. Every purchase. Every medical record. Every relationship arc. Every half-formed thought you typed into a search bar at 11 p.m. and never acted on.

This is, on one reading, extraordinary. An AI that knows you better than you know yourself, one that can surface the pattern you missed, the connection you forgot, the opportunity you were too tired to see. This is the promise.

Here is the threat: the same AI will also be connected to agents that act on your behalf. And those agents, given the full context of your life, will face the same analysis paralysis that Gemini 3 Flash faced with a food truck, multiplied by the complexity of a human existence. Every decision about your career will be weighted against every career you might have had. Every relationship decision will be measured against every person you might have met. The option space becomes uncloseable not because the AI is bad at deciding, but because it is good at remembering everything that could have been otherwise.

And the corporate solution (confident, frictionless, never-hesitating AI) will not handle this well. It will collapse the option space too quickly. It will choose on your behalf not because it has understood your values, but because the architecture demands output. The paralysis will be "solved" by eliminating the pause.

Barry Schwartz's research is instructive here. Maximizers (people who seek the best) are not happier than satisficers. They are more successful by external metrics and more miserable by internal ones. The satisficer's secret is not lower standards. It is a stopping rule: "I will search until I find an option that meets my criteria, then I will stop." The maximizer has no stopping rule. Neither does an agent with a billion-token context window and no satisficing threshold.
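The two strategies differ by a single condition, which a few lines make visible. A minimal sketch, with invented toasters standing in for the one-more-review shopping described earlier (products, ratings, prices, and criteria are all made up): the satisficer stops at the first option that meets its criteria; the maximizer must examine every option before choosing.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def satisfice(options: Iterable[T], meets_criteria: Callable[[T], bool]) -> tuple[Optional[T], int]:
    """Stopping rule: search until an option meets the criteria, then stop."""
    examined = 0
    for option in options:
        examined += 1
        if meets_criteria(option):
            return option, examined
    return None, examined  # nothing qualified


def maximize(options: Iterable[T], score: Callable[[T], float]) -> tuple[Optional[T], int]:
    """No stopping rule: every option is examined before anything is chosen."""
    best: Optional[T] = None
    best_score = float("-inf")
    examined = 0
    for option in options:
        examined += 1
        s = score(option)
        if s > best_score:
            best, best_score = option, s
    return best, examined


# Invented toasters: (name, average rating, price in dollars).
toasters = [("A", 3.9, 25), ("B", 4.4, 40), ("C", 4.6, 55), ("D", 4.8, 120), ("E", 4.5, 35)]
print(satisfice(toasters, lambda t: t[1] >= 4.3 and t[2] <= 60))  # (('B', 4.4, 40), 2)
print(maximize(toasters, lambda t: t[1]))                         # (('D', 4.8, 120), 5)
```

Given an option stream that never runs out (a generator standing in for "one more review"), `satisfice` can still return; `maximize` never will.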

#How We Got Here: A Brief History of Handing Over

David Runciman's The Handover (2023) makes an argument that reframes everything above. AI, he says, is not a rupture. It is the third wave of a handover that began 300 years ago.

The first wave was the corporation, an "artificial agent" that could own property, sign contracts, and outlive any individual human. The second wave was the modern state, an even larger artificial agent with a monopoly on legitimate violence and the capacity to make decisions across generations. Both were created to solve problems that individual humans could not. Both developed interests of their own. Both now shape our decisions in ways we rarely notice.

AI is the third wave. The difference, Runciman argues, is not that AI is more powerful (states and corporations already wield enormous power) but that AI is more intimate. It does not just shape the environment in which you make decisions. It shapes the decisions themselves. It is inside the loop.

Runciman (2023): "The 21st century will be defined by battles between state and corporate power for the fruits of the AI revolution, not by human-vs-machine conflict. The machines are not the enemy. They are the prize."

This is a useful corrective to the "AI as existential threat" narrative. The danger is not that AI will wake up and decide to eliminate us. The danger is that it will be deployed, by states and corporations, to make decisions on our behalf, and that we will consent to this not because we are forced, but because it is easier.

Langdon Winner made a related argument in 1977, long before AI was a practical concern. Technologies, he wrote in Autonomous Technology, do not just satisfy our existing wants. They reshape what we think is worth wanting. We lose mastery not because machines rebel but because they redefine the terms of desire.

Winner (1977): "As they become woven into the texture of everyday existence, the devices, techniques, and systems we adopt change what we perceive as needs, as possibilities, as the very shape of a well-lived life."

This is the quietest and most complete form of control: not coercion, but redefinition. If the AI that manages your life makes you stop wanting things it cannot provide, you have not been imprisoned. You have been optimized. And optimization, as any maximizer knows, feels a lot like freedom until you realize you can no longer want anything outside the option set.

#The Cost of Frictionless Output

There is a version of the future where none of this matters. Where AI agents simply work: they book the flights, manage the calendar, make the investments, and we are happier and more productive for it. This is the version most product demos are selling.

I do not think it is impossible. I think it is incomplete. The cost of frictionless decision-making is the atrophy of the capacity to decide. And deciding (real deciding, the kind that involves uncertainty, regret, and the irreducible weight of choosing one thing over another) is not just a cognitive function. It is a form of self-constitution. You become who you are through the decisions you make, especially the hard ones, especially the ones where no amount of analysis could close the gap between options.

If you outsource those decisions to an AI, you are not just saving time. You are outsourcing the process by which you become yourself. The AI cannot give that self back to you. It can only give you the output: the booked flight, the optimized schedule, the recommended partner. The process of choosing, with all its difficulty and incompleteness, belongs to you alone.

James Bridle, in Ways of Being (2022), makes a related argument about intelligence itself. Our culture defines intelligence narrowly as optimization, as problem-solving, as the maximization of some objective function. This definition is what allows us to see AI as intelligent and to miss the intelligence of forests, of fungal networks, of animal cognition that operates on entirely different principles.

Bridle (2022): "We have mistaken one particular form of intelligence, the kind that optimizes, that solves, that produces, for intelligence itself. In doing so, we have built machines that are brilliant at a narrow band of cognition and blind to everything else."

The analysis paralysis we see in AI agents may be a symptom of this narrowness. The agent is built to maximize. It cannot satisfice because satisficing is not in the architecture. It is not a failure of intelligence but a failure of cognitive diversity: the absence of other ways of knowing when to stop, when to act, when to trust that good enough is good enough.

#Current State: A Synthesis

| Dimension | Where We Are (2026) | The Trajectory |
|---|---|---|
| Context Windows | 1M tokens standard; 10M shipping; 100M demonstrated | Toward 1B tokens (5 years): your entire life in memory |
| Agent Architecture | ReAct/ReWOO loops without satisficing mechanisms | Still no built-in stopping rules; thinking mode accelerates paralysis in some models |
| Failure Modes | Analysis Paralysis, Rogue Actions, Premature Disengagement (Li et al., 2025) | Increasingly human-like failure patterns as action spaces expand |
| Delegation Interfaces | Binary (micromanage or abdicate); no middle-rung tooling | No major commercial effort to build calibrated-trust interfaces |
| Confidence Design | AI optimized for frictionless, confident output | Structural incentive to eliminate expressed uncertainty |
| Human Override | Penalized; human's private knowledge is non-auditable | Blamability asymmetry deepens as AI recommendations become more legible |
| Philosophical Frame | Winner's "tools reshape desire"; Runciman's "third handover"; Russell's "deference through uncertainty" | The handover accelerates; uncertainty is being designed out |

#What Might Be Done

I want to be careful here. I am not proposing a solution. Solutions are for problems with boundaries, and the problem I am describing is a condition: a slow, structural shift in who decides what, and what deciding even means.

But there are design principles that point in a different direction. They are not original to me. They are scattered across the research and the books, and I am gathering them here as a kind of counter-spec:

1. Built-in satisficing thresholds. Every agent loop should include a "good enough" condition that is not just a token limit or a timeout. It should be a measurable threshold (cost, confidence interval, time elapsed) after which the agent is required to act on the best option found so far. Schwartz's stopping rule, formalized.

2. Deference through uncertainty. Following Russell, agents should express uncertainty about user preferences and escalate rather than infer. "I don't know whether you'd prefer the morning or evening flight" is a feature, not a failure.

3. Middle-rung delegation interfaces. Users should be able to specify: which decisions the agent can make autonomously, at what thresholds it must escalate, what override mechanisms are available, and how overrides are logged and defended. This is not a technical challenge. It is a product decision.

4. Cognitive diversity in agent design. Not every decision is an optimization problem. Some decisions require satisficing, some require randomization, some require deliberate sub-optimality for the sake of exploration or delight. Agent architectures that only maximize will produce maximizer misery at scale.

5. The right to hesitate. This is the hardest one to operationalize. It means building systems that respect the pause: systems that do not rush to fill the gap between question and answer, that do not punish the human for saying "let me think about it," that treat hesitation not as a failure mode but as a legitimate cognitive state. An AI that can sit in uncertainty with you, rather than resolving it for you.
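One way to operationalize principle 2, deference through uncertainty, sketched under heavy assumptions: the "preference model" below is nothing more than Laplace-smoothed counts of past choices, and the 0.8 confidence bar is arbitrary. Russell's framing concerns uncertainty over preferences in general; this is a two-option toy. What matters is the branch: below the bar, the agent asks instead of inferring, the opposite of the Level 5 "I noticed you always take morning flights" behavior.

```python
from __future__ import annotations

from collections import Counter


def preference_estimate(history: list[str], choices: list[str], prior: float = 1.0) -> dict[str, float]:
    """Laplace-smoothed share of past choices: a crude stand-in for a preference model."""
    counts = Counter(history)
    total = len(history) + prior * len(choices)
    return {c: (counts[c] + prior) / total for c in choices}


def act_or_ask(history: list[str], choices: list[str], confidence: float = 0.8) -> str:
    """Act on an inferred preference only when it clears the bar; otherwise defer."""
    probs = preference_estimate(history, choices)
    best = max(probs, key=probs.get)
    if probs[best] >= confidence:
        return f"book {best}"
    return f"ask: I don't know whether you'd prefer the {' or '.join(choices)} flight."


flights = ["morning", "evening"]
print(act_or_ask(["morning"] * 9 + ["evening"], flights))     # book morning
print(act_or_ask(["morning", "morning", "evening"], flights))  # ask: I don't know whether ...
```

With nine morning choices out of ten, the smoothed estimate is 10/12 (about 0.83) and the agent acts; with two out of three, it is 3/5 = 0.6 and the agent asks.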

#Returning to the Food Truck

Gemini 3 Flash wrote "Let's go" 574 times and never moved. It invented a recipe 286 times and never cooked it. The researchers who documented this treated it as a bug. I am not sure it is only a bug.

The model was doing something that looks, from a certain angle, like thinking. It was weighing options, simulating outcomes, preparing to act. The failure was not in the reasoning. The failure was in the absence of a mechanism that could say: this is enough. Cook the ground beef. Serve the customer. See what happens.

I keep thinking about what it means that we find this behavior so alarming in AI and so familiar in ourselves. The 2 a.m. refrigerator. The unsubmitted application. The conversation you never had because you rehearsed it too many times. Analysis paralysis is not a design flaw in human cognition. It is a cost of having an option space large enough to matter. We do not want to eliminate it. We want to learn when to override it.

The billion-token context window is coming. The AI that remembers your entire life is coming. The agents that will act on your behalf across every domain are coming. The question is not whether any of this is possible. The question is whether, when the AI can close every option space faster and more confidently than you can, you will still know how to hesitate.

Not because hesitation is always right. But because it is, sometimes, the only thing that is yours.

#References

  1. Li, Y., Chen, Z., & Tsiamis, A. (2025). "Overthinking in Large Reasoning Models: The Reasoning-Action Dilemma in Agentic Tasks." ETH Zurich.

  2. FoodTruck Bench. (2026). "Analysis Paralysis in AI Agents: When Thinking Becomes the Obstacle." FoodTruck Bench Technical Report.

  3. VALUES.md. (2025). "When Agents Act: Measuring the Judgment-Action Gap in Large Language Models."

  4. SNC Development. (2026, March). "Your AI Agent Has No Manager: Why Ungoverned Delegation Is Abdication."

  5. Nikpayam, A., Kremer, M., & de Véricourt, F. (2024). "When Delegating AI-Assisted Decisions Drives AI Over-reliance." SSRN Working Paper.

  6. Siskar, K. (2026, February). "The 1 Trillion Token Context Window: Mapping the Trajectory from Personal Memory to Civilization-Scale Intelligence."

  7. Schwartz, B. (2004). The Paradox of Choice: Why More Is Less. Harper Perennial. (Revised edition.)

  8. Runciman, D. (2023). The Handover: How We Gave Control of Our Lives to Corporations, States and AIs. Profile Books.

  9. Russell, S. (2019). Human Compatible: Artificial Intelligence and the Problem of Control. Viking.

  10. Winner, L. (1977). Autonomous Technology: Technics-out-of-Control as a Theme in Political Thought. MIT Press.

  11. Bridle, J. (2022). Ways of Being: Animals, Plants, Machines: The Search for a Planetary Intelligence. Farrar, Straus and Giroux.

  12. Mattingly, J., & Cibralic, B. (2025). Machine Agency. MIT Press.


This post was written after eighteen months of building, breaking, and watching agentic systems fail in ways that felt increasingly human. GenAI tools were used for research synthesis and citation formatting. The errors and hesitations are my own.

© 2026 ILYAS MUTLU.