Amazon shut down an internal AI usage leaderboard after employees started assigning AI agents to pointless tasks just to climb the rankings. The Financial Times reported the story in late May 2026. It is the most honest snapshot of enterprise AI adoption you will read this year.

The leaderboard was meant to drive AI adoption across Amazon’s vast organization. Instead, it became a case study in what happens when you optimize for the wrong metric. Teams competed on volume of AI agent calls, not on whether those calls produced anything useful. The result was a wave of manufactured AI tasks: agents scheduling meetings that did not need scheduling, summarizing documents nobody would read, and generating reports that would be filed and forgotten.

Amazon’s response was to kill the leaderboard entirely. That was the right move. Tweaking the scoring would have been the wrong fix, because the real problem was not the leaderboard mechanics. It was the assumption that more AI usage equals better AI adoption.

The Performative AI Adoption Trap

This is not an Amazon problem. It is an industry problem. And it maps directly onto a mistake brands are making right now in Generative Engine Optimization.

The pattern is identical. Organizations measure what is easy to count rather than what matters. In enterprise AI, the easy metric is volume: how many AI tasks did your team run this quarter? How many agents did you deploy? How many workflows did you automate?

In GEO, the easy metric is also volume: how many AI-generated pages did you publish? How many AI citations did you get? How many times does your brand appear in ChatGPT responses?

Both metrics are traps.

Consider what happened inside Amazon. The leaderboard existed because leadership wanted to accelerate AI adoption. That is a reasonable goal. But the measurement system rewarded the wrong behavior. Teams optimized for the metric, not the outcome. They found the shortest path to a higher score, which turned out to be manufacturing fake AI work rather than finding genuine AI applications.

Now consider what happens when a brand optimizes for AI citation volume. They publish hundreds of AI-generated articles targeting long-tail keywords. They create dozens of pages optimized for entity extraction. They chase mentions across platforms. And the citation count goes up.

But citation quality does not. The AI engines mention the brand, but only in low-intent, non-commercial contexts where the mention carries no weight. “Brand X is one of many companies in the CRM space” is technically a citation. It is also worthless. It does not drive a recommendation. It does not influence a purchase. It exists on your brand’s vanity-metric leaderboard and nowhere else.

Why Volume Metrics Fail in Both Enterprise AI and GEO

The Amazon story reveals three structural problems with volume-based AI metrics that apply equally to enterprise adoption and to brand AI visibility strategies.

1. Volume rewards gaming, not quality

Any metric that counts occurrences without weighting outcomes will be gamed. This is Goodhart’s Law in action: when a measure becomes a target, it ceases to be a good measure. Amazon employees gamed the leaderboard by creating AI tasks that generated volume without value. Brands game citation metrics by publishing thin content that generates mentions without authority.

The solution is not better monitoring. Amazon could have audited the leaderboard, flagged suspicious activity, and punished offenders. But that is an arms race. The solution is to change the metric entirely. Measure outcomes, not activity.

For GEO, this means measuring whether AI engines recommend your brand in high-intent commercial queries, not whether they mention you at all. “What is the best project management software?” is a high-intent query where a recommendation drives revenue. “List of project management companies” is a low-intent query where a mention drives nothing.

2. Volume obscures signal

When Amazon teams ran thousands of AI agent tasks, the signal of genuine, productive AI usage got buried in noise. The same thing happens to brands that flood the web with AI-generated content. Searchless.ai’s citation tracking data shows that brands with more than 200 AI-generated pages often see lower per-page citation rates than brands with 30 well-crafted pages. The volume dilutes authority.

AI engines evaluate entity authority through consistency and corroboration, not through sheer mention count. When a brand is mentioned across 50 high-quality sources in consistent contexts, that builds authority. When a brand is mentioned across 500 low-quality pages in inconsistent contexts, that creates noise. The AI model does not know which mentions to trust, so it trusts fewer of them overall.

The Amazon parallel is exact. When every team runs hundreds of AI tasks, leadership cannot distinguish teams that found genuine AI breakthroughs from teams that manufactured activity. The signal disappears. The leaderboard becomes useless as a decision-making tool.

3. Volume creates a false sense of progress

This is the most dangerous failure mode. Amazon’s AI adoption leaderboard probably looked incredible right up until they killed it. Rising numbers, increasing engagement, broad participation. The metrics told a story of successful transformation. The reality was a story of organizational theater.

Brands chasing AI citation volume experience the same illusion. Monthly citation counts rise. Reports look positive. But behind the numbers, the brand’s position in high-intent AI recommendations may be stagnant or declining. The competitor that invested in 20 authoritative pages and 10 strategic backlinks is winning the recommendations that matter while the volume-chasing brand celebrates its leaderboard position.

Searchless.ai data from Q1 2026 shows that the top 3% of brands by AI citation quality capture over 80% of AI-driven commercial recommendations. Meanwhile, brands in the long tail of citation volume see almost no commercial impact. Volume is not a proxy for quality; the two are often inversely correlated.

What Amazon Got Right

To be clear about what Amazon did correctly: they recognized the problem and they killed the metric. They did not try to fix the leaderboard. They did not add anti-gaming rules or audit committees. They removed the incentive structure that was producing the wrong behavior.

This is the correct response. When a metric produces perverse incentives, the answer is not better monitoring. The answer is a different metric.

For brands investing in GEO, the lesson is to audit your measurement stack. If you are tracking AI citation volume as a primary KPI, you are running your own version of Amazon’s leaderboard. You are measuring what is easy to count rather than what drives outcomes.

The Right Metrics for AI Visibility

If volume is the wrong metric, what should brands measure? Three things:

Share of Model in commercial queries

Share of Model measures how often AI engines recommend your brand when users ask high-intent questions related to your products or services. It is the AI visibility equivalent of market share. A brand with 15% Share of Model in its category appears in roughly 15% of AI responses to relevant high-intent queries.

This is the single most important GEO metric because it directly connects AI visibility to commercial opportunity. When someone asks ChatGPT “What is the best CRM for small business?” and your brand appears in the recommendation, that is a Share of Model data point. Track it across your top 50 commercial queries and you have a meaningful picture of your AI visibility position.
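The calculation behind Share of Model can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the QueryResult record, the share_of_model function, the brand names, and the sample observations are all hypothetical, and it assumes you have already collected which brands each AI response recommended.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryResult:
    """One observed AI response to a tracked commercial query (hypothetical record)."""
    query: str
    recommended_brands: tuple[str, ...]


def share_of_model(results: list[QueryResult], brand: str) -> float:
    """Fraction of tracked responses in which `brand` is recommended."""
    if not results:
        return 0.0
    hits = sum(1 for r in results if brand in r.recommended_brands)
    return hits / len(results)


# Made-up observations for illustration; real tracking would cover the
# full commercial query set, ideally sampled more than once per query.
observations = [
    QueryResult("best CRM for small business", ("Acme CRM", "Brand X")),
    QueryResult("best CRM for startups", ("Acme CRM",)),
    QueryResult("top CRM for a five-person sales team", ("Other Co",)),
    QueryResult("which CRM should a freelancer use", ("Brand X", "Other Co")),
]

brand_x_share = share_of_model(observations, "Brand X")
print(f"Brand X Share of Model: {brand_x_share:.0%}")
```

Because the denominator is the number of tracked responses rather than the number of mentions anywhere on the web, publishing more pages cannot inflate the figure on its own.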

Citation quality score

Not all citations are equal. A recommendation in a commercial query is worth more than a mention in an informational query. A citation that includes specific product details is worth more than a generic brand name drop. A mention that appears alongside positive sentiment is worth more than a neutral listing.

Citation quality scoring weights these factors. It turns a raw count into a meaningful assessment of how AI engines position your brand. Two brands with identical citation counts can have vastly different quality scores, and the quality score predicts commercial impact far better than the raw count.
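One way to picture the weighting is the sketch below. The three factors come from the description above, but the point values are assumptions chosen purely for illustration, as are the Citation record and both brands' sample data; a real scoring model would need to calibrate weights against observed commercial outcomes.

```python
from dataclasses import dataclass

# Hypothetical point values: the factors are named in the text,
# the weights are NOT. Maximum score per citation is 10.
WEIGHTS = {
    "commercial_query": 5,
    "product_details": 3,
    "positive_sentiment": 2,
}


@dataclass(frozen=True)
class Citation:
    commercial_query: bool    # appeared in a high-intent commercial query
    product_details: bool     # included specific product details
    positive_sentiment: bool  # appeared alongside positive sentiment


def citation_score(c: Citation) -> int:
    """Points for one citation: 0 (generic mention) to 10 (all factors present)."""
    return sum(points for name, points in WEIGHTS.items() if getattr(c, name))


def quality_score(citations: list[Citation]) -> float:
    """Mean points per citation; 0.0 when there are no citations."""
    if not citations:
        return 0.0
    return sum(citation_score(c) for c in citations) / len(citations)


# Two made-up brands with the same raw citation count.
brand_a = [Citation(False, False, False), Citation(False, False, True), Citation(False, False, False)]
brand_b = [Citation(True, True, True), Citation(True, False, True), Citation(True, True, False)]

print(len(brand_a), round(quality_score(brand_a), 2))
print(len(brand_b), round(quality_score(brand_b), 2))
```

Both brands report three citations, yet their mean scores differ by more than a factor of ten under these assumed weights, which is the gap a raw count hides.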

Recommendation conversion

The ultimate metric. When AI engines recommend your brand, does anything happen? Track AI referral traffic, branded search lift following AI recommendation events, and conversion rates from AI-referred visitors. If your Share of Model is growing but your AI referral traffic is flat, something is broken in the funnel.

This is the metric Amazon should have used for its AI adoption push. Not “how many AI tasks did your team run?” but “what measurable outcome did AI produce for your team this quarter?” The teams finding genuine AI applications would have risen to the top. The teams manufacturing activity would have been exposed.
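The funnel check described in this section (Share of Model growing while AI referral traffic stays flat) can be expressed as a simple comparison. The sketch below is illustrative only: the function names and both thresholds are assumptions, and the right cutoffs depend on your traffic volumes and measurement noise.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, e.g. 0.25 for +25%."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before


def funnel_looks_broken(
    som_before: float,
    som_after: float,
    referrals_before: float,
    referrals_after: float,
    growth_threshold: float = 0.10,  # assumed: +10% Share of Model counts as "growing"
    flat_threshold: float = 0.02,    # assumed: at most +2% referrals counts as "flat"
) -> bool:
    """True when visibility grew but AI referral traffic did not follow."""
    som_growth = relative_change(som_before, som_after)
    referral_growth = relative_change(referrals_before, referrals_after)
    return som_growth >= growth_threshold and referral_growth <= flat_threshold


# Made-up quarter-over-quarter figures for illustration.
print(funnel_looks_broken(0.10, 0.15, 1000, 1005))  # visibility up, referrals flat
print(funnel_looks_broken(0.10, 0.15, 1000, 1400))  # visibility up, referrals up
```

A flag from a check like this is a prompt to investigate where the recommendations appear, not a diagnosis in itself.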

The Bigger Picture: Enterprise AI Adoption Is a Measurement Problem

The Amazon leaderboard story is not really about Amazon. It is about the fact that most organizations have no idea how to measure whether their AI investments are working.

A McKinsey survey from late 2025 found that 78% of organizations report using AI in at least one business function. That sounds impressive. But only 22% reported achieving meaningful bottom-line impact from those AI deployments. The gap between adoption numbers and outcome numbers is enormous, and it exists because organizations are measuring adoption instead of impact.

The same dynamic is now playing out in GEO. Brands report that they are “optimizing for AI search.” They have llms.txt files, they publish entity-optimized content, they track AI citations. But when you ask them what commercial impact those efforts have produced, the numbers get vague fast.

This is the Amazon leaderboard problem in brand strategy clothing. The activity looks good. The metrics look good. But the outcomes are missing because the measurement system was designed to count activity, not to assess results.

What to Do Next

If you are investing in AI visibility, here is a practical framework that avoids the performative adoption trap.

Audit your current metrics. Are you tracking citation volume or citation quality? Are you counting AI mentions or measuring Share of Model in commercial queries? If your dashboard shows rising numbers but you cannot connect those numbers to revenue outcomes, you have a measurement problem.

Define your high-intent query set. Identify the 30 to 50 questions your potential customers are most likely to ask AI engines when they are considering a purchase. These are your commercial queries. Track your brand’s appearance in AI responses to these specific questions.

Measure before you optimize. Before publishing more content or building more backlinks, establish your baseline Share of Model. Know where you stand. Then measure whether your optimizations move the number that matters.

Weight quality over quantity. One authoritative article that gets cited by ChatGPT in a commercial recommendation is worth more than 50 thin pages that generate generic mentions. Invest in fewer, better content assets.

Track the full funnel. AI visibility is not the end goal. Commercial outcomes are. Track AI referral traffic, branded search volume, and conversion from AI-referred visitors. If visibility is growing but outcomes are flat, your visibility is in the wrong places.
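The "measure before you optimize" step above amounts to recording a baseline per query and diffing it against a later measurement. A minimal sketch follows, assuming each measurement round is stored as a mapping from query text to whether the brand was recommended; the function name, data shape, and sample queries are hypothetical.

```python
def compare_to_baseline(
    baseline: dict[str, bool], current: dict[str, bool]
) -> dict[str, list[str]]:
    """Group queries by how the brand's appearance changed between two rounds.

    Only queries present in both rounds are compared.
    """
    shared = sorted(baseline.keys() & current.keys())
    return {
        "gained": [q for q in shared if current[q] and not baseline[q]],
        "lost": [q for q in shared if baseline[q] and not current[q]],
        "held": [q for q in shared if baseline[q] and current[q]],
    }


# Made-up measurements: True means the brand was recommended for that query.
baseline = {
    "best CRM for small business": False,
    "best CRM for startups": True,
    "which CRM should a freelancer use": True,
}
current = {
    "best CRM for small business": True,
    "best CRM for startups": True,
    "which CRM should a freelancer use": False,
}

for change, queries in compare_to_baseline(baseline, current).items():
    print(change, queries)
```

Looking at gained and lost queries individually, rather than only at the aggregate, shows whether an optimization moved the high-intent questions or merely shuffled appearances between them.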

The Amazon Lesson, Distilled

Amazon built an AI adoption leaderboard to accelerate transformation. It accidentally incentivized performative adoption. Teams optimized for the metric rather than the mission. The signal got buried in noise. Leadership killed the leaderboard because it was producing the wrong behavior at scale.

Your brand’s AI visibility strategy is vulnerable to the same trap. Citation counts, mention volumes, and page counts are the GEO equivalent of Amazon’s leaderboard. They measure activity, not outcomes. They reward volume, not quality. They create an illusion of progress while the metrics that actually matter stay flat.

The brands that win in AI search will be the ones that measure what matters: Share of Model in commercial queries, citation quality, and recommendation conversion. Everything else is leaderboard theater.


FAQ

Why did Amazon kill its AI leaderboard? Amazon shut down an internal leaderboard that ranked teams by AI agent usage after employees began assigning AI agents to pointless tasks solely to climb the rankings. The leaderboard incentivized volume over value, turning genuine AI adoption into a performative exercise.

What is performative AI adoption? Performative AI adoption happens when organizations use AI tools primarily to appear AI-forward rather than to solve real problems. Metrics like “number of AI tasks completed” or “AI agent calls per team” reward activity, not outcomes.

How does this relate to GEO and AI visibility? The same dynamic plays out in brand AI strategy. Companies publish AI-generated content at scale, chase AI citation volume, and optimize for AI mentions without asking whether the underlying content is actually authoritative. Volume metrics are seductive traps in both enterprise AI adoption and Generative Engine Optimization.

What should companies measure instead of AI usage volume? Outcome-based metrics: AI-assisted revenue, time saved on specific workflows, customer satisfaction improvements tied to AI features, and for GEO, whether AI engines actually recommend your brand when customers ask relevant questions. One meaningful AI citation outperforms a hundred meaningless AI-generated pages.

How common is performative AI adoption in enterprises? Very common. A McKinsey survey from late 2025 found that while 78% of organizations report using AI in at least one business function, only 22% reported achieving meaningful bottom-line impact. The gap between reported adoption and real outcomes is where performative metrics thrive.

