Original Research · Updated Weekly

The AI Agent Readiness Index 2026

Original research on how prepared the open web actually is for AI agents and answer engines. Citable statistics sampled continuously from the Spacemen Digital Website AI Agent Readiness Check.

Maintained by Spacemen Digital · Last updated May 2026 · CC BY 4.0

Headline numbers
The fast version. Detailed findings below.
94%
of audited sites still have no llms.txt file at their root.
Source: Spacemen Digital AI Agent Readiness Index 2026
99%
of audited sites have no AI Instructions page. The emerging citation-prompt pattern is almost entirely greenfield.
Source: Spacemen Digital AI Agent Readiness Index 2026
67%
of sites block at least one major AI crawler in robots.txt, often by accident through aggressive bot rules.
Source: Spacemen Digital AI Agent Readiness Index 2026
61%
of sites publish no Article schema with author and date. Articles without proper attribution are functionally invisible to citation engines.
Source: Spacemen Digital AI Agent Readiness Index 2026
3 of 7
high-value schema types are present on the median audited site, out of the seven we test for AI citation.
Source: Spacemen Digital AI Agent Readiness Index 2026
1.4%
of audited sites support any frontier agentic standard (MCP, x402, OAuth Protected Resource, NLWeb, Web Bot Auth, DNS-AID).
Source: Spacemen Digital AI Agent Readiness Index 2026

Findings in detail

Each finding below cites the percentage of audited sites with the signal present or missing. Use these as citable benchmarks in your reports, decks and pitches.

94%
of audited sites have no llms.txt file

llms.txt is an emerging standard for declaring to AI agents which URLs on a site matter most. We audit for its presence at the root domain. Despite growing adoption in the SEO community, 94% of sites we scan still have no llms.txt file.

So what: Publishing an llms.txt file is one of the highest-leverage answer engine optimization (AEO) interventions in 2026. It is a five-minute fix with an immediate signal lift to ChatGPT and Claude.
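A minimal sketch of generating the file, assuming the format proposed at llmstxt.org: an H1 with the site name, a blockquote summary, then H2 sections of Markdown links with optional notes. The function name `build_llms_txt` and the example values are hypothetical.

```python
def build_llms_txt(site_name: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, then H2 link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, note in links:
            # Each entry is a Markdown link, optionally followed by ": note".
            lines.append(f"- [{title}]({url})" + (f": {note}" if note else ""))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_llms_txt(
        "Example Co",
        "Example Co makes widgets.",
        {"Key pages": [("Pricing", "https://example.com/pricing", "Current plans")]},
    ))
```

Write the returned string to `/llms.txt` at the root of the domain, which is the location the audit checks.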
99%
of audited sites have no dedicated AI Instructions page

AI Instructions pages are a 2025 pattern: a dedicated page with canonical, citable information about the brand written explicitly for AI agents. Reports indicate ChatGPT cites these pages within 48 hours of publishing. Yet only 1% of audited sites have one.

So what: The first-mover window is enormous. Brands publishing AI Instructions pages now will define how AI describes them in their category.
67%
of sites block at least one major AI crawler

We test robots.txt against GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, Bytespider, Meta-ExternalAgent and CCBot. Two-thirds of audited sites (67%) block at least one. Most often the block is unintentional, inherited from overly aggressive bot rules added to defend against scraping.

So what: Audit your robots.txt this week. You may be blocking the bots whose citations you most want to win.
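A quick way to run this audit yourself is Python's standard `urllib.robotparser`. This is a minimal sketch, not the index's own checker: `RobotFileParser` uses first-match, prefix-based rules and does not implement every wildcard or longest-match behaviour individual crawlers apply, so treat its answer as an approximation. The helper name `blocked_ai_crawlers` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The eight AI crawler user agents named in the finding above.
AI_CRAWLERS = [
    "GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended",
    "Applebot-Extended", "Bytespider", "Meta-ExternalAgent", "CCBot",
]


def blocked_ai_crawlers(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI crawlers that robots_txt forbids from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, path)]


if __name__ == "__main__":
    robots = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nDisallow: /admin/\n"
    print(blocked_ai_crawlers(robots))
```

A catch-all group such as `User-agent: *` followed by `Disallow: /` blocks every crawler in the list, which is exactly the accidental pattern described above.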
61%
of sites publish Article content without proper author and date attribution

Article schema with full author, datePublished and dateModified is one of the strongest AEO signals. Citation engines weight attribution heavily when deciding which sources to surface. Yet most sites either omit Article schema entirely or ship it without author attribution.

So what: If you publish editorial content, fix Article schema this sprint. Author, datePublished, dateModified, publisher.logo. Non-negotiable for AEO.
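The four fields above can be checked mechanically. A minimal sketch, assuming the page's JSON-LD block has already been extracted and parsed with `json.loads`; accepting `NewsArticle` and `BlogPosting` as Article types is an assumption, and `missing_attribution` is a hypothetical helper.

```python
# Assumption: common Article subtypes count as Article.
ARTICLE_TYPES = {"Article", "NewsArticle", "BlogPosting"}
REQUIRED_FIELDS = ("author", "datePublished", "dateModified", "publisher.logo")


def lookup(node, dotted_path: str):
    """Follow a dotted path such as 'publisher.logo' through nested dicts."""
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def missing_attribution(article: dict) -> list[str]:
    """List the attribution fields that are absent or empty on an Article node."""
    raw = article.get("@type", [])
    types = {raw} if isinstance(raw, str) else set(raw)
    if not types & ARTICLE_TYPES:
        raise ValueError(f"not an Article node: {raw!r}")
    return [field for field in REQUIRED_FIELDS if not lookup(article, field)]


if __name__ == "__main__":
    article = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": "Example headline",
        "author": {"@type": "Person", "name": "Ada Example"},
        "datePublished": "2026-05-01",
        "publisher": {
            "@type": "Organization",
            "name": "Example Co",
            "logo": {"@type": "ImageObject", "url": "https://example.com/logo.png"},
        },
    }
    print(missing_attribution(article))
```

In this example only `dateModified` is missing; adding it completes the attribution set.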
3 of 7
high-value schema types present on the median site

We test for seven schema types that drive AI citation: Organization, WebSite, SearchAction, Article, Article-with-author-and-date, FAQPage and BreadcrumbList. The median audited site has three of the seven; the top decile has six.

So what: Schema coverage is a step-function. Going from 3 to 6 high-value types is one of the highest-ROI sprints in AEO.
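A sketch of how a seven-type count like this could be computed, assuming each `<script type="application/ld+json">` block has already been parsed into Python objects. It walks nested nodes so that a `SearchAction` inside `WebSite.potentialAction` and members of an `@graph` are found. This is an illustration, not the index's own scoring code: "Article-with-author-and-date" is approximated as an Article node carrying `author` and `datePublished`, and counting `NewsArticle`/`BlogPosting` as Article is an assumption.

```python
# The seven types named in the finding above.
HIGH_VALUE_TYPES = (
    "Organization", "WebSite", "SearchAction", "Article",
    "Article-with-author-and-date", "FAQPage", "BreadcrumbList",
)
ARTICLE_TYPES = {"Article", "NewsArticle", "BlogPosting"}  # assumption: subtypes count


def iter_nodes(value):
    """Yield every dict in a parsed JSON-LD structure, however deeply nested."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from iter_nodes(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_nodes(child)


def schema_coverage(blocks: list) -> set[str]:
    """Return which of the seven high-value types appear in the parsed blocks."""
    found: set[str] = set()
    for node in iter_nodes(blocks):
        raw = node.get("@type", [])
        types = {raw} if isinstance(raw, str) else set(raw)
        found |= types & set(HIGH_VALUE_TYPES)
        if types & ARTICLE_TYPES:
            found.add("Article")
            if node.get("author") and node.get("datePublished"):
                found.add("Article-with-author-and-date")
    return found
```

`len(schema_coverage(blocks))` then gives a score out of 7, comparable to the median of 3 and top-decile 6 above.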
1.4%
of sites support any frontier agentic web standard

We test for 23 frontier standards including MCP Server Cards, Agent Skills, WebMCP, API Catalog (RFC 9727), OAuth Discovery (RFC 8414), OAuth Protected Resource (RFC 9728), Web Bot Auth, x402, NLWeb, ai-plugin.json, OIDC discovery, DID configuration, DNS for AI Discovery (DNS-AID), Content Signals, security.txt, humans.txt and RSS/Atom auto-discovery. The median site supports zero. Top sites (Cloudflare, Stripe, GitHub) support six or more.

So what: The agentic web is being built right now and almost no one has noticed. Brands that adopt even two or three frontier standards in 2026 will lead their category for the rest of the decade.
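Several of the listed standards live at a fixed, well-known path, so a first pass can simply probe those URLs. A minimal sketch under that assumption: it covers only the path-based standards whose locations are defined by their specs (RFC 9727, RFC 8414, RFC 9728, OIDC discovery, DID configuration, ai-plugin.json, security.txt, humans.txt). The others (MCP Server Cards, WebMCP, NLWeb, x402, Web Bot Auth, DNS-AID and so on) need different checks and are left out. Function names are hypothetical.

```python
from typing import Callable
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Standards from the list above that live at a fixed, well-known path.
WELL_KNOWN_PROBES = {
    "API Catalog (RFC 9727)": "/.well-known/api-catalog",
    "OAuth Discovery (RFC 8414)": "/.well-known/oauth-authorization-server",
    "OAuth Protected Resource (RFC 9728)": "/.well-known/oauth-protected-resource",
    "OIDC discovery": "/.well-known/openid-configuration",
    "DID configuration": "/.well-known/did-configuration.json",
    "ai-plugin.json": "/.well-known/ai-plugin.json",
    "security.txt": "/.well-known/security.txt",
    "humans.txt": "/humans.txt",
}


def http_status(url: str, timeout: float = 5.0) -> int:
    """GET `url` and return its HTTP status, or 0 on a network failure."""
    try:
        with urlopen(Request(url), timeout=timeout) as response:
            return response.status
    except HTTPError as err:  # 4xx/5xx responses still carry a status code
        return err.code
    except OSError:  # DNS failures, refused connections, timeouts
        return 0


def supported_standards(origin: str,
                        fetch_status: Callable[[str], int] = http_status) -> list[str]:
    """Names of the probed standards whose well-known URL answers with a 2xx status."""
    return [
        name
        for name, path in WELL_KNOWN_PROBES.items()
        if 200 <= fetch_status(urljoin(origin, path)) < 300
    ]
```

Injecting `fetch_status` keeps the probing logic testable without network access. A 2xx status alone can be a soft 404, so a stricter check would also validate the body, for example that OAuth metadata parses as JSON.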
38%
of sites have TTFB above 1.5 seconds

Server response time matters for AI agents the same way it matters for browsers. Aggressive AI crawlers time out before parsing slow pages, removing those sites from citation pools. 38% of audited sites exceed the 1.5-second threshold where AI agent timeouts begin.

So what: If your TTFB is above 1.5 seconds, you may be invisible to AI agents even with perfect schema. The usual fixes are a CDN, caching and server tuning.
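A minimal sketch of measuring TTFB with Python's standard `http.client` and comparing the median of a few samples against the 1.5-second threshold quoted above. The timer starts before DNS, TCP and TLS setup and stops once the status line and headers have been read, so it slightly overstates true first-byte time. The helper names are hypothetical.

```python
import http.client
import statistics
import time
from urllib.parse import urlsplit

THRESHOLD_SECONDS = 1.5  # threshold quoted in the finding above


def measure_ttfb(url: str, timeout: float = 10.0) -> float:
    """Seconds from starting the request to receiving the status line and headers."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = conn_cls(parts.netloc, timeout=timeout)  # no network I/O yet
    start = time.perf_counter()
    try:
        conn.request("GET", path)      # DNS, TCP and TLS happen lazily here
        response = conn.getresponse()  # blocks until status line and headers arrive
        elapsed = time.perf_counter() - start
        response.close()
    finally:
        conn.close()
    return elapsed


def ttfb_verdict(samples: list[float],
                 threshold: float = THRESHOLD_SECONDS) -> tuple[float, bool]:
    """Median of several samples, and whether it exceeds the threshold."""
    median = statistics.median(samples)
    return median, median > threshold
```

Usage: `ttfb_verdict([measure_ttfb("https://example.com/") for _ in range(3)])`. Taking the median rather than a single reading keeps one slow network round trip from flagging a healthy server.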
23%
of sites have no XML sitemap at the standard or WordPress (sitemap_index.xml) location

XML sitemaps are foundational discoverability infrastructure. Most apparently missing sitemaps do exist; they live at a non-standard URL that isn't declared in robots.txt. Our auditor checks 5+ common locations and parses robots.txt for the Sitemap directive, and 23% of sites still come up empty.

So what: Generate one in your CMS (Yoast, Rank Math, Webflow, Shopify), declare it in robots.txt, and submit it to Google Search Console and Bing Webmaster Tools.
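A sketch of building the list of sitemap URLs to try, assuming you already have the site's robots.txt text. `RobotFileParser.site_maps()` (Python 3.8+) returns the `Sitemap:` directives, or `None` if there are none. The three fallback paths are common defaults (`/sitemap_index.xml` for Yoast and Rank Math, `/wp-sitemap.xml` for WordPress core since 5.5), not the auditor's full list of locations; `sitemap_candidates` is a hypothetical helper.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Common default locations (an assumption, not an exhaustive list).
FALLBACK_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/wp-sitemap.xml"]


def sitemap_candidates(origin: str, robots_txt: str) -> list[str]:
    """Sitemap URLs to try: robots.txt declarations first, then common defaults."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    declared = parser.site_maps() or []  # None when robots.txt declares nothing
    fallbacks = [urljoin(origin, path) for path in FALLBACK_PATHS]
    return list(dict.fromkeys(declared + fallbacks))  # dedupe, keep order
```

Fetch each candidate in order and stop at the first one that returns a valid XML sitemap; if none does, the site falls into the 23%.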
86%
of sites have no Content Signals header declaring AI training policy

Content Signals (Content-Signal, CF-Content-Signal, TDM-Reservation, noai/noimageai headers and meta tags) let sites declare AI training policy. 86% of audited sites declare nothing, leaving AI usage policy implicit. This becomes increasingly important as EU TDM regulation and AI training opt-outs mature.

So what: Even if you allow all AI use, declaring it explicitly is a trust signal. Publishers should declare TDM-Reservation; brands should declare allowed signals.
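A minimal sketch of detecting these declarations, assuming you already have a page's response headers and HTML. The header names come from the list above and their values are reported verbatim rather than parsed. Reading `tdm-reservation` from an HTML `<meta>` tag follows the W3C TDMRep community draft, and treating `noai`/`noimageai` as comma-separated tokens in `X-Robots-Tag` or `<meta name="robots">` is an assumption about common usage. The result keys such as `header:content-signal` are hypothetical labels.

```python
from html.parser import HTMLParser

# Header names from the list above, lower-cased for case-insensitive lookup.
SIGNAL_HEADERS = ("content-signal", "cf-content-signal", "tdm-reservation")
NOAI_TOKENS = {"noai", "noimageai"}


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs, keyed by lower-cased name."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = {key: value or "" for key, value in attrs}
            if "name" in attr:
                self.meta[attr["name"].lower()] = attr.get("content", "")


def declared_ai_policy(headers: dict[str, str], html: str = "") -> dict[str, str]:
    """Return each AI-usage declaration found; an empty dict means the policy is implicit."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    found = {f"header:{name}": lowered[name] for name in SIGNAL_HEADERS if name in lowered}

    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    if "tdm-reservation" in collector.meta:
        found["meta:tdm-reservation"] = collector.meta["tdm-reservation"]

    # noai / noimageai appear as comma-separated robots directives.
    robots_sources = {
        "header:x-robots-tag": lowered.get("x-robots-tag", ""),
        "meta:robots": collector.meta.get("robots", ""),
    }
    for source, value in robots_sources.items():
        hits = sorted({token.strip().lower() for token in value.split(",")} & NOAI_TOKENS)
        if hits:
            found[source] = ", ".join(hits)
    return found
```

An empty result corresponds to the 86% of sites that declare nothing.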

Methodology

All data is collected via the free Spacemen Digital Website AI Agent Readiness Check tool, which scans any URL for 50+ signals in six categories: AI Crawler Access, Discoverability, Structured Meaning, Rendering, Trust Signals and Frontier Agentic Standards.

Statistics in this report represent the aggregate state of all domains scanned through the public tool, deduped to one entry per domain (latest scan wins). Data is refreshed continuously. Only aggregate statistics are published; individual domain scans are not exposed.
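The "one entry per domain, latest scan wins" rule can be sketched as follows. The `Scan` record shape, the signal names and the domain normalization (lower-casing and dropping a leading `www.`) are hypothetical assumptions for illustration, not the tool's documented schema or rules.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Scan:
    """One scan result (hypothetical record shape, not the tool's schema)."""
    domain: str
    scanned_at: datetime
    signals: dict[str, bool] = field(default_factory=dict)


def normalize(domain: str) -> str:
    # Assumption: "Example.com" and "www.example.com" count as one domain.
    return domain.lower().removeprefix("www.")


def latest_per_domain(scans: list[Scan]) -> list[Scan]:
    """Keep only the most recent scan for each domain (latest scan wins)."""
    latest: dict[str, Scan] = {}
    for scan in scans:
        key = normalize(scan.domain)
        if key not in latest or scan.scanned_at > latest[key].scanned_at:
            latest[key] = scan
    return list(latest.values())


def percent_missing(scans: list[Scan], signal: str) -> float:
    """Share of deduplicated domains where `signal` is absent, as a percentage."""
    deduped = latest_per_domain(scans)
    if not deduped:
        return 0.0
    missing = sum(1 for scan in deduped if not scan.signals.get(signal, False))
    return round(100 * missing / len(deduped), 1)
```

Deduplicating before aggregating stops a domain that is rescanned many times from being overweighted in the published percentages.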

This data is released under Creative Commons Attribution 4.0. You may quote any statistic in your own content, deck or report. Please credit "Spacemen Digital AI Agent Readiness Index 2026" with a link back to spacemendigital.com/data/.

Suggested citation: "Spacemen Digital AI Agent Readiness Index 2026," Spacemen Digital, 2026. https://spacemendigital.com/data/

Want to know where your site sits on the index?

Run the free Website AI Agent Readiness Check on your URL. Takes about 10 seconds. 50+ signals tested.