SEO · AEO · Agentic Web

Glossary of SEO, AEO & AI Search terms

Canonical, citable definitions of the terms that matter in search visibility today — from old-school technical SEO to the bleeding edge of AI agent protocols. Maintained by Spacemen Digital.

AI Search & Answer Engines 8 terms
Answer Engine Optimization is the practice of optimizing a website's content, structured data and infrastructure so the brand is cited by AI answer engines such as ChatGPT, Claude, Perplexity, Google AI Overviews, Microsoft Copilot and Grok. AEO overlaps significantly with traditional SEO but adds new mechanics including llms.txt files, AI Instructions pages, entity strengthening on Wikipedia and Wikidata, and answer-pattern content engineering.
Spacemen Digital's position: AEO and SEO should be run as one retainer, not two service lines. We deliver AEO at spacemendigital.com/aeo-agency.
Generative Engine Optimization is largely interchangeable with AEO. The term emphasizes the generative aspect of answer engines (the model writing an original answer that cites sources) while AEO emphasizes the citation-extraction aspect. Both refer to the discipline of making a brand discoverable and citable by LLM-powered search interfaces.
Spacemen Digital's position: We use AEO and GEO interchangeably. Our dedicated GEO page is at spacemendigital.com/generative-engine-optimisation-agency.
AI visibility is the umbrella term for how often and how favorably a brand appears in AI-generated answers. It encompasses citation frequency, share of voice against competitors, brand sentiment, and the accuracy of how the brand is described. AI visibility is the outcome metric AEO and GEO work to improve.
Spacemen Digital's position: AI visibility cannot be improved without combining technical infrastructure (schema, llms.txt, AI Instructions pages), entity strengthening (Wikipedia, Wikidata) and answer-pattern content. We measure this with our Website AI Agent Readiness Check.
An LLM citation is a reference in an AI-generated answer to a specific source URL. Citations appear as inline links in ChatGPT, Perplexity, Claude with web access, Google AI Overviews and Copilot. Earning LLM citations requires both technical readiness (structured data, llms.txt, server-side rendering) and content patterns the models extract reliably.
Citation harvesting is the practice of strategically placing brand mentions on third-party platforms that LLMs sample heavily during training and retrieval. These platforms include Reddit, Quora, Stack Exchange, Hacker News, GitHub and topical community sites. Citation harvesting is a core lever in AEO because LLM training data is biased toward these high-quality user-generated sources.
Spacemen Digital's position: Citation harvesting is part of every AEO retainer. Our dedicated service page is at spacemendigital.com/citation-optimization.
AI Overviews are Google's generative search responses that appear above traditional blue-link results for many queries. They cite source URLs and pull content from indexed pages. Ranking in AI Overviews is a primary objective of AEO work because the Overview frequently displaces clicks from the organic results below it.
Share of Voice in an AEO context is the percentage of AI-generated responses in a topic that mention or cite a specific brand. SOV is measured by sampling a representative set of prompts across ChatGPT, Claude, Perplexity, Google AI Overviews and Copilot, then counting brand appearances. It is the primary KPI of AEO programs.
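The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not a production measurement tool: the brand names and sampled answers are hypothetical, and simple case-insensitive substring matching stands in for whatever mention detection a real program would use.

```python
from collections import Counter


def share_of_voice(responses, brands):
    """Return {brand: percentage of sampled responses that mention it}."""
    if not responses:
        return {brand: 0.0 for brand in brands}
    counts = Counter()
    for text in responses:
        lowered = text.lower()
        for brand in brands:
            # Assumption: a case-insensitive substring match counts as a mention.
            if brand.lower() in lowered:
                counts[brand] += 1
    return {brand: 100.0 * counts[brand] / len(responses) for brand in brands}


# Hypothetical answers collected for one topic across several answer engines.
sample = [
    "For AEO, agencies such as Acme and Globex are often mentioned.",
    "Globex publishes a widely cited guide.",
    "No specific vendor stands out.",
    "Acme offers an audit tool.",
]
sov = share_of_voice(sample, ["Acme", "Globex", "Initech"])
```

A brand that appears in two of four sampled answers scores 50 percent; in practice the prompt set needs to be much larger for the percentage to be stable.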
Prompt tracking is the systematic monitoring of how a brand appears across a curated set of AI prompts over time. It involves sampling ChatGPT, Claude, Perplexity and other answer engines repeatedly with the same prompts, then logging mentions, citations, sentiment and competitor presence. Prompt tracking is the answer engine equivalent of rank tracking in SEO.
Spacemen Digital's position: We offer prompt tracking as a productized service at spacemendigital.com/prompt-tracking.
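One way to picture a prompt-tracking log is as a list of dated observations that can be rolled up into a trend. The sketch below is an assumption about how such a log might be shaped, not a description of any particular tool; the field names, engines and answers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    date: str              # ISO date of the sampling run
    engine: str            # e.g. "ChatGPT" or "Perplexity"
    prompt: str            # the tracked prompt, kept identical across runs
    answer: str            # raw answer text returned by the engine
    cited_urls: tuple = () # URLs the answer cited, if any


def mention_rate_by_date(observations, brand):
    """Return {date: fraction of that day's answers mentioning the brand}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for obs in observations:
        totals[obs.date] += 1
        if brand.lower() in obs.answer.lower():
            hits[obs.date] += 1
    return {date: hits[date] / totals[date] for date in sorted(totals)}


# Hypothetical log: the same prompt sampled on two dates across two engines.
log = [
    Observation("2026-01-01", "ChatGPT", "best aeo agency", "Acme is one option."),
    Observation("2026-01-01", "Perplexity", "best aeo agency", "Try Globex."),
    Observation("2026-01-08", "ChatGPT", "best aeo agency", "Acme and Globex."),
    Observation("2026-01-08", "Perplexity", "best aeo agency", "Acme leads."),
]
trend = mention_rate_by_date(log, "Acme")
```

Keeping the raw answer text in each record makes it possible to add sentiment or competitor analysis later without re-sampling.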
Standards & Formats 9 terms
llms.txt is an emerging standard for declaring to AI agents which URLs on a site contain the most important content for them to read. It is conceptually similar to robots.txt and sitemap.xml but written in clean markdown with one URL per line under topical headings. The file sits at the root of the domain (example.com/llms.txt).
Spacemen Digital's position: Every brand under our retainer publishes an llms.txt. See ours at spacemendigital.com/llms.txt.
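A small generator can keep llms.txt in sync with a site's key pages. The sketch below assumes the layout used in the public llms.txt proposal (a title heading, a short blockquote summary, then link lists under topical headings); the site name, summary and URLs are placeholders.

```python
def render_llms_txt(site_name, summary, sections):
    """Render an llms.txt body from {heading: [(title, url), ...]}."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url in links:
            # One markdown link per line under each topical heading.
            lines.append(f"- [{title}]({url})")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content for illustration only.
llms_txt = render_llms_txt(
    "Example Co",
    "Example Co sells widgets to small businesses.",
    {
        "Products": [("Pricing", "https://example.com/pricing")],
        "Company": [("About", "https://example.com/about")],
    },
)
```

The resulting string would be served as plain text at the root of the domain, for example at example.com/llms.txt.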
llms-full.txt is the long-form companion to llms.txt. While llms.txt is a structured index of URLs, llms-full.txt contains the full canonical content an AI agent might need to write accurately about a brand or product. It is typically 3,000 to 10,000 words of clean markdown and sits at the root of the domain alongside llms.txt.
Spacemen Digital's position: llms-full.txt is the highest-leverage page on a site for AEO. See ours at spacemendigital.com/llms-full.txt.
An AI Instructions page is a dedicated page on a website containing canonical, citable information about the brand written explicitly for AI agents. The pattern emerged in 2025 when agencies including Seer Interactive began publishing pages that ChatGPT and other agents would read and reproduce verbatim. Brands publishing them have reported ChatGPT citing the page within 48 hours of going live.
Spacemen Digital's position: AI Instructions pages are one of the highest-leverage AEO assets in 2026. See ours at spacemendigital.com/ai-instructions.
Schema.org is a collaborative vocabulary founded by Google, Microsoft, Yahoo and Yandex, and developed through an open community process, for marking up structured data on web pages. It provides standard types (Organization, Article, FAQPage, Product, BreadcrumbList, WebSite, Person, LocalBusiness) and properties used by search engines, AI agents and content syndicators to extract meaning from pages.
JSON-LD is the preferred format for embedding Schema.org structured data on web pages. It is JSON-formatted Linked Data placed inside a script tag in the page head or body. Google, Bing and AI agents prefer JSON-LD over alternative formats (Microdata, RDFa) because it does not require interleaving markup with visible content.
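To make the script-tag placement concrete, here is a minimal sketch that serializes a Schema.org object and wraps it for embedding. The helper name and the sample WebSite object are illustrative assumptions; only the standard library is used.

```python
import json


def jsonld_script_tag(data):
    """Wrap a JSON-LD object in a script element for the page head or body."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so a string value cannot close the script element early.
    # "\/" is a valid JSON escape, so the data still parses to the same value.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Illustrative object; real markup would describe the actual page.
data = {
    "@context": "https://schema.org",
    "@type": "WebSite",
    "name": "Example",
    "url": "https://example.com/",
}
tag = jsonld_script_tag(data)
```

Because the markup lives in its own element, it can be generated from the same data that renders the page without touching the visible HTML.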
FAQPage is a Schema.org type used to mark up frequently-asked-question content on a page. Each Question item, listed under the page's mainEntity property, has a name (the question text) and an acceptedAnswer whose text property holds the canonical answer. FAQPage schema is one of the strongest signals for AI citation because the structure matches how answer engines extract Q&A pairs.
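The Question and acceptedAnswer structure can be sketched as a small builder. The function name and the sample question are assumptions for illustration; the property names are the standard Schema.org ones.

```python
def faq_page(qa_pairs):
    """Build a FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,  # the question text
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


# Illustrative content only.
faq = faq_page([
    ("What is llms.txt?", "A markdown index of a site's most important URLs."),
])
```

The visible page should show the same questions and answers as the markup, since structured data is meant to describe what is on the page.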
Article schema marks up editorial content with the author, publisher, datePublished, dateModified and headline. Properly-attributed Article schema is a major AEO signal because answer engines weight authorship and freshness when deciding which sources to cite. Article schema without author and date attribution is functionally worthless for AEO.
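Since the value of Article schema depends on attribution being present, a simple pre-publish check can flag gaps. This is a hedged sketch: the list of properties mirrors the ones named above, and the function name and sample data are hypothetical.

```python
# Properties named in the definition above; treated here as required for AEO purposes.
EXPECTED_FIELDS = ("headline", "author", "publisher", "datePublished", "dateModified")


def missing_article_fields(article):
    """Return the expected Article properties that are absent or empty."""
    return [field for field in EXPECTED_FIELDS if not article.get(field)]


# Illustrative markup with no date attribution.
draft = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "What is AEO?",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "publisher": {"@type": "Organization", "name": "Example Co"},
}
gaps = missing_article_fields(draft)
```

A check like this could run in a build step so that articles without author or date attribution never ship.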
Organization schema describes a company or institution with structured properties for name, logo, sameAs (links to official profiles across LinkedIn, Wikipedia, Wikidata, GitHub and X), contactPoint, address, founder and foundingDate. Strong Organization schema is foundational to AI entity recognition.
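A minimal builder shows how sameAs ties the organization to its official profiles. The helper name, the choice to keep only https links, and the sample URLs are all assumptions made for illustration.

```python
def organization(name, url, same_as):
    """Build an Organization object with a de-duplicated sameAs list."""
    profiles = []
    for link in same_as:
        # Assumption: keep only absolute https links, in first-seen order.
        if link.startswith("https://") and link not in profiles:
            profiles.append(link)
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": profiles,
    }


# Placeholder profile URLs.
org = organization(
    "Example Co",
    "https://example.com/",
    [
        "https://www.linkedin.com/company/example-co",
        "https://www.linkedin.com/company/example-co",
        "http://insecure.example.net/profile",
    ],
)
```

Further properties such as logo, contactPoint, address, founder and foundingDate would be added to the same object.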
AI Crawlers & Bots 6 terms
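GPTBot is OpenAI's crawler that gathers content that may be used to train future OpenAI models. OpenAI documents separate user agents for other purposes: OAI-SearchBot for ChatGPT search results and ChatGPT-User for fetches triggered by a user's request. Blocking GPTBot in robots.txt opts a site out of training crawls but does not by itself remove it from ChatGPT search or live browsing, which the other two agents handle. For AEO, all three should be allowed.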
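ClaudeBot is Anthropic's crawler used to gather content for Claude training. Anthropic lists separate agents, Claude-SearchBot and Claude-User, for search indexing and user-initiated retrieval. Like GPTBot, blocking ClaudeBot keeps the site out of future training data, and blocking the other two agents removes it from Claude's search results and live citations. For AEO, all of them should be allowed.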
PerplexityBot is Perplexity's crawler that indexes content so it can be surfaced and cited in Perplexity's answers. Perplexity is an answer-first search engine, so being cited there can translate directly into referral traffic. PerplexityBot should always be allowed for AEO.
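Google-Extended is a robots.txt product token that controls whether content Google has crawled may be used to train Gemini models and for grounding in Gemini Apps and Vertex AI. It is not a separate crawler: fetching is still done by Googlebot and Google's other user agents. Blocking Google-Extended does not affect Search indexing or organic rankings, and according to Google's documentation it does not control AI Overviews, which are a Search feature governed by Googlebot access and snippet controls such as nosnippet.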
Bytespider is ByteDance's crawler associated with Doubao and other ByteDance AI products. It is one of the most aggressive AI crawlers on the web. For brands targeting Chinese-language AI search visibility, Bytespider should be allowed.
CCBot is Common Crawl's crawler. Common Crawl is the largest publicly-available web corpus and is used as training data by virtually every major LLM including those at OpenAI, Anthropic, Meta and Google. Blocking CCBot removes the site from this foundational training data source.
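A quick way to audit which of these agents a robots.txt file admits is Python's standard urllib.robotparser. The sketch below checks the six tokens covered in this section against a hypothetical robots.txt; the file contents and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The six robots.txt tokens discussed in this section.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "Bytespider", "CCBot"]


def ai_crawler_access(robots_txt, url="https://example.com/"):
    """Return {agent: True if the robots.txt text allows it to fetch url}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


# Hypothetical robots.txt that blocks one AI crawler and allows the rest.
sample_robots = """\
User-agent: GPTBot
Allow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""
access = ai_crawler_access(sample_robots)
```

Agents without their own group fall back to the wildcard group, so a single broad Disallow under `User-agent: *` can block every AI crawler at once.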
Agentic Protocols 8 terms
Model Context Protocol is an open standard introduced by Anthropic that lets AI agents (Claude, ChatGPT, others) connect to external data sources and tools through a uniform interface. Sites can publish an MCP Server Card at /.well-known/mcp-server-card declaring an MCP endpoint, which agents can then call to retrieve data or perform actions.
Agent Skills are declared capabilities a website exposes to AI agents through a manifest typically published at /.well-known/agent.json. Each skill describes an action the agent can take (search products, book appointment, request quote) with the parameters and authentication required.
WebMCP is an emerging variant of Model Context Protocol designed for browser-accessible discovery. Sites publish a WebMCP descriptor at /.well-known/webmcp declaring agent-callable capabilities in a way browser-based agents can consume without server-side MCP infrastructure.
x402 is a revival of HTTP status code 402 Payment Required, repurposed for agentic micropayments. Sites declare x402 support at /.well-known/x402 or by returning 402 responses with payment instructions. AI agents that support x402 can pay per API call or per content access, enabling pay-per-use monetization without human checkout flows.
NLWeb (Natural Language Web) is a Microsoft-driven proposed standard for declaring website capabilities to agents in natural-language form. Sites publish an NLWeb descriptor at /.well-known/nlweb describing what the site does, what queries it can answer, and how agents can interact.
Web Bot Auth is an IETF draft standard for cryptographically authenticating bot requests using HTTP Message Signatures. Bot operators publish a key directory at /.well-known/http-message-signatures-directory on their own domain, and sites verify the signatures on incoming requests against those public keys. The standard addresses the long-running problem of distinguishing legitimate AI crawlers from impersonators.
DNS for AI Discovery is a proposed standard where sites publish AI-discovery metadata in DNS TXT records at _aid.example.com. The pattern lets agents perform discovery before any HTTP fetch, useful for blocked, redirected or expensive-to-fetch domains.
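Content Signals are machine-readable declarations, placed in robots.txt, HTTP headers or meta tags, of how AI systems may use content. Common signals include the noai and noimageai directives (do not use the page, or its images, for AI training), typically carried in robots meta tags or X-Robots-Tag headers; TDM-Reservation (text and data mining reservation under EU copyright law); and Cloudflare's Content-Signal directive for robots.txt, which sets separate search, ai-input and ai-train preferences.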
Technical SEO 6 terms
Core Web Vitals are Google's page-experience signals: Largest Contentful Paint (LCP, loading speed), Interaction to Next Paint (INP, responsiveness) and Cumulative Layout Shift (CLS, visual stability). They are a Google ranking factor and are increasingly weighted by AI agents that prefer fast-rendering sources.
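Each of the three metrics is rated against published good and poor thresholds. The sketch below encodes Google's documented boundaries (LCP 2.5 s and 4 s, INP 200 ms and 500 ms, CLS 0.1 and 0.25); the function name and sample values are illustrative.

```python
# (good_upper_bound, poor_lower_bound) per metric, from Google's published thresholds.
THRESHOLDS = {
    "LCP": (2.5, 4.0),   # seconds
    "INP": (200, 500),   # milliseconds
    "CLS": (0.1, 0.25),  # unitless layout-shift score
}


def rate(metric, value):
    """Classify a measurement as good, needs improvement or poor."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"


# Illustrative field measurements for one page.
ratings = {
    "LCP": rate("LCP", 2.1),
    "INP": rate("INP", 350),
    "CLS": rate("CLS", 0.3),
}
```

Google assesses these at the 75th percentile of real-user page loads, so a single lab measurement is only a rough guide.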
Time to First Byte measures how long after a request the server starts returning data. Under 800 ms is considered good; over 1.8 seconds is poor. TTFB matters for AEO because AI agents with short fetch timeouts can give up on slow servers before they receive the page.
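TTFB can be approximated from a script with the standard library. The sketch below is a rough measurement, not a replacement for field data: it times from the start of the request, including connection setup, until the response headers arrive. The function names are hypothetical, and only the 800 ms "good" boundary from the definition above is encoded.

```python
import http.client
import time
from urllib.parse import urlsplit


def measure_ttfb(url, timeout=10.0):
    """Approximate TTFB in seconds for a GET request to url."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        start = time.perf_counter()
        conn.request("GET", target)    # connects (DNS, TCP, TLS) and sends the request
        response = conn.getresponse()  # returns once the status line and headers are read
        elapsed = time.perf_counter() - start
        response.read()                # drain the body so the connection closes cleanly
        return elapsed
    finally:
        conn.close()


def is_good_ttfb(seconds, good_threshold=0.8):
    """True when the measurement is within the 800 ms 'good' boundary."""
    return seconds <= good_threshold
```

For example, `is_good_ttfb(measure_ttfb("https://example.com/"))` would check a single request; repeating the measurement from several locations gives a more reliable picture.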
A canonical URL is the master URL for a piece of content when duplicates or variants exist (e.g. with tracking parameters, http vs https, www vs non-www). Sites declare canonicals via a rel="canonical" link tag in the head. Mismatched or missing canonicals cause search engines and AI agents to split signals across duplicate URLs.
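Auditing canonicals starts with reading the tag out of the page. The following sketch uses Python's built-in HTML parser to collect every rel="canonical" href; the class and function names are hypothetical, and the sample markup is a placeholder.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Placeholder page markup.
page = '<html><head><link rel="canonical" href="https://example.com/page"></head><body></body></html>'
found = find_canonicals(page)
```

An empty list indicates a missing canonical, and more than one distinct value indicates conflicting declarations; both are worth flagging in an audit.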
hreflang is an HTML link tag attribute that declares language and regional variants of a page. Sites with multiple language or region versions must implement hreflang correctly or search engines and AI agents may serve the wrong-language variant to users.
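Correct hreflang usually means every variant of a page carries the same full set of alternate tags, including one pointing at itself. A minimal generator, with hypothetical names and placeholder URLs, might look like this:

```python
from html import escape


def hreflang_tags(variants, x_default=None):
    """Build alternate link tags from {language-region code: URL}."""
    tags = [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}">'
        for code, url in variants.items()
    ]
    if x_default:
        # x-default names the fallback page for users matching no listed variant.
        tags.append(f'<link rel="alternate" hreflang="x-default" href="{escape(x_default)}">')
    return tags


# Placeholder variants; the same list would be emitted on every variant page.
variants = {
    "en-gb": "https://example.com/uk/",
    "de-de": "https://example.com/de/",
}
tags = hreflang_tags(variants, x_default="https://example.com/")
```

Generating the tags from one shared mapping helps keep the annotations reciprocal, which is where hand-maintained hreflang most often breaks.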
Internal linking is the practice of linking between pages on the same site to distribute authority, guide crawlers and signal topical relationships. Strong internal linking creates topical clusters that both search engines and AI agents use to understand which pages are authoritative on which topics.
Structured data is machine-readable metadata about a page's content, typically expressed in Schema.org vocabulary via JSON-LD. It tells search engines and AI agents exactly what kind of content a page contains (article, product, organization, FAQ) and the key facts about it. Strong structured data is one of the highest-leverage AEO investments.
Entity & Authority 3 terms
Entity strengthening is the practice of building and reinforcing a brand's presence in the structured knowledge graphs that AI models rely on. Tactics include creating or improving Wikipedia articles, adding the brand to Wikidata, registering on Crunchbase, claiming Knowledge Panel data, and ensuring sameAs links in Organization schema point to all official channels.
Topical authority is the degree to which a site is considered an expert source on a specific topic, as measured by both search engines and AI agents. It is built through depth of coverage (many pages on the topic), entity associations (Wikipedia, Wikidata), citation patterns (links from other authoritative sources) and content quality signals.
E-E-A-T stands for Experience, Expertise, Authoritativeness and Trust. It is the framework Google's quality raters use to evaluate content quality, particularly for Your Money or Your Life (YMYL) topics. AI agents have begun applying similar signals: author bios, credentials, citations to primary sources and clear attribution all improve E-E-A-T.

Want Spacemen Digital to make your brand the citation source?

This glossary is one piece of how we win citations across ChatGPT, Claude and Perplexity. The same playbook works for your brand.