Sub-document processing and what it means for content

Traditional search and GPT-based search look at a whole page, score it, and summarize it: this is whole-document indexing. LLM-based systems are now moving away from this practice and towards sub-document processing.

Indexing is moving from whole pages to snippets

Instead of retrieving the "Top 10" websites the way a search engine does, Perplexity now retrieves roughly 26,000 granular snippets (fragments of 2–4 words or tokens) from across the web to fill the AI's "context window." Saturating this window with facts leaves the AI far less room to hallucinate, resulting in a more accurate, synthesized answer.

This means the AI doesn't just "see" your article; it extracts thousands of tiny numerical representations of your facts.

To be eligible, your content needs to be highly modular and factual. If a single paragraph contains three distinct, valuable facts, it has a higher chance of being "plucked" into that 130,000-token context window than a long, rambling narrative.

Why is this change happening? 

The goal is to fill the AI's memory (its context window) with relevant snippets. When the window is "full" of data, the AI stops being "creative" and starts being a "distiller." To be the source the AI chooses, your data must be highly relevant and dense. The AI isn't looking for the "best page"; it's looking for the "best 26,000 fragments" to solve a puzzle.
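The snippet-packing idea above can be sketched in a few lines. This is a toy illustration, not Perplexity's actual pipeline: the splitting size, the word-overlap scoring, and the token-budget proxy are all simplifying assumptions made here for clarity.

```python
# Toy sketch of sub-document retrieval: split documents into small
# fragments, score each fragment against the query, and greedily pack
# the best ones into a fixed token budget (the "context window").
# All function names and the scoring heuristic are illustrative.

def split_into_snippets(doc: str, size: int = 4) -> list[str]:
    """Break a document into consecutive word fragments of `size` words."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(snippet: str, query: str) -> int:
    """Toy relevance score: count of query words present in the snippet."""
    q = set(query.lower().split())
    return sum(1 for w in snippet.lower().split() if w in q)

def fill_context(docs: list[str], query: str, budget: int = 20) -> list[str]:
    """Greedily pack the highest-scoring snippets into a token budget."""
    snippets = [s for d in docs for s in split_into_snippets(d)]
    snippets.sort(key=lambda s: score(s, query), reverse=True)
    context, used = [], 0
    for s in snippets:
        cost = len(s.split())  # crude proxy for token count
        if used + cost <= budget and score(s, query) > 0:
            context.append(s)
            used += cost
    return context

docs = [
    "Our running shoe weighs 210 grams and has a 8 mm heel drop",
    "A long rambling story about the history of our company and its founders",
]
print(fill_context(docs, "running shoe weight grams"))
```

Note what happens to the second document: because no fragment of the rambling narrative matches the query, none of it makes it into the window, while the fact-dense sentence is "plucked" fragment by fragment.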

The End of the "Zero-Sum" Game

In traditional SEO, there is one #1 spot. In AEO, two people asking the same question will get different answers because the AI incorporates personal memory (the user’s previous interests, location, and history) into the context window.

How you structure your content must change

You can't just "rank for a keyword" anymore. Visibility is now even more context-dependent. Your brand needs to be present in the specific niche or "neighborhood" of information that your target user frequently interacts with. If you're selling running shoes, independent runners' blogs are a strong place to be present.

This means your content must be highly modular and factual, and it must add something new: AI summaries of already-known information get discarded. Show expertise and experience through factual data and smaller, more knowledgeable pieces.

PageRank Still Matters for "Eligibility"

While the answer is generated from snippets, Perplexity still uses a form of PageRank (link-based authority) to decide which documents are even "eligible" to be broken down into snippets. Classic SEO (backlinks and authority) is the "barrier to entry." Once you're through the door, "Sub-Document Optimization" (clarity and fact-density) determines if you actually get cited in the final answer.

If you want to "win" in this new environment, stop writing for "keywords" and start writing for "extractable facts." Use clear, concise language that an AI can easily tokenize and fit into its limited context window.

Creating content for AI answer engines

Level 3: Give the LLM ready-to-use data and content

Use schema markup, justification-ready statements, and layered intent coverage so AI can extract facts and defend recommendations.

Level 2: Help the model pick you over the others

Create structured comparisons, match search intent, and use FAQs to give answer engines clear reasons to prefer your content.

Level 1: Give the model a clear map

Master AEO foundations: proper headings, front-loaded summaries, and declarative sentences help AI extract and cite your content accurately.

Publish on Your Own Site, Syndicate Elsewhere

Discover how to own your content, reduce third-party dependence, and ensure a single source of truth for AI answer engines through publishing on your own site.

Schema Markup

Learn schema markup (JSON-LD) to help search engines and AI understand your site, extract vital product information, and qualify for Google search features.
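As a concrete starting point, JSON-LD markup is just structured data embedded in a script tag. The sketch below builds a minimal schema.org Product block; the product name, description, and price are placeholder assumptions, while `@context`, `@type`, and the property names are standard schema.org vocabulary.

```python
import json

# Hypothetical product data, shaped with standard schema.org properties.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Trail Runner X",  # placeholder product name
    "description": "Lightweight trail running shoe, 210 g, 8 mm heel drop.",
    "offers": {
        "@type": "Offer",
        "price": "129.00",
        "priceCurrency": "EUR",
    },
}

def to_jsonld_script(data: dict) -> str:
    """Wrap structured data in the script tag crawlers look for."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")

print(to_jsonld_script(product))
```

The resulting tag goes in your page's HTML head or body; validators such as Google's Rich Results Test can confirm the markup parses.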
