Five AI Engines, Five Content Diets: A Q2 2026 Citation-Type Breakdown

Across 1,430 structurally classified citations from five production AI search engines, ChatGPT cites the vendor's own first-party site 68% of the time. The other four engines run 46 to 52%. Only Perplexity cites video meaningfully (9.7%, almost entirely YouTube). Community discussion (Reddit, Quora) appears in zero ChatGPT, Claude, or Perplexity citations, and in only 2.4% of Gemini and 1.0% of Google AI Overview citations. Each engine is running on a different slice of the web.

Last updated 2026-07-05. Data window 2026-05-18 (375 buyer-intent responses across 75 prompts × 5 engines; 2,583 citations, 1,430 structurally classified)

Methodology

Citations in the Q2 2026 Citation Benchmark dataset (375 buyer-intent responses across 75 prompts × 5 engines, 2,583 cited URLs total) were tagged with one of fifteen structural domain categories defined in state/research/domain-taxonomy.json. The taxonomy spans vendor first-party (the brand's own canonical site), niche publisher hub (Healthline, Sleep Foundation, OutdoorGearLab), listicle / content-farm (emailvendorselection.com, insiderone.com), business press, tech press, lifestyle media, institutional (.gov, .edu, nonprofit .org), video (YouTube), community UGC (Reddit, Quora, HN), review aggregator (G2, Capterra), marketplace (Amazon, Chewy), personal blog / Medium, developer platform (GitHub, npm), and search-engine cache URL. Hand-classification covered the top-volume domains; long-tail citations were left unclassified and are reported separately. We report 1,430 classified citations (55.4% of the total). All per-engine percentages use the engine's classified subset as the denominator, and the underlying counts are reported for transparency. The aggregation script is published alongside this report.
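To make the denominator choice concrete, here is a minimal Python sketch. The ChatGPT counts are not raw data: they are a hypothetical reconstruction from the published ChatGPT percentages (share × 150 classified citations), which reproduces every ChatGPT cell in the matrix below. The `category_shares` helper is illustrative, not the published aggregation script.

```python
from collections import Counter

# ChatGPT's classified citations by category. Hypothetical reconstruction from
# the published shares (share x 150 classified citations), not the raw dataset.
CHATGPT_COUNTS = Counter({
    "Vendor first-party": 102,
    "Niche publisher hub": 17,
    "Institutional": 10,
    "Tech press": 7,
    "Business press": 4,
    "Marketplace": 4,
    "Listicle / content-farm": 3,
    "Lifestyle media": 2,
    "Developer platform": 1,
})
CHATGPT_ALL_CITATIONS = 246  # classified (150) + unclassified (96)


def category_shares(counts: Counter) -> dict:
    """Percent of an engine's classified citations per category.

    The denominator is the classified subset only, matching the report's method.
    """
    classified = sum(counts.values())
    return {cat: round(100 * n / classified, 1) for cat, n in counts.items()}


shares = category_shares(CHATGPT_COUNTS)
print(shares["Vendor first-party"])  # 68.0

# The same 102 citations over all 246 citations (unclassified included):
print(round(100 * CHATGPT_COUNTS["Vendor first-party"] / CHATGPT_ALL_CITATIONS, 1))  # 41.5
```

The second figure is a floor that assumes none of ChatGPT's 96 unclassified citations are vendor first-party, which is why the denominator has to be stated alongside every share.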

Five engines, five content diets

Days 1 through 5 of this research series measured the cross-engine citation graph at the domain level and the vertical level. The Q2 2026 dataset supports one more lens the series has not yet applied: what kind of content each engine reaches for. Every classified URL in the benchmark was tagged with one of fifteen structural categories, and the per-engine patterns turn out to be striking.

ChatGPT cites the vendor's own first-party site 68% of the time. The other four engines run between 46% and 52%. ChatGPT also cites zero video, zero community discussion, zero personal blogs, and zero review aggregators. Claude cites zero video, zero community discussion, and zero tech press. Perplexity is the only one of the five engines that cites YouTube at scale, and Gemini is the only engine that cites Reddit at any meaningful rate. The engines are running on visibly different slices of the web even when they're answering the same buyer-intent question.

Population (n = 5 engines × 75 buyer-intent prompts)

Metric	Value
Responses sampled	375
Citations extracted	2,583
Citations classified	1,430 (55.4%)
Categories in taxonomy	15
Verticals covered	25
Source dataset	Q2 2026 benchmark

The per-engine summary table

Each engine returned 75 responses, but the number of citations per response varies by a factor of nearly three. Google AI Overview (9.31) and Gemini (9.25) lead at about 9.3 citations per response, followed by Perplexity (7.27), Claude (5.33), and ChatGPT (3.28). Because the totals differ this much, the classified subset within each engine is the right denominator for any structural comparison.

Engine	Citations / response	Classified	Classified %	Top category
ChatGPT	3.28	150 / 246	61.0%	Vendor first-party (68.0%)
Claude	5.33	205 / 400	51.3%	Vendor first-party (52.2%)
Gemini	9.25	370 / 694	53.3%	Vendor first-party (45.7%)
Google AI Overview	9.31	385 / 698	55.2%	Vendor first-party (48.8%)
Perplexity	7.27	320 / 545	58.7%	Vendor first-party (47.8%)
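The derived columns in this table follow directly from the raw totals; a minimal Python sketch recomputes them. One detail worth making explicit: Claude's 205 / 400 is exactly 51.25%, and the table shows 51.3%, i.e. round-half-up. Python's built-in `round()` rounds that tie to even (51.2), so the sketch uses the standard-library `decimal` module instead. The helper names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

RESPONSES_PER_ENGINE = 75  # 75 buyer-intent prompts per engine

# (total citations, classified citations) per engine, from the summary table.
ENGINES = {
    "ChatGPT": (246, 150),
    "Claude": (400, 205),
    "Gemini": (694, 370),
    "Google AI Overview": (698, 385),
    "Perplexity": (545, 320),
}


def ratio(num: int, den: int, scale: int = 1, places: str = "0.1") -> Decimal:
    """num / den * scale, rounded half-up to the given number of places."""
    value = Decimal(num * scale) / Decimal(den)
    return value.quantize(Decimal(places), rounding=ROUND_HALF_UP)


def summary_row(total: int, classified: int) -> tuple:
    """Return (citations per response, classified % of all citations)."""
    per_response = ratio(total, RESPONSES_PER_ENGINE, places="0.01")
    classified_pct = ratio(classified, total, scale=100)
    return per_response, classified_pct


for engine, (total, classified) in ENGINES.items():
    per_response, pct = summary_row(total, classified)
    print(f"{engine}\t{per_response}\t{classified} / {total}\t{pct}%")

# Dataset-level totals: 2,583 citations, 1,430 classified (55.4%).
all_total = sum(t for t, _ in ENGINES.values())
all_classified = sum(c for _, c in ENGINES.values())
print(all_total, all_classified, ratio(all_classified, all_total, scale=100))
```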

The full per-engine, per-category matrix

Each cell is the share of the engine's classified citations going to that category. The categories are ordered by total classified citation volume across all five engines. Cells where the engine cites zero of a category are shown as 0%, meaning no citations at all rather than a small share rounded down. There are eleven such zeros in the matrix: five in the ChatGPT column, five in the Claude column, and one (community discussion) in the Perplexity column.

Content type	ChatGPT	Claude	Gemini	AIO	Perplexity
Vendor first-party	68.0%	52.2%	45.7%	48.8%	47.8%
Niche publisher hub	11.3%	23.9%	19.7%	18.4%	21.6%
Listicle / content-farm	2.0%	11.2%	9.5%	10.1%	10.0%
Business press	2.7%	3.9%	5.4%	5.2%	1.6%
Video (mostly YouTube)	0.0%	0.0%	3.0%	2.6%	9.7%
Institutional (.gov / .edu / nonprofit)	6.7%	3.4%	1.9%	1.8%	3.1%
Personal blog / Medium	0.0%	1.5%	4.1%	3.1%	1.3%
Tech press	4.7%	0.0%	2.7%	3.6%	0.3%
Marketplace	2.7%	2.4%	1.9%	1.3%	2.2%
Review aggregator	0.0%	0.5%	1.6%	2.1%	1.3%
Community discussion (Reddit, Quora, HN)	0.0%	0.0%	2.4%	1.0%	0.0%
Lifestyle media	1.3%	0.0%	1.4%	1.0%	0.3%
Developer platform	0.7%	0.0%	0.5%	0.5%	0.3%
Search-engine cache URL	0.0%	1.0%	0.3%	0.3%	0.6%

Coverage: how broad each engine's citation diet is

The fourteen classified categories are not equally represented across engines. Gemini and Google AI Overview cite from all fourteen. Perplexity covers thirteen (the only gap is community discussion). ChatGPT and Claude each cover nine of the fourteen. The five categories ChatGPT skips entirely are video, personal blog, review aggregator, community discussion, and search-engine cache URLs. The five that Claude skips are video, tech press, community discussion, lifestyle media, and developer platform.

Engine	Categories cited	Categories absent
ChatGPT	9 / 14	Video (mostly YouTube), Personal blog / Medium, Review aggregator, Community discussion (Reddit, Quora, HN), Search-engine cache URL
Claude	9 / 14	Video (mostly YouTube), Tech press, Community discussion (Reddit, Quora, HN), Lifestyle media, Developer platform
Gemini	14 / 14	(all covered)
Google AI Overview	14 / 14	(all covered)
Perplexity	13 / 14	Community discussion (Reddit, Quora, HN)
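Both the coverage counts and the eleven zero cells in the per-engine matrix can be derived mechanically from the published shares. A minimal Python sketch, with category labels shortened for readability. Testing `s == 0.0` on rounded shares is safe here: one citation is at least 1/385 ≈ 0.26% of any engine's classified subset, which rounds to 0.3%, never to 0.0%.

```python
CATEGORIES = [
    "Vendor first-party", "Niche publisher hub", "Listicle / content-farm",
    "Business press", "Video", "Institutional", "Personal blog / Medium",
    "Tech press", "Marketplace", "Review aggregator", "Community discussion",
    "Lifestyle media", "Developer platform", "Search-engine cache URL",
]

# Share of classified citations (%), one row per engine, in CATEGORIES order.
SHARES = {
    "ChatGPT":            [68.0, 11.3, 2.0, 2.7, 0.0, 6.7, 0.0, 4.7, 2.7, 0.0, 0.0, 1.3, 0.7, 0.0],
    "Claude":             [52.2, 23.9, 11.2, 3.9, 0.0, 3.4, 1.5, 0.0, 2.4, 0.5, 0.0, 0.0, 0.0, 1.0],
    "Gemini":             [45.7, 19.7, 9.5, 5.4, 3.0, 1.9, 4.1, 2.7, 1.9, 1.6, 2.4, 1.4, 0.5, 0.3],
    "Google AI Overview": [48.8, 18.4, 10.1, 5.2, 2.6, 1.8, 3.1, 3.6, 1.3, 2.1, 1.0, 1.0, 0.5, 0.3],
    "Perplexity":         [47.8, 21.6, 10.0, 1.6, 9.7, 3.1, 1.3, 0.3, 2.2, 1.3, 0.0, 0.3, 0.3, 0.6],
}


def coverage(shares: list) -> tuple:
    """Return (number of categories cited, list of categories never cited)."""
    absent = [cat for cat, s in zip(CATEGORIES, shares) if s == 0.0]
    return len(CATEGORIES) - len(absent), absent


for engine, row in SHARES.items():
    cited, absent = coverage(row)
    print(f"{engine}\t{cited} / {len(CATEGORIES)}\t{', '.join(absent) or '(all covered)'}")

# Total zero cells across the whole matrix.
zero_cells = sum(s == 0.0 for row in SHARES.values() for s in row)
print(zero_cells)  # 11
```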

Five engine signatures

Read down each engine's column and a distinct profile emerges. Below is the simplest version of that read.

ChatGPT

The vendor-first-party engine. 68% of its classified citations are the brand's own .com. No citations at all to community discussion, video, personal blogs, or review aggregators.

Top vendor first-party citations

hubspot.com (3)
semrush.com (2)
salesforce.com (2)
zoho.com (2)
atlassian.com (2)

Claude

The niche-authority engine. 52% vendor first-party, 24% niche publisher hub. Cites Healthline, BetterTrail, Sleep Doctor, The Good Trade. Zero video, zero community.

Top niche publisher hubs

healthline.com (6)
bettertrail.com (3)
sleepdoctor.com (3)
thegoodtrade.com (3)
upgradedpoints.com (2)

Gemini

The omnivore. Cites all fourteen categories. Accounts for 69% of all community-UGC citations in the dataset. Closest engine to a general web view.

Top community UGC citations

reddit.com (8)
quora.com (1)

Google AI Overview

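Also covers all fourteen categories. It cites the most sources per response of any engine (9.31, narrowly ahead of Gemini's 9.25), has the highest review-aggregator share of the five (2.1%), and has the second-highest tech-press share (3.6%, behind ChatGPT's 4.7%).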

Top tech press citations

pcmag.com (5)
cnet.com (5)
techradar.com (2)
zdnet.com (1)
tomsguide.com (1)

Perplexity

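The video-grounded engine. 9.7% of its classified citations are video, almost all YouTube. Close to Gemini and Google AI Overview on vendor first-party and niche publisher hubs, but with the lowest business-press share of the five (1.6%), near-zero tech press (0.3%), and no community discussion.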

The vertical overlay: SaaS is vendor-first, CPG is publisher-first

Collapse the 25 verticals into three super-categories: tech SaaS, consumer services, and CPG / retail. The engine-level patterns above are real, but they are partly an artifact of which verticals the prompt set covered. Viewed through the vertical lens directly, the gap between SaaS and CPG / retail is much larger than the gap between any two engines.

Content type	Tech SaaS	Consumer services	CPG / retail
Vendor first-party	76.0%	58.7%	9.4%
Niche publisher hub	0.6%	15.7%	48.5%
Listicle / content-farm	12.4%	2.1%	8.2%
Business press	1.0%	3.7%	8.4%
Video (mostly YouTube)	2.0%	3.3%	6.1%
Institutional (.gov / .edu / nonprofit)	0.0%	5.0%	5.9%
Personal blog / Medium	1.7%	2.1%	3.5%
Tech press	1.9%	4.1%	1.8%
Marketplace	0.3%	0.4%	5.1%
Review aggregator	2.6%	0.4%	0.0%
Community discussion (Reddit, Quora, HN)	0.7%	1.7%	0.8%
Lifestyle media	0.0%	1.2%	1.8%
Developer platform	0.9%	0.0%	0.0%
Search-engine cache URL	0.0%	1.7%	0.4%

Tech SaaS responses cite the vendor's own first-party site 76% of the time and a niche publisher hub 0.6% of the time. CPG / retail responses cite the vendor's first-party site 9.4% of the time and a niche publisher hub 48.5% of the time. A buyer who asks "best CRM software" gets salesforce.com; a buyer who asks "best mattress" gets sleepfoundation.org. In effect, these are two different AI-search products, depending on the vertical.
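The report does not say whether the super-category figures pool citations or average per-vertical shares. The Python sketch below assumes pooling, which weights each vertical by its classified-citation volume. The vertical names, their super-category assignments, and the toy citations are all hypothetical; the benchmark's 25 vertical labels are not listed here.

```python
from collections import Counter, defaultdict

# Hypothetical vertical -> super-category mapping, for illustration only.
SUPER_CATEGORY = {
    "crm": "Tech SaaS",
    "email marketing": "Tech SaaS",
    "home insurance": "Consumer services",
    "mattresses": "CPG / retail",
    "outdoor gear": "CPG / retail",
}


def collapse(citations):
    """Pool (vertical, category) pairs into super-categories; return % shares.

    Pooling weights each vertical by its number of classified citations.
    Averaging per-vertical shares instead would weight every vertical equally.
    """
    pooled = defaultdict(Counter)
    for vertical, category in citations:
        pooled[SUPER_CATEGORY[vertical]][category] += 1
    shares = {}
    for super_cat, counts in pooled.items():
        total = sum(counts.values())
        shares[super_cat] = {cat: round(100 * n / total, 1) for cat, n in counts.items()}
    return shares


# Toy input for illustration, not benchmark data.
toy = [
    ("crm", "Vendor first-party"),
    ("crm", "Vendor first-party"),
    ("email marketing", "Listicle / content-farm"),
    ("home insurance", "Vendor first-party"),
    ("mattresses", "Niche publisher hub"),
    ("outdoor gear", "Niche publisher hub"),
    ("mattresses", "Vendor first-party"),
]
print(collapse(toy))
```

With 25 verticals of uneven size, the pooled and averaged versions can diverge noticeably, so the choice is worth stating wherever super-category shares are reported.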

The single engine-exclusive category

The taxonomy includes one category that is effectively a one-engine signal in Q2 2026:

Community discussion (Reddit, Quora, HN): Gemini accounts for 69.2% of all 13 citations to this category in the dataset.

Community discussion is a Gemini-shaped signal in this dataset. Google AI Overview picks up some of it (4 of 13 community citations). ChatGPT, Claude, and Perplexity pick up none of it. A brand or publisher investing in Reddit visibility specifically as an AI-citation lever should be honest with themselves that the payoff is concentrated in two of the five major engines and zero in the other three. The total citation volume to community UGC across the entire dataset was only 13 URLs (0.9% of classified citations).
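The concentration figures in this section are simple ratios over the 13 community-discussion citations. A minimal Python sketch recomputes them from the per-engine counts stated in this report; the `concentration` helper is a hypothetical illustration.

```python
# Community-discussion citations per engine, as stated in this report.
COMMUNITY = {
    "ChatGPT": 0,
    "Claude": 0,
    "Gemini": 9,
    "Google AI Overview": 4,
    "Perplexity": 0,
}
CLASSIFIED_TOTAL = 1430  # classified citations across all five engines


def concentration(counts):
    """Return (top engine, its % of the category, engines with any citation)."""
    total = sum(counts.values())
    top = max(counts, key=counts.get)
    engines_citing = [engine for engine, n in counts.items() if n > 0]
    return top, round(100 * counts[top] / total, 1), engines_citing


top, top_share, citing = concentration(COMMUNITY)
print(top, top_share, citing)  # Gemini 69.2 ['Gemini', 'Google AI Overview']

# Community discussion as a share of all classified citations.
print(round(100 * sum(COMMUNITY.values()) / CLASSIFIED_TOTAL, 1))  # 0.9
```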

What this means for a publisher

Three implications for an operator who cares about AI-search visibility across more than one engine.

One. Optimize for vendor first-party first. Vendor first-party is the top content type on every engine (ChatGPT 68%, Claude 52%, Google AI Overview 49%, Perplexity 48%, Gemini 46%). The single highest-leverage investment is making the product pages, resource centers, glossary entries, and FAQs deep enough that an answer engine wants to lift from them. This dovetails with the AI Readiness study finding that 29.6% of scanned sites ship no structured data at all.

Two. Earn niche-publisher-hub citations for consumer-facing categories. If the product sits in CPG, retail, or health, the citation lever shifts from first-party to a category authority publisher: Sleep Foundation for mattresses, Healthline for health, OutdoorGearLab for outdoor gear, The Good Trade for sustainable consumer. Across all engines combined, these niche publisher hubs are the second-largest content type after vendor first-party.

Three. Treat video and community discussion as engine-specific bets. YouTube is a Perplexity bet, with a weak echo on Gemini and Google AI Overview, and nothing on ChatGPT or Claude. Reddit / Quora / HN is a Gemini bet with a weak echo on AI Overview, and nothing on ChatGPT, Claude, or Perplexity. Invest in these channels only if the engines that cite them are part of the target visibility set.
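The "across all engines combined" ranking in point two pools citations, so engines with more classified citations count for more. A minimal Python sketch recovers approximate pooled counts for the three largest categories from the published shares. Rounding share × classified per engine is an assumption, since the matrix shows percentages rather than counts.

```python
# Classified citations per engine, from the summary table.
CLASSIFIED = {"ChatGPT": 150, "Claude": 205, "Gemini": 370,
              "Google AI Overview": 385, "Perplexity": 320}

# Shares (%) of the three largest categories, from the per-engine matrix.
TOP_SHARES = {
    "Vendor first-party": {"ChatGPT": 68.0, "Claude": 52.2, "Gemini": 45.7,
                           "Google AI Overview": 48.8, "Perplexity": 47.8},
    "Niche publisher hub": {"ChatGPT": 11.3, "Claude": 23.9, "Gemini": 19.7,
                            "Google AI Overview": 18.4, "Perplexity": 21.6},
    "Listicle / content-farm": {"ChatGPT": 2.0, "Claude": 11.2, "Gemini": 9.5,
                                "Google AI Overview": 10.1, "Perplexity": 10.0},
}


def pooled_count(shares_by_engine):
    """Approximate citations across engines: share x classified, rounded per engine."""
    return sum(round(share * CLASSIFIED[engine] / 100)
               for engine, share in shares_by_engine.items())


total_classified = sum(CLASSIFIED.values())  # 1430
for category, by_engine in TOP_SHARES.items():
    n = pooled_count(by_engine)
    print(f"{category}: {n} citations, {100 * n / total_classified:.1f}% of classified")
```

Under this reconstruction niche publisher hubs (about 279 citations) sit well clear of listicles (about 132), so the second-place ranking does not hinge on the pooling choice.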

How to use this source-layer map

The dataset is useful because it separates the owned-page work from the outside-source work. If an engine mostly cites first-party vendor pages, the fix starts on your own product, docs, FAQ, comparison, and research pages. If an engine leans on publisher hubs, community threads, reviews, or video, the fix moves into the source layer that engine already trusts.

Use the per-engine matrix above to pick the channel by engine, then use the playbooks below to turn that channel into a concrete action queue.

Digital PR

Engine fit: Strongest when engines cite business press, tech press, or niche publisher hubs.

Action: Package original data with sample size, methodology, and a short quote journalists can cite.

Topical authority

Engine fit: Strongest for ChatGPT and other first-party-heavy answers.

Action: Build a complete owned cluster with glossary, comparison, FAQ, product, and research pages.

Thought leadership

Engine fit: Strongest when the answer needs a named expert or repeatable category claim.

Action: Turn one defensible claim into first-party copy, guest commentary, podcast notes, and citations.

Community forums

Engine fit: Strongest for Gemini and Google AI Overview in this classified sample.

Action: Answer real buyer questions in public communities, then reinforce the same language on owned pages.

Online reviews

Engine fit: Strongest where engines cite review aggregators, Reddit, category forums, and Google-indexed profiles.

Action: Keep review profiles accurate, structured, and aligned with the factual claims on your site.

Perplexity video

Engine fit: Strongest for Perplexity, which cited YouTube meaningfully in the Q2 sample.

Action: Publish a crawlable walkthrough with transcript, matching page links, and clear product/entity language.
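One way to make a walkthrough's transcript and product language machine-readable is schema.org `VideoObject` markup on the matching owned page. The Python sketch below generates that markup, assuming the page embeds a YouTube video. All names, URLs, and transcript text are hypothetical placeholders, and this report does not measure whether Perplexity or any other engine reads this markup.

```python
import json


def video_jsonld(name, description, embed_url, thumbnail_url, upload_date, transcript, page_url):
    """Return a <script> tag carrying schema.org VideoObject JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "description": description,
        "embedUrl": embed_url,
        "thumbnailUrl": thumbnail_url,
        "uploadDate": upload_date,
        "transcript": transcript,
        # Ties the video to the matching owned page (a design choice, not a requirement).
        "mainEntityOfPage": page_url,
    }
    # Escape "</" so a transcript containing "</script>" cannot close the tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical values for illustration only.
tag = video_jsonld(
    name="Example CRM: pipeline setup walkthrough",
    description="Step-by-step setup of a sales pipeline in Example CRM.",
    embed_url="https://www.youtube.com/embed/VIDEO_ID",
    thumbnail_url="https://example.com/img/pipeline-walkthrough.jpg",
    upload_date="2026-05-01",
    transcript="In this walkthrough we create a pipeline, add stages, and import contacts.",
    page_url="https://example.com/docs/pipeline-setup",
)
print(tag)
```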

Other reports in this series

This is Day 7 of an ongoing research cadence. Days 1 through 5 draw from the Q2 2026 AI Search Citation Benchmark and measure the engine side of citation. Day 6 measures the publisher side: how technically prepared the typical scanned domain is to be cited. This Day 7 report slices the engine side by content type, the lens Day 2 collapsed away into binary aggregator-vs-vendor.

AI engines agree on brands more than sources. A post-fix production-data study of answer agreement vs. source overlap.
AI Readiness across 311 websites. Median AI Readiness 46 vs median SEO 86. The 40-point publisher-side gap.
Top 100 most-cited domains in AI search (Q2 2026). Only 12 of 1,119 domains are cited by all five engines.
ChatGPT vs. Google AI Overview: the same prompt, two different webs. 4.1% Jaccard, 64% zero-overlap.
Buyer intent reshapes AI citations. Discovery, shortlist, and variation cite different domain sets.
When AI engines cite the reviewer vs. the brand. The 70-point vendor-vs-aggregator gap by vertical.
AI Search Citation Benchmark, Q2 2026. The underlying dataset.

To turn the content-type mix into a monitoring plan, compare tool coverage in the AI search visibility comparison hub and track engine-specific prompt behavior with Foglift AI search monitoring.

Frequently Asked Questions

Why is the unclassified portion 44.6% of all citations, and is that a problem?

The 15-category taxonomy was applied to the high-volume domains that account for the structurally meaningful share of citations. Long-tail domains, each cited once or twice across the dataset, were left unclassified so that the classification stays accurate rather than guessed. Every per-engine percentage in this report uses the classified subset as the denominator, so the unclassified portion is neither double-counted nor hidden. The 1,430 classified citations are still the largest hand-classified AI-citation sample we are aware of for Q2 2026, and the engine-to-engine gaps in the classified subset are large enough that the unclassified long tail would have to be strongly skewed by engine to flip the comparative engine rankings.
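The robustness claim above can be stress-tested by bounding each engine's vendor first-party share over all citations, classified and unclassified. A minimal Python sketch: the vendor counts are reconstructed from the published percentages (share × classified), and the break-even helper is a hypothetical illustration. At the extremes the intervals overlap (ChatGPT spans roughly 41% to 80%), so any ranking over all citations depends on how the long tail is distributed. For ChatGPT to fall to Claude's 52.2% classified share while Claude's long tail kept Claude's classified mix, fewer than roughly 28% of ChatGPT's 96 unclassified citations would have to be vendor first-party.

```python
# Per engine: (all citations, classified citations, classified vendor first-party).
# Vendor counts are reconstructed from the published shares (share x classified).
ENGINES = {
    "ChatGPT": (246, 150, 102),
    "Claude": (400, 205, 107),
    "Gemini": (694, 370, 169),
    "Google AI Overview": (698, 385, 188),
    "Perplexity": (545, 320, 153),
}


def vendor_bounds(total, classified, vendor):
    """Vendor first-party % of ALL citations if none / all unclassified ones were vendor."""
    unclassified = total - classified
    low = 100 * vendor / total
    high = 100 * (vendor + unclassified) / total
    return low, high


def vendor_rate_to_match(total, classified, vendor, target_pct):
    """Fraction of unclassified citations that must be vendor to reach target_pct overall."""
    unclassified = total - classified
    needed = target_pct / 100 * total - vendor
    return max(0.0, needed / unclassified)


for engine, counts in ENGINES.items():
    low, high = vendor_bounds(*counts)
    print(f"{engine}: {low:.1f}% to {high:.1f}%")

# Break-even long-tail vendor rate for ChatGPT to land exactly at Claude's 52.2%.
print(f"{vendor_rate_to_match(*ENGINES['ChatGPT'], 52.2):.0%}")
```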

ChatGPT cites the vendor first-party 68% of the time. Does that mean ChatGPT is the easiest engine to win on?

Easier to influence, but riskier to depend on. Vendor first-party concentration means that the brand's own .com pages are the most direct lever on ChatGPT visibility, so investments in first-party content quality (product pages, resource centers, glossary, FAQs) translate most cleanly into ChatGPT citations. But it also leaves less room for diversification: when a brand has no organic citations from independent publishers, its AI-search visibility becomes brittle. The strongest playbook on ChatGPT is to ship first-party content dense enough that an answer engine wants to lift from it, while in parallel earning citations on the niche publisher hubs and institutional sources that the other engines reach for.

Why does Perplexity cite YouTube and the other engines don't?

Perplexity is the only one of the five engines that consistently surfaces YouTube as an inline source. ChatGPT cites zero video. Claude cites zero video. Gemini and Google AI Overview cite video, mostly YouTube, but at low rates (3.0% and 2.6% of their classified citations, or 1.6% and 1.4% of all their citations). A plausible mechanism, which this dataset does not test directly, is that Perplexity's grounding model treats YouTube video transcripts as first-class web content and is willing to ship the YouTube URL in the citation panel, whereas the other engines either don't index transcripts heavily or don't surface video links as canonical citations. The practical implication for a publisher is that YouTube content has a meaningful AI-search payoff only if Perplexity is part of the target engine set; the citation lift from YouTube is engine-specific.

Community discussion (Reddit, Quora, HN) is 0% on ChatGPT, Claude, and Perplexity. What does that say about the user-discussion-as-AI-signal thesis?

The popular thesis is that getting your brand discussed on Reddit translates into AI search visibility because models train on Reddit and engines cite Reddit. In this Q2 2026 sample, the effect appears in Gemini and very weakly in Google AI Overview. ChatGPT, Claude, and Perplexity produced zero community-UGC citations. Out of 13 community-UGC citations in the entire classified set, 9 came from Gemini, 4 from Google AI Overview, and 0 from the other three engines. Reddit discussion is a real lever inside a narrow subset of the engine universe, but framing it as a general AI-search lever overstates how broadly the signal propagates.

How does this connect to the rest of the Q2 2026 benchmark series?

Day 1 measured cross-engine fragmentation by domain. Day 2 collapsed the 15 categories into a binary aggregator-vs-vendor lens and broke it down by vertical. Day 3 broke citations down by buyer intent. Day 4 paired ChatGPT and Google AI Overview specifically. Day 5 published the top-100 most-cited domains. Day 6 pivoted to the publisher side (AI Readiness across 311 scanned sites). This Day 7 report goes back to the engine side and reads the 15 categories per engine, which is the lens Day 2 collapsed away. The five engines turn out to have markedly different citation diets, which is the engine-level counterpart to Day 2's vertical-level finding.