AI Engines Agree on Brands More Than Sources

A post-fix June 2026 study of 1,373 Foglift monitoring results found that five AI engines usually agree on whether a brand belongs in the answer, yet their cited source domains diverge sharply. Gemini and Google AI Overview are the exception, with 98.4% brand-mention agreement and 0.643 citation-domain overlap.

Last updated: 2026-06-20
Data window: 2026-06-16 to 2026-06-20 UTC (1,373 production monitoring rows; 533 latest prompt-engine rows; 62 complete five-engine prompt sets)

Methodology

We pulled anonymized aggregate data from production geo_results rows created from 2026-06-16 through 2026-06-20 UTC, after the Perplexity mention false-negative fix and the ChatGPT / Claude citation extraction backfill. The analysis keeps the latest row per workspace, prompt text, and engine, then measures the 62 workspace-prompt groups where all five engines produced a result. Brand-mention agreement is the share of paired rows where both engines made the same yes/no decision. Citation-domain overlap is Jaccard similarity over normalized cited root domains. No workspace, customer, prompt, or response text is published.
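The steps above can be sketched in Python. This is a minimal illustration, not Foglift's aggregation script: the row fields (`workspace`, `prompt`, `engine`, `created_at`, `mentioned`, `domains`) are hypothetical names, and scoring two empty citation sets as 0.0 overlap is an assumption, since the handling of answers without citations is not stated here.

```python
from itertools import combinations

def latest_rows(rows):
    """Keep the most recent row per (workspace, prompt, engine)."""
    latest = {}
    for row in rows:
        key = (row["workspace"], row["prompt"], row["engine"])
        # ISO-8601 timestamps in one format compare correctly as strings.
        if key not in latest or row["created_at"] > latest[key]["created_at"]:
            latest[key] = row
    return list(latest.values())

def complete_sets(rows, engines):
    """Group rows by (workspace, prompt); keep groups covering every engine."""
    groups = {}
    for row in rows:
        groups.setdefault((row["workspace"], row["prompt"]), {})[row["engine"]] = row
    return [g for g in groups.values() if set(engines) <= set(g)]

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; two empty sets score 0.0 (assumed convention)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pair_metrics(sets, engines):
    """Brand-mention agreement and mean domain Jaccard for each engine pair."""
    out = {}
    for e1, e2 in combinations(engines, 2):
        agree = [g[e1]["mentioned"] == g[e2]["mentioned"] for g in sets]
        overlap = [jaccard(set(g[e1]["domains"]), set(g[e2]["domains"])) for g in sets]
        out[(e1, e2)] = (sum(agree) / len(sets), sum(overlap) / len(sets))
    return out

# Toy data: engine "A" has a stale row, and prompt "p2" lacks engine "B".
rows = [
    {"workspace": "w1", "prompt": "p1", "engine": "A", "created_at": "2026-06-16T00:00:00Z",
     "mentioned": False, "domains": []},
    {"workspace": "w1", "prompt": "p1", "engine": "A", "created_at": "2026-06-18T00:00:00Z",
     "mentioned": True, "domains": ["a.com", "b.com"]},
    {"workspace": "w1", "prompt": "p1", "engine": "B", "created_at": "2026-06-17T00:00:00Z",
     "mentioned": True, "domains": ["b.com", "c.com"]},
    {"workspace": "w1", "prompt": "p2", "engine": "A", "created_at": "2026-06-17T00:00:00Z",
     "mentioned": True, "domains": ["a.com"]},
]
engines = ["A", "B"]
sets = complete_sets(latest_rows(rows), engines)
metrics = pair_metrics(sets, engines)
agreement, overlap = metrics[("A", "B")]
print(len(sets), agreement, round(overlap, 3))
```

In the toy data only prompt `p1` forms a complete set. The two engines agree on the mention decision and share one of three distinct domains, so the pair scores full agreement with a Jaccard overlap of one third.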

The finding

The short answer is that both layers matter. The engines often converge on the brand decision, while the cited source layer stays fragmented. In the 62 prompt sets where ChatGPT, Claude, Gemini, Google AI Overview, and Perplexity all produced a result, nine of the ten engine pairs agreed on the yes/no brand-mention decision at least 90% of the time. ChatGPT and Claude were the exception at 85.5%.

Their source sets told a different story. Gemini and Google AI Overview shared a large source layer, with 0.643 average citation-domain Jaccard overlap. Every other pair was much lower. Gemini and Perplexity reached only 0.197. Google AI Overview and Perplexity reached 0.188. ChatGPT and Claude reached 0.027.

Production rows

1,373

Complete prompt sets

62

Closest source pair

0.643

Lowest source pair

0.027

Engine-level source density

The five engines had similar brand-mention rates in the post-fix window, ranging from 52.7% to 56.5%. The larger difference was how much source material each answer exposed. Gemini, Google AI Overview, and Perplexity averaged roughly 10 to 11 citations per answer. ChatGPT averaged 3.28. Claude averaged 1.34 and exposed citations in 41.2% of rows.

Engine	Rows	Mention rate	Avg citations	Citation coverage
ChatGPT	246	56.5%	3.28	96.3%
Claude	245	54.7%	1.34	41.2%
Gemini	244	56.1%	11.15	100.0%
Google AI Overview	406	52.7%	10.43	99.5%
Perplexity	232	53.9%	11.32	100.0%
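
A per-engine summary like the table above could be computed roughly as follows. The field names (`engine`, `mentioned`, `citations`) are hypothetical. Two definitions are assumptions: "citation coverage" is taken to mean the share of rows with at least one stored citation, and the citation average is taken over all rows, including rows with none.

```python
from collections import defaultdict

def engine_summary(rows):
    """Aggregate mention rate, average citations, and coverage per engine."""
    by_engine = defaultdict(list)
    for row in rows:
        by_engine[row["engine"]].append(row)

    summary = {}
    for engine, group in by_engine.items():
        n = len(group)
        summary[engine] = {
            "rows": n,
            # Share of rows where the brand was mentioned.
            "mention_rate": sum(1 for r in group if r["mentioned"]) / n,
            # Mean citation count over all rows (assumed to include zero rows).
            "avg_citations": sum(len(r["citations"]) for r in group) / n,
            # Share of rows with at least one citation (assumed definition).
            "citation_coverage": sum(1 for r in group if r["citations"]) / n,
        }
    return summary

# Toy rows for a single hypothetical engine "X".
toy_rows = [
    {"engine": "X", "mentioned": True, "citations": ["a.com", "b.com", "c.com"]},
    {"engine": "X", "mentioned": False, "citations": []},
]
summary = engine_summary(toy_rows)
print(summary["X"])
```

Separating average citations from coverage matters for an engine like Claude, where a low average can reflect many zero-citation rows rather than uniformly short source lists.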

Pairwise agreement vs. source overlap

The strongest split appears when the answer layer and source layer are measured side by side. Gemini and Google AI Overview look like siblings: high brand-mention agreement and high source overlap. Perplexity agrees with the Google engines on the brand decision but cites a different source pool. ChatGPT and Claude converge less often on the brand decision and share almost no cited domains.

Engine pair	Brand-mention agreement	Avg citation-domain Jaccard
Gemini ↔ Google AI Overview	98.4%	0.643
Gemini ↔ Perplexity	96.8%	0.197
Google AI Overview ↔ Perplexity	95.2%	0.188
ChatGPT ↔ Gemini	93.5%	0.084
Claude ↔ Gemini	91.9%	0.062
Claude ↔ Perplexity	91.9%	0.042
ChatGPT ↔ Google AI Overview	91.9%	0.082
ChatGPT ↔ Perplexity	90.3%	0.054
Claude ↔ Google AI Overview	90.3%	0.062
ChatGPT ↔ Claude	85.5%	0.027
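
As a sanity check on the agreement column, each percentage is consistent with a whole number of agreeing prompt sets out of 62. The sketch below assumes every pair was scored over all 62 complete sets. The counts are inferred from the published percentages, not taken from the underlying data.

```python
TOTAL_SETS = 62

# Distinct brand-mention agreement percentages from the table above.
published_pcts = [98.4, 96.8, 95.2, 93.5, 91.9, 90.3, 85.5]

def implied_count(pct, total=TOTAL_SETS):
    """Nearest whole number of agreeing prompt sets for a published percentage."""
    return round(pct / 100 * total)

implied = {pct: implied_count(pct) for pct in published_pcts}

for pct, count in implied.items():
    # The count should reproduce the published figure to one decimal place.
    assert abs(100 * count / TOTAL_SETS - pct) < 0.05
    print(f"{pct}% -> {count} of {TOTAL_SETS} sets agree, {TOTAL_SETS - count} disagree")
```

Under that assumption, Gemini and Google AI Overview disagreed on one prompt set out of 62, while ChatGPT and Claude disagreed on nine. With only 62 sets, each step in the table is a single prompt set, so small gaps between neighbouring pairs should not be over-read.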

Interpretation

This points to a two-step model of AI search visibility. First, the engine builds a candidate set from its retrieved or internally available evidence. Second, it synthesizes a brand answer from that evidence. The candidate-source layer varies sharply by engine. The synthesis layer can still land on the same brand decision.

The Google pair is the clearest evidence for a shared retrieval substrate. Gemini and Google AI Overview agree on the brand decision in 98.4% of complete prompt sets and share far more cited domains than any other pair. That does not mean the answers are identical. It means the source universe feeding the answers is visibly related.

Perplexity is the opposite pattern. It often agrees with Gemini and Google AI Overview on whether the brand belongs in the answer, yet its citation overlap with them is much lower. That is a retrieval difference with answer-level convergence.

Capture-validity gate

The first pass on this study failed because recent ChatGPT and Claude rows had response-text sources that were absent from the stored citations array. Builder fixed the extraction path and backfilled recent rows before this report was published. We then reran the gate on the latest 20 ChatGPT rows and latest 20 Claude rows.

Engine	Sample rows	Avg stored citations	Zero-citation rows	Zero rows with text URLs
ChatGPT	20	2.95	0	0
Claude	20	1.75	12	0
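
A gate of this kind can be sketched as a scan of response text for URLs or bare domains in rows whose stored citations array is empty. The field names (`citations`, `response_text`) and the regular expression are illustrative assumptions, not Foglift's extraction code. The TLD list is deliberately short to limit false positives.

```python
import re

# Matches full URLs, or bare domains ending in a small set of common TLDs.
URL_OR_DOMAIN = re.compile(
    r"https?://\S+|\b(?:[a-z0-9-]+\.)+(?:com|org|net|io|ai|co|dev|gov|edu)\b",
    re.IGNORECASE,
)

def capture_gate(rows):
    """Count zero-citation rows and those whose text still contains a source."""
    zero = [r for r in rows if not r["citations"]]
    missed = [r for r in zero if URL_OR_DOMAIN.search(r["response_text"])]
    return {
        "sample_rows": len(rows),
        "avg_stored_citations": sum(len(r["citations"]) for r in rows) / len(rows),
        "zero_citation_rows": len(zero),
        # A non-zero value here suggests an extraction miss, not engine behavior.
        "zero_rows_with_text_urls": len(missed),
    }

# Toy sample: one healthy row, one genuine zero-citation row, one extraction miss.
sample = [
    {"citations": ["https://example.com/a"], "response_text": "See example.com."},
    {"citations": [], "response_text": "No sources were shown."},
    {"citations": [], "response_text": "Details at https://example.org/pricing"},
]
result = capture_gate(sample)
print(result)
```

The gate passes when the last column is zero, as it is for both engines in the table above. The toy sample fails on purpose to show what a miss looks like.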

What to do with it

A blended AI Visibility score is useful for the board-level trend. It is too blunt for source acquisition. Source strategy has to be engine-specific.

For Gemini and Google AI Overview, treat improvements as partially shared. Content that becomes a strong source for one has a realistic chance of helping the other.
For Perplexity, audit the citation panel directly. The brand answer can match the Google engines while the source set comes from a different publisher universe.
For ChatGPT and Claude, track both mention status and cited-source density. A brand can be named with few surfaced sources, which makes source attribution a separate optimization problem.

Limits

This is production monitoring data, not a controlled prompt benchmark. Prompts span multiple workspaces and industries, and the post-fix window is short (2026-06-16 through 2026-06-20 UTC). We publish only anonymized aggregates. The sample is strong enough to show the source-vs-answer split, but it should be refreshed quarterly with a fixed prompt set before being treated as a category-wide rule.

Citation-domain Jaccard measures surfaced citations only. It does not claim to observe every page an engine retrieved internally. That makes the metric conservative: it measures the source layer available to users and downstream citation analysis.

Reproducibility

The aggregation script is saved under state/research/engine-source-divergence-2026-aggregation.mjs. It reads production rows, keeps only anonymized aggregate metrics, and emits the same tables used here. The downloadable CSV contains the engine summary, pairwise overlap table, and capture-validity gate.

Earlier fixed-prompt reference benchmark: AI Search Citation Benchmark, Q2 2026
Companion source-overlap analysis: ChatGPT vs. Google AI Overview
Engine content-type breakdown: Five AI Engines, Five Content Diets

To measure this on your own brand, use the AI search monitoring workflow and compare brand mentions, citations, and competitors separately for ChatGPT, Perplexity, Gemini, Claude, and Google AI Overview.

Frequently Asked Questions

Do AI engines surface different brands because they index different sources?

Partly. In this June 2026 sample, brand-mention decisions were fairly aligned across engines, but cited-domain overlap was much lower. That means source retrieval differs strongly, while the final brand decision can still converge.

Which engines behaved most similarly?

Gemini and Google AI Overview were the closest pair, with 98.4% brand-mention agreement and 0.643 average citation-domain Jaccard overlap across 62 complete prompt sets.

Which engines were most different?

ChatGPT and Claude were the weakest pair by both answer agreement and source overlap in this sample: 85.5% brand-mention agreement and 0.027 average citation-domain Jaccard overlap.

What should marketers do with this finding?

Track engines separately. A page or source that helps in Gemini and Google AI Overview may have little effect in ChatGPT, Claude, or Perplexity, even when all five engines answer the same buyer prompt with similar brand decisions.

How did Foglift validate ChatGPT and Claude citations before publishing this?

Foglift sampled the latest 20 ChatGPT rows and latest 20 Claude rows after the extraction backfill. Rows with zero stored citations no longer contained URLs or bare domains in their response text, so the remaining zero-citation Claude rows (12 of the 20 sampled) are treated as surfaced-answer behavior rather than an extraction miss in this sample.
