AI Measurement & Prompt Fanning WTF: A C-Suite Survival Guide, Part 1

Posted In AI (Artificial Intelligence)
Jun 25, 2026

What to Ask LLMs, and Why (a.k.a. Prompt Fanning)

Begrudgingly, savvy marketers already accept that a double-digit slice of their data in 2026 is noise. Before you optimize anything, answer the harder question: are we measuring the right question space? How can we sift through a tornado of new activity and search behaviors? Welcome to prompt fanning, a crucial new audit surface.

TL; DR

Defining the question space is the priority. Prompts we fire at LLM models govern downstream eval, so input deserves serious scrutiny.
Build the testing space deliberately and fire across models at scale.
Be creative, brand and human-centric at core, remember that the concepts behind prompt clusters likely matter as much as any specific prompt semantics (we may never fully know). From Google’s “People Also Ask” mining and first-party customer support data, shopping platform reviews, and G2 to stemmed Reddit conversations and search-based question generators. Get jiggy with real customers.
Provenance is the single most important test of prompt-fanning programs, built in-house, bought from a vendor, or (better) a hybrid API/MCP/house dev mashup, every prompt originated and justified in relation to the first-party brand.
Off-the-shelf tools do carry value, but covering one brand’s full reality takes augmenting any set client-by-client. Some level of custom building is called for in nearly ever case.
This article, Part-1, shares some methods we use to fan concepts and keywords into prompts. Part-2 will teach how we value semantics and mechanics of presence across Google AIO, LLMs, and Google AIO as a prompted, LLM surface. Being mentioned vs. cited are different currencies, and you’ll learn why any single LLM “rank” metric is a lie. Part-3 will share science-backed optimization tactics grounded in data, drawing a hard line between durable authority vs. spike-then-crash manipulation platforms eventually punish. You’ll learn why Prompt-based visibility is similar to polling; you’re seeking to report distributions with confidence. “Maturity” means acting on a noisy instrument with your eyes wide open.

And for a quick level-set before you read on, think of prompt fanning as stress-testing how AI perceives your brand. Instead of just tossing ChatGPT or Claude a handful of random questions and hoping for the best, you explode a single keyword or concept into thousands of hyper-specific, natural-language variations.

It’s essentially mapping out the entire “question space” of how real people ask things, then firing that massive cluster at AI engines to see exactly where your brand shows up, where it gets buried, and how the models actually understand you at scale.

With that, let’s dig in.

Last month I had the pleasure of keynoting the Marketing Analytics Summit in Santa Barbara, and one belief was seemingly common among attendees from junior analysts up to CMOs. Some double-digit percentage of every dataset is obviously noise. We’re talking bots, garbage traffic, misfires, a plethora of usual distractions, and (literally) nobody in the room argued the junk-data figure down towards zero. Most practitioners at Jim Sterne’s storied conference have made peace with a yardstick reading closer to 27.4 inches than a true 36, and a mature discipline if not calm acceptance.

Welcome to AI. Data sucks xx% of the time. Culling signal from the muck and mining good data for action is the work, same as it ever was. AI has only made the muck deeper. Few areas surface the true clusterfark as much as testing LLM for brand visibility.

In AIO (Google AI Overviews), LLM/AEO (Answer engine optimization), the appropriate first object is the “question space,” a set of prompts we fire at an array of AI models to watch how a brand shows up. Measure the wrong space with perfect precision, and the precision buys you next to nothing.

A whole generation of AIO, AEO/LLM visibility tools has now shipped, and each one sells its own metaphor for how question space works. Investor money is pouring in, digging a hole that cash will only ever partly fill. The underlying appetite among C-suite executives, investors, and boards is for certainty, and these new machines reduce to something the field can mostly understand and partly explain, with the rest remaining genuinely opaque. This generation of AI viability tools is useful, but the catch is simple: every brand is unique, some brands wildly so, and off-the-shelf power tools for average players cover common ground and stop at edges. Covering every scenario takes one path, with marketers building out stock tools for each client one at a time.

The Chief Cluster Officer (CCO)

We’ve noted a common scene playing out in multiple boardrooms, surprisingly predictable. An employee, self-motivated to learn mid-level prompt engineering, builds a few appliances, earns a promotion to lead AI for the company, but their technical depth has yet to catch up to the title. They open their paid ChatGPT, Claude Opus, and maybe a few others, type in five or six, maybe dozens of prompts, watch the brand surface (or not), and feel bullish, because surely the premium AI model said so. These emerging prosumers often measure reality with a small sample of prompts grounded in their personal theory and call it a study. The AI tools are great at confirming a bias.

Advanced professionals who are genuinely good at the work spent years before AI bubbled to the mainstream, deep in the research. They became legit developers along the way. That’s because studying how brands surface across the models, at scale and in the minutiae, demands engineering chops that are difficult to spec for your IT department.

Real researchers know the key input required: what do we ask LLM models, and then how do we choose prompts? Get the question-space wrong and every number downstream is drama camp.

First nail the question-space, THEN pound a multi-model array repeatedly at scale, and study the results using repeatable, programmatic methods. We’re talking thousands or tens of thousands of prompts fired at free, flagship, and emergent models, measured multiple times for the brand’s presence and gauging the usefulness of the answers compared to competitors. That, friends, is a very high bar that takes a hybrid marketing engineer, a fluent expert in SEO, AIO, AEO, and APIs, MCP, and dev (as well as being a clever, creative bird).

A buyer asks once, the engine asks many times

Your prompt reaches an AI engine in the form of many questions at once. Michael King of iPullRank lays out the architecture plainly: every major AI search platform now runs agentic retrieval, so a single user query fans out into roughly 5 to 20 internal sub-retrievals before the answer is composed. Your brand gets judged across the entire fanned space, on all of it at once. King also warns that raw citation counts underreport the real footprint by three to ten times, which means a careless reading can flatter or bury brands at random. It’s the line between rigor and lame SEO wish-parties, the same line brands’ budgets should draw around bloated tools: vendors selling scale and screenshots while hawking their one-size-fits-all provenance question entirely.

50 (of many) ways to build the question-space

Where prompts stem from determines their quality, so measure sources from definitive first-party signals down and weight the program to match. Here are a handful of methods we use to mine prompts, out of many. Power tip: Use AI to frame non-questions into prompts- the Reddit rant that becomes multiple power prompts, reviews that contain questions. Skim slop off the top and cluster prompts by topical themes. Measure LLM output for thematic clusters from first-party data and third-party feedback about the brand.

People Also Ask is already exposed in search engines. Harvesting the real People Also Ask questions for a topic provides queries arriving on topic, grounded in observed demand, and phrased the way users ask, at little cost.
FAQ fanning. Fanning a brand’s own FAQ library together with the category’s common FAQs yields the literal questions buyers ask, already in buyer language, so intent stays clean, and assembly is fast. The exercise also exposes any gaps between questions the brand answers well on its own site and prompts an AI to answer while ignoring the brand entirely. Brand FAQs tend to be self-serving, so balance the set with competitor-framed and category questions; otherwise, it becomes a flattering mirror.
First-party customer support data mining. The richest fanning pulls from a company’s own records: support tickets, service emails, presales conversations, and chat logs that together transcribe how real customers phrase what they want. Mine the language, cluster it, and turn the clusters into prompts. Clusters tend to form around the values buyers care about, easy assembly, green credentials, child safety, scalability, firmware extending to new hardware generations. Each value cluster becomes a whole family of prompts. Authenticity and intent run highest here, and the cost is access and careful handling, since sensitive data demands real engineering to mine at scale.
Community buyer-voice and UGC.
Beyond a company’s own logs, the open web carries the questions people ask each other instead of a search box.

Pull the phrasing from communities where the category gets argued over, and from user-generated content where buyers compare, complain, rant, and recommend. The voice runs reasonably authentic here, even among spam reviews, thick with product-love, objections and the skeptical “does it actually work” often in the research phase. These are increasingly cited by LLMs. Each source has its strengths and idiosyncrasies, but be wary of stereotyping what you think may be useful in any channel…or not.

5. Analyst pre-fanned questions. A human who knows the buyer writes the questions the funnel can seed discovery, per-seat economics, integrations, “is it worth it,” and direct head-to-head comparisons. Expert-inferred fanning templates built from millions of LLM responses can sometimes create intent- and funnel-stage clusters, humanized by construction. Alone, do not rely on this method because it carries the analyst’s blind spots, and real prompt fanning must stay grounded to real human demand.

6. Clickstream and search-based generators. Wide nets can be mined using AI filters of human activity, leaning on billions of clickstream records. These methods deliver scale and act as a completeness backstop, surfacing long-tail phrasings that human methods can miss. However, also drift off-category quickly and spit out robotic strings, which earns them last place and heavy curation before use.

There is a caveat. There is no public, verified data that tracks how many daily prompts are brand new across LLM platforms like ChatGPT, which fields about 2.5 billion requests per day. However, looking at the closest equivalent, traditional search engines, Google famously estimates that roughly 15% of queries it receives every single day are entirely new. Keeping that in mind, 1.5 billion clickstream records starts to look like a useful-but-incomplete signal, designed for an investor deck.

Lily Ray of Algorythmic and Amsive shared a clean version of the synthetic move: take keywords already carrying real search volume and turn each into the natural-language question a person would speak, preserving the intent type, at scale. Her caveat anchors the category, since every prompt-selection method has limits, which is exactly why the sources must work together. Also, keep in mind the forever-paradigm: millions of searches that were never tendered before today now occur every day.

7. Closing the loop. LLM answers themselves become a source. AI responses name entities, comparisons, and angles outside your original seed set, so read them back out and fan the next round from them. The method self-corrects and surfaces competitors you had no reason to track. Its ceiling is the first round’s coverage, so it sharpens an existing set and works best once the other sources have done their job.

8. Here are a few more (of many sources) for building your question-space algorithm: Google Maps Contributor Reviews, Google Maps Reviews, Google Hotels Reviews, Apple App Store Reviews, OpenTable Reviews, Tripadvisor Reviews, Walmart Reviews, The Home Depot Reviews, Yelp Reviews, Google Locals, Google Local Services, Google Maps, Google Maps Photos, Google Maps Posts, Google Hotels, Google Hotels Photos, Amazon Products, Walmart Products, The Home Depot Products, eBay Products, Google Shopping, Google Immersive Products, Google Play Products, Google Play Stores, Google Play Books, Google Play Games, Google Play Movies, Apple App Store Products, Tripadvisor Places, Yelp Places, OpenTable, Facebook Profiles, Instagram Profiles, YouTube Searches, YouTube Videos, YouTube Video Transcripts, Google Forums, Google Searches, Bing Searches, DuckDuckGo Searches, Brave AI Modes, Google AI Overviews, Google AI Modes

Prompt fanning is the discipline of defining the right question-space before measuring AI visibility, because every downstream “rank,” citation, mention, or competitive comparison depends on the quality and provenance of the prompts being tested. Durable programs combine first-party customer language, FAQs, support data, search behavior, community conversations, reviews, analyst judgment, clickstream signals, and LLM feedback loops to build a thematic prompt clusters set broadly enough to reflect real buyer curiosity and sufficiently specific to stay grounded in the first-party brand’s reality.

Off-the-shelf tools can help, but serious AIO, AEO, and LLM visibility measurement requires client-by-client augmentation, multi-model testing at scale, repeatable methodology, and honest reporting across noisy distributions rather than false certainty from a single metric.

The opportunity for executives is to stop chasing AI visibility screenshots and start building a defensible, provenance-backed system for understanding where the brand appears, where competitors win, and which question clusters deserve action.