00 / Open Data

The dataset of machine memory.

30 brands · ~310 scans · 160 scores · 5 dimensions · CC-BY 4.0

When a customer asks an LLM about a brand, the model answers from a memory we cannot see. SeenGeo is a public ledger of that memory — captured, dated, and scored, so anyone can read what the machines believe.

01 / What

What this dataset is

Every day, SeenGeo asks roughly thirty large language models the same five questions about each of 30 Chinese consumer brands. We log the prompts, the responses, the token counts, and the latency. A separate judge model scores each response on five dimensions of brand recognition. The dataset is the raw record of that loop.
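The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SeenGeo's pipeline: the `Engine` callable signature, the `{brand}` placeholder in question templates, and the `ScanRecord` fields are all hypothetical, and the judge-model scoring step is left out.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical engine signature: takes (system_prompt, user_prompt) and returns
# (response_text, token_count). A real engine would wrap a provider's API client.
Engine = Callable[[str, str], tuple[str, int]]


@dataclass
class ScanRecord:
    brand: str
    question_id: int
    engine: str
    prompt: str
    response: str
    tokens: int
    latency_s: float
    timestamp: float  # seconds since the Unix epoch


def run_daily_scan(
    brands: Iterable[str],
    questions: list[str],
    engines: dict[str, Engine],
    system_prompt: str,
) -> list[ScanRecord]:
    """Ask every engine every question about every brand, logging each call."""
    records = []
    for brand in brands:
        for qid, template in enumerate(questions):
            # Hypothetical: question templates contain a {brand} placeholder.
            prompt = template.format(brand=brand)
            for name, engine in engines.items():
                start = time.perf_counter()
                response, tokens = engine(system_prompt, prompt)
                latency = time.perf_counter() - start
                records.append(
                    ScanRecord(brand, qid, name, prompt, response,
                               tokens, latency, time.time())
                )
    return records
```

The nesting makes the call count multiplicative: brands × questions × engines per run.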

brands.csv holds the brand metadata. scans.csv holds every scan: the rendered prompt, the full response, the engine, and the timestamp. scores.csv holds the per-dimension judgements, each with a confidence value. system_v1.txt is the system prompt every engine receives, unchanged.
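A minimal sketch for attaching each scan's judgements to the scan itself, using only the standard library. The column names (`scan_id`, `dimension`, `score`, `confidence`) are assumptions for illustration; check the headers in the downloaded files and adjust them.

```python
import csv
from collections import defaultdict
from typing import Iterable

# Hypothetical join key: the real header in scans.csv / scores.csv may differ.
SCAN_KEY = "scan_id"


def join_scans_and_scores(scan_lines: Iterable[str], score_lines: Iterable[str]) -> list[dict]:
    """Attach each scan's judgements as scan["scores"] = {dimension: (score, confidence)}."""
    by_scan: dict[str, dict[str, tuple[float, float]]] = defaultdict(dict)
    for row in csv.DictReader(score_lines):
        by_scan[row[SCAN_KEY]][row["dimension"]] = (
            float(row["score"]),
            float(row["confidence"]),
        )

    joined = []
    for scan in csv.DictReader(scan_lines):
        # Scans without any judgement get an empty dict rather than a KeyError.
        scan["scores"] = by_scan.get(scan[SCAN_KEY], {})
        joined.append(scan)
    return joined
```

To read the real files, pass open file objects such as `open("scans.csv", newline="", encoding="utf-8")`. Opening with `newline=""` lets the csv module handle full responses that contain line breaks inside quoted fields; UTF-8 is itself an assumption about the files' encoding.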

02 / Why

Why we open it

Search engine optimisation gave us decades of public corpora — crawl logs, link graphs, ranking studies — and an entire field of researchers who could read them. Generative engine optimisation does not yet have that. Most of what we know about how LLMs describe brands lives inside private dashboards.

We want a different default. The scans are public, the scores are public, the methodology is public. If you want to argue with our judgements, you can — and you should. Bring your own judge model; our prompts are in the bundle.
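If you bring your own judge, one simple way to see where you disagree with ours is a per-dimension mean absolute difference. The sketch below assumes both sets of scores are numeric, on the same scale, and keyed by a hypothetical `(scan_id, dimension)` pair; it ignores the published confidence values.

```python
from collections import defaultdict


def per_dimension_mad(
    published: dict[tuple[str, str], float],
    mine: dict[tuple[str, str], float],
) -> dict[str, float]:
    """Mean absolute difference between two judges, per dimension.

    Both inputs map (scan_id, dimension) -> score. Only pairs scored by both
    judges are compared; 0.0 means perfect agreement on that dimension.
    """
    diffs: dict[str, list[float]] = defaultdict(list)
    for key in published.keys() & mine.keys():
        _, dimension = key
        diffs[dimension].append(abs(published[key] - mine[key]))
    return {dim: sum(d) / len(d) for dim, d in sorted(diffs.items())}
```

An absolute difference is used here for readability; a rank correlation or an inter-rater statistic such as Cohen's kappa may suit ordinal scores better.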

03 / License

CC-BY 4.0

The dataset is released under the Creative Commons Attribution 4.0 International license. You may copy, redistribute, remix, and build upon the data — including commercially — provided you credit SeenGeo · seengeo.com and link back to this page.

The brand names, logos, and trademarks referenced inside scans belong to their owners. The dataset records third-party model output verbatim and does not endorse any claim it contains.

04 / Cite

How to cite

For academic work, please use the entry below; the @dataset type is defined by biblatex, so with a classic BibTeX style change it to @misc. For posts and articles, a link to seengeo.com/dataset is enough.

@dataset{seengeo2026,
  title  = {SeenGeo: An Open Dataset of LLM Brand Mirrors},
  author = {{SeenGeo}},
  year   = {2026},
  url    = {https://seengeo.com/dataset},
  note   = {CC-BY 4.0}
}

05 / Download