OPEN OR CLOSED
The Decision You Didn't Know You Were Making
And why every option comes with a cost
Right now, publishers are making one of the most important decisions about their future — without realising they're making it.
And in doing so, they're deciding what the rest of us get to see — and what we don't.
It's not happening in strategy meetings. Not in boardrooms. It's happening in files most people have never heard of: robots.txt, ai.txt, llms.txt, JSON-LD.
I've worked with publishers for a long time. And like most people in this industry, I've watched the traffic change. It didn't disappear — people didn't stop being curious about the world. But it went somewhere else. To AI. People started asking questions and getting answers directly, without ever arriving at a publisher's website.
I wanted to understand what that means in practice. How do you become the answer? What does it take? And what are publishers around the world actually doing about this right now?
So over the last four months I've been doing the research — looking at close to 5,000 publisher domains across 99 countries, analysing whether they have AI policies, how they're configured, whether their doors are open or closed to AI systems, and whether any of it looks like a deliberate choice or just a default nobody has revisited.
What I found was interesting. And a little uncomfortable.
—
For a long time, the game was clear. You Googled. As a publisher, if you were on page one, you were in good shape. It was hard to get there — complicated, competitive — but to a certain degree you could understand it. You could see where you stood. You could look at your analytics and watch the traffic move.
There is no page one anymore.
There is just the answer. When someone asks an AI assistant about your coverage area — your city, your beat, the topic you've spent years building expertise on — you don't know if you're mentioned. You get no notification. You see no analytics. Either you're in the answer or you're not.
The rules exist. They're just invisible to most people working in publishing right now.
But this pattern — the shape of it — we've seen it before. With music, with books, with restaurants. Whether it was Spotify or Amazon or Just Eat. Whether it was Netflix and Blockbuster, or Fujifilm and Kodak. Every time, a platform emerged that was genuinely better for the audience — easier, faster, everything in one place. And every time, the creators who had built their value on a direct relationship with their audience found that relationship had quietly moved somewhere else.
The platform didn't win because it was better than any single publisher or artist or restaurant. It won because it removed the need to choose. One subscription, all the music. One search, all the answers.
That's what's happening again. And some people are showing up for it. And some people are not.
We've seen this before. We've just never seen it move this fast — or rely so heavily on rules most people never see.
This is where robots.txt, ai.txt, llms.txt, and JSON-LD come in.
These files are the closest thing we have to rules for AI visibility. They tell AI systems who a publisher is, what they cover, whether they can be crawled, and how they want to be cited. Robots.txt has existed for decades and was never designed with AI in mind — yet it is now the primary instrument publishers use to block AI crawlers. ai.txt is newer, built specifically for AI systems, and lets publishers make formal declarations about their stance; it is still an emerging standard, but it gives publishers a language AI crawlers are starting to understand. llms.txt tells AI systems what a publisher considers authoritative; it is a proposed standard, originally put forward by Jeremy Howard, and is already being adopted by a growing number of sites and tools. JSON-LD structured data helps AI systems understand the context and authorship behind content.
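To make this concrete, here is an illustrative robots.txt fragment — not a recommendation, just a sketch of what a selective AI policy looks like. The user-agent names are real crawler identifiers; the choices shown (block training crawlers, stay open to an answer engine) are hypothetical:

```text
# Illustrative robots.txt — a selective AI-crawler policy

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Allow: /
```

A file like this says: no to these training crawlers, yes to everyone else. Whether that is the right trade-off is exactly the strategic question this piece is about.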
What my research shows is that most publishers aren't actively using these tools — or if they are, it looks more like an inherited configuration than a deliberate decision.
—
Of close to 5,000 publishers scanned across 99 countries: 21% block all AI crawlers. 4% block some. 47% allow all. And 28% have set no policy at all — more than one in four publishers hasn't made a choice. Not open. Not closed. Just absent.
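The four categories above can be reproduced with a small script. The sketch below is a minimal version of that kind of check, using Python's standard robots.txt parser and a short, assumed sample of AI user agents — a real scan would use a longer, maintained crawler list and fetch each site's live robots.txt:

```python
import urllib.robotparser

# A small, assumed sample of AI crawler user agents;
# a real scan would use a longer, maintained list.
AI_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "PerplexityBot"]

def classify_ai_policy(robots_txt):
    """Classify a robots.txt body by how it treats the sampled AI crawlers."""
    if not robots_txt or not robots_txt.strip():
        return "no_policy"  # no robots.txt at all, or an empty one
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = [bot for bot in AI_BOTS if not parser.can_fetch(bot, "/")]
    if len(blocked) == len(AI_BOTS):
        return "block_all"
    return "block_some" if blocked else "allow_all"

# A robots.txt that singles out one AI crawler but leaves the rest open:
example = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
print(classify_ai_policy(example))  # → block_some
```

The point of showing it is how simple the check is: the "AI policy" of a publisher is, in most cases, just a few lines of plain text that anyone can read.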
The country picture is sharper. Finland and Norway block at 67%, Sweden at 66% — three times the global average. In the United States, roughly half of major publishers block all AI crawlers.
These numbers look like decisions. But look more closely and what you mostly see is default behaviour. Configurations set years ago, before anyone was thinking about AI. CMS settings that block bots as a general rule. Choices inherited, not made.
Most publishers have an AI policy. They just didn't choose it.
—
So should you open the door or close it? Is that even the right question?
Here's what both directions carry: a real cost.
Before you decide, there are three questions you cannot afford to leave as a default:
- What is our principle on AI: do we want to block, license, or be open on our own terms?
- Do our technical settings — robots.txt, ai.txt, llms.txt and basic JSON-LD — actually reflect that position?
- Where, and how, have we documented this as an institutional decision rather than a CMS checkbox?
If you close — you block AI crawlers — you're protecting your content from being used in future training without a license. That matters. But blocking doesn't reach back to what's already in the model. And it doesn't shape how the AI currently represents you. When someone asks about your coverage area, the AI reaches for whoever is available. If that isn't you, it's someone else — another publisher, an aggregator, a content farm. Reddit. By opting out, you aren't just protecting your IP; you are effectively removing your brand from the primary discovery engine of the next generation.
Blocking protects your content from being used in training. It does not protect your authority from being replaced.
If you open — if AI systems can access and cite your content — you may lose the visit. The reader gets the answer. Your counter doesn't move. For publishers heavily reliant on programmatic advertising (the ads.txt model), this is cannibalization: you are providing the data that trains the very tool that replaces your ad-supported pageviews. But being open isn't enough on its own either. An AI system that can crawl your site but finds no structured data, no authorship signals, no sense of who you are or what you cover, will represent you badly or not at all.
You can be open and still be invisible.
—
Open or closed is the wrong frame.
The real question is whether your current position is a choice or a default. In my research across close to 5,000 domains, I found that for most, it is a default — a technical setting left over from a different era. And even where it is a choice, the next question is whether it's backed by the infrastructure that makes it mean something. Blocking without a formal declaration is a technical instruction, not an institutional record. Being accessible without structured data is presence without identity. If your site can be crawled but has no JSON-LD for your organisation, authors and articles, the AI sees “a page with text” — not “this is the local authority on crime in X city”.
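For a sense of what that identity layer looks like, here is a minimal JSON-LD sketch for a single article, using schema.org vocabulary. The publisher, author, and URLs are hypothetical placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "City council approves new budget",
  "datePublished": "2025-06-01",
  "author": {
    "@type": "Person",
    "name": "Jane Example",
    "url": "https://example-news.com/staff/jane-example"
  },
  "publisher": {
    "@type": "NewsMediaOrganization",
    "name": "Example News",
    "url": "https://example-news.com"
  }
}
```

A block like this, embedded in the page, is what turns "a page with text" into a named organisation, a named author, and a dated piece of reporting that a machine can attribute.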
The publishers who navigate this well won't be the ones who got the binary right. They'll be the ones who decided deliberately, understood the cost, and built what it takes to be found, understood, and cited correctly.
The window for that is still open. The pattern — from music, from books, from every time a platform stepped between a creator and their audience — says it doesn't stay open long. For publishers still heavily reliant on programmatic advertising, the window is even narrower; every day of ‘default’ settings is a day of uncompensated data loss.
The strategy question — what position do we want to hold — matters more right now than the technical one.
—