Cloudflare just introduced potentially large fragmentation in how AI and content will work going forward. Cloudflare handles roughly 20% of the web's traffic, so how it treats AI and data matters: every company will need a policy for how to manage this for its brand and its tech stack.
1) What Just Happened?
Cloudflare just flipped the table on AI scrapers: as of July 2025, new domains brought onto Cloudflare block “known” AI training crawlers by default, and a new Pay Per Crawl feature lets site owners charge bots that want access (via HTTP 402 Payment Required). Given Cloudflare’s scale (roughly 20% of the web!), this will impact your brand's results in AI-based searches, and it gives you a new revenue lever from your content if you want one.
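To make the 402 mechanism concrete, here is a minimal sketch of how a crawler client might react to a Pay Per Crawl gate. The function and return labels are our own illustrations, not part of any official Cloudflare SDK:

```python
# Sketch: how a crawler client might react to Cloudflare's Pay Per Crawl gate.
# The helper name and action labels are illustrative, not an official API.

def classify_crawl_response(status_code: int) -> str:
    """Map an HTTP status from a crawled URL to a next action for the crawler."""
    if status_code == 402:          # Payment Required: the site is charging for access
        return "negotiate-payment"  # crawler must agree to pay and retry
    if status_code == 403:          # Forbidden: the site blocks this crawler outright
        return "blocked"
    if 200 <= status_code < 300:    # Allowed: content served normally
        return "allowed"
    return "other"

print(classify_crawl_response(402))  # -> negotiate-payment
print(classify_crawl_response(403))  # -> blocked
print(classify_crawl_response(200))  # -> allowed
```

The key shift: 402 turns "no" into "not for free," which is what makes the payments layer possible.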
What Changed Specifically
- Before, your options were to either block or allow AI crawlers. Cloudflare just added a third option: you can now also charge them (Pay Per Crawl) each time they crawl your site's content. This only applies if your site sits behind a Cloudflare service.
- Everyone used to be set to "Allow" all AI crawlers by default; Cloudflare just changed the default for new domains to "Block."
- Many managed SaaS providers use Cloudflare at the platform level, where you have no control over these settings. Congrats: you may have just blocked the hottest new thing in search and discoverability without knowing it.
Why Did They Make This Change
- Many AI bots ignore the rules brands set in robots.txt, so Cloudflare wanted a better permission system to control how AI crawlers access sites.
- Unlike Google search, which sends you referral traffic, AI services consume your content and send little traffic back.
- Cloudflare is pushing the ecosystem toward opt-in permission and added a powerful new payments layer (with Cloudflare in the middle) so you can monetize your content.
2) Why Marketing & Technical Leads Should Care
Get more traffic & capture audiences better.
AI‑driven referrals are still a small share of traffic but growing; at the same time, some AI crawlers consume huge volumes of content with little to no clickback. You’re deciding whether to nurture this as a channel or starve it.
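Before deciding, baseline what AI crawlers are already doing on your site. A quick pass over access logs is enough; the user-agent tokens below are examples of well-known AI crawlers, and you should confirm the current tokens in each vendor's documentation:

```python
from collections import Counter

# Example user-agent substrings for well-known AI crawlers; verify the
# current, complete list against each vendor's published documentation.
AI_CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "CCBot", "PerplexityBot", "Google-Extended"]

def count_ai_hits(log_lines):
    """Tally requests per AI crawler from access-log lines (UA appears in the line)."""
    hits = Counter()
    for line in log_lines:
        for token in AI_CRAWLER_TOKENS:
            if token in line:
                hits[token] += 1
    return hits

sample = [
    '1.2.3.4 - - [01/Aug/2025] "GET /docs HTTP/1.1" 200 "-" "GPTBot/1.0"',
    '5.6.7.8 - - [01/Aug/2025] "GET / HTTP/1.1" 200 "-" "Mozilla/5.0"',
    '9.9.9.9 - - [01/Aug/2025] "GET /blog HTTP/1.1" 200 "-" "CCBot/2.0"',
]
print(count_ai_hits(sample))  # GPTBot: 1, CCBot: 1
```

Comparing these counts against your AI-sourced referral traffic tells you whether the channel is paying its way.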
Earn revenue & gain negotiating power.
Default blocking plus Pay Per Crawl gives brands leverage for licensing and paid‑access conversations.
Take control of your data/IP governance.
Network enforcement lets you decide who can train on catalogs, docs, and community content. This is a much bigger deal for regulated or high‑value verticals.
Avoid reduced brand representation & outdated data in AI results.
Shut the door completely and future models may have less (or outdated) knowledge of your brand; selective access keeps you visible without donating the crown jewels.
3) Decision Framework — Your Strategic Options (Pros & Cons)
You have a few options for how your brand handles this. Depending on your tech platform (see section 4), not all of the options below may be available to you. Reach out and we can help you navigate the pros and cons, but there isn't an easy slam-dunk option. Your brand will need to make a choice, especially as other DNS/CDN providers copy what Cloudflare is doing. Today this only impacts roughly 20% of the web, but we bet more will follow.
| Option | When to Use | Pros | Cons |
|---|---|---|---|
| Accept the Default Block | New Cloudflare zones; no AI referral strategy yet | Zero effort; preserves leverage | Kills nascent AI traffic |
| “AI Scrapers & Crawlers” + Managed robots.txt | You want quick, enforced control | Blocks bots that ignore robots.txt; can target only monetized sections | Need exceptions for legit search/answer bots |
| Granular Allow/Block Lists | Want AI answer‑engine visibility but not broad training | Aligns access to outcomes | Ongoing rule maintenance; depends on AI models following the rules |
| Pay Per Crawl | High‑value content; willing to experiment with licensing | New revenue stream (Cloudflare as merchant‑of‑record); 402 gating | Flat pricing/value mismatch risk; depends on crawler support |
| Hybrid, Set Your Own Policies | Need both discovery and protection (e.g., ecommerce catalogs, SaaS docs) | Stay visible in answer engines; monetize depth via API or 402 | Requires content engineering and policy reviews |
4) How the Big CMS Platforms Are Responding (and Your Options on Managed SaaS)
Short version: Many managed platforms sit behind Cloudflare at the provider level, so the default AI‑crawler block applies wherever the provider enables it. If you want enforcement under your control, you generally either (a) use the platform’s robots/AI toggles, or (b) bring your own Cloudflare zone in front of the platform via Orange‑to‑Orange (O2O), if your tech stack supports it, to control allow/block/charge policies on your domain.
WordPress
- Self‑hosted WordPress: You control your own tech stack. If you run through your own Cloudflare zone, you can use Cloudflare’s AI Scrapers & Crawlers controls, managed robots.txt, and Pay Per Crawl.
- WordPress.com (managed SaaS): Provides a “Prevent third‑party sharing” privacy toggle that updates policies/robots to opt out of AI/data sharing. This is a signal rather than network enforcement.
Shopify (Managed SaaS)
- Supports O2O so you can put your Cloudflare zone in front of Shopify’s, gaining access to your edge policies (including AI allow/block/charge).
- You can customize robots.txt (robots.txt.liquid) to explicitly allow/deny specific AI bots; without edge enforcement this remains advisory.
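As a hedged sketch of that Shopify customization: a robots.txt.liquid override typically keeps the default rule groups and appends rules for specific AI bots. Check Shopify's current theme documentation for the exact template objects before using this:

```liquid
{%- comment -%} Keep Shopify's default robots.txt rules {%- endcomment -%}
{% for group in robots.default_groups %}
  {{- group.user_agent }}
  {%- for rule in group.rules %}
  {{ rule }}
  {%- endfor %}
{% endfor %}

{%- comment -%} Append advisory rules for a specific AI crawler (example bot) {%- endcomment -%}
User-agent: GPTBot
Disallow: /
```

Remember: without edge enforcement in front of Shopify, this remains a polite request.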
BigCommerce (Managed SaaS)
- Supports O2O; you can bring a Cloudflare zone to sit in front of store traffic.
- You can view/edit robots.txt via the BigCommerce UI or API for store‑level signals.
Adobe Commerce (formerly Magento)
- Adobe Commerce on Cloud (managed SaaS): Uses Fastly as its CDN/WAF, so watch what Fastly does in this same space. Will it copy what Cloudflare is doing? You can edit your own robots.txt file for some control, but you can't currently use any of the new Cloudflare features.
- Self‑hosted Adobe Commerce: You control your own tech stack. If you run through your own Cloudflare zone, you can use Cloudflare’s AI Scrapers & Crawlers controls, managed robots.txt, and Pay Per Crawl.
Squarespace/Webflow/Wix/Other SaaS Web Builders (Managed SaaS)
- These generally provide a crawlers setting that includes AI crawler opt‑out toggles. This updates robots.txt (a signal, not enforcement). Most don't yet expose the new enforcement settings, so you get whatever the platform allows.
Craft CMS
- Self‑hosted Craft CMS: You control your own tech stack. If you run through your own Cloudflare zone, you can use Cloudflare’s AI Scrapers & Crawlers controls, managed robots.txt, and Pay Per Crawl.
- Craft Cloud (managed SaaS): Protected by Cloudflare’s enterprise WAF by default. If you need deeper or custom controls, you can front Craft Cloud with your own Cloudflare zone as well.
Note on signals vs. enforcement.
Platform toggles and robots.txt are polite requests to the crawler; some AI bots ignore them. The new Cloudflare controls add teeth, letting you actually enforce what you want (allow, block, charge, or some hybrid). For best results you need policies at both layers.
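For the "signal" layer, an AI opt-out robots.txt might look like the fragment below. Bot names change over time, so confirm each vendor's published user-agent tokens before relying on these examples:

```
# Advisory only: compliant crawlers honor this; bad actors may not.
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Example of still allowing a traffic-sending answer-engine crawler
User-agent: OAI-SearchBot
Allow: /
```

The enforcement layer (Cloudflare's allow/block/charge rules) then backs up whatever this file requests.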
Basically, if you’re on a managed SaaS behind Cloudflare (like Shopify, BigCommerce, or WordPress.com), your practical options are:
- Stay platform‑only: Use the built‑in robots/AI settings (lowest friction, least control).
- Bring Your Own Cloudflare (O2O): Put your CF zone in front to get enforced policies (block/allow/charge) at the edge.
- Hybrid policy: Allow specific answer‑engine crawlers (that send traffic) while blocking bulk training crawlers. In theory this may be the best option, but it depends on AI vendors operating, and honestly identifying, separate crawlers for training versus answering.
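A hybrid policy ultimately boils down to a per-bot decision table. A minimal sketch, where the bot names and tier assignments are illustrative examples rather than recommendations:

```python
# Illustrative hybrid policy: allow answer-engine bots that can send traffic,
# charge bulk training crawlers, block the rest. Bot names and the table
# itself are examples; build yours from your own log data and content tiers.

POLICY = {
    "OAI-SearchBot": "allow",   # answer engine that can send referral traffic
    "PerplexityBot": "allow",
    "GPTBot": "charge",         # bulk training crawler -> Pay Per Crawl candidate
    "CCBot": "block",           # bulk scraping with little referral value
}

def decide(user_agent: str) -> str:
    """Return allow/block/charge for a crawler's user-agent string."""
    for bot, action in POLICY.items():
        if bot in user_agent:
            return action
    return "block"  # default-deny unknown AI crawlers

print(decide("Mozilla/5.0 (compatible; GPTBot/1.0)"))  # -> charge
print(decide("SomeUnknownBot/0.1"))                    # -> block
```

Whether this table is enforced via Cloudflare rules, your own edge, or just robots.txt signals is exactly the section 3 decision.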
5) Common Questions & Pushback
Will blocking hurt my SEO?
In theory, no. Standard Googlebot indexing is distinct from AI training/answer bots. But we suggest a wait-and-see approach here.
Is robots.txt enough?
No. Multiple investigations have shown AI crawlers ignoring robots.txt. The new Cloudflare enforcement policies are the real control plane.
Can bad actors bypass Pay Per Crawl?
Yes, certainly: via user-agent spoofing, proxying, and similar tricks. Cloudflare raises the bar; it doesn’t eliminate abuse.
6) Get Advice from GRAYBOX
Balancing visibility, revenue, and risk unfortunately isn’t a simple checkbox exercise. Book a 30‑minute “AI Crawler Control Audit.” We’ll map your content value tiers, baseline AI bot traffic, and design a policy with you — anything from firm blocking to a paid gateway to a smart hybrid.