You're tracking your AI visibility across 5 LLMs. You're writing content. But you have this nagging feeling: is your site actually structured in a way that helps LLMs understand what you're an expert at?
Maybe you have three blog posts that all target the same topic without realizing it. Maybe your best content lives on an orphan page with zero internal links pointing to it. Maybe you're accidentally blocking ChatGPT's crawler in your robots.txt. These are the kinds of problems you can't spot by looking at individual pages. You need to see the whole picture.
What the site crawl does
Mentionable crawls your entire website, starting from your homepage and following every internal link. For each page, it extracts:
- Content and structure: title, meta description, H1, full heading hierarchy, word count, and the page content itself
- Schema markup: every JSON-LD block found on the page, with type identification
- Links: all internal links (where they go, what anchor text they use) and all external links (where you're sending traffic)
- Technical signals: canonical tags, robots directives, hreflang tags, Open Graph data
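To make the extraction step concrete, here is a minimal sketch using only Python's standard library. It is not Mentionable's actual crawler: the `PageExtractor` class and its field names are hypothetical, and it covers only a subset of the signals listed above (title, meta description, headings, JSON-LD types, and links).

```python
import json
from html.parser import HTMLParser

CAPTURED_TAGS = {"title", "a", "h1", "h2", "h3", "h4", "h5", "h6"}


class PageExtractor(HTMLParser):
    """Hypothetical extractor: collects a few per-page signals from raw HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = ""
        self.headings = []      # (level, text) pairs in document order
        self.schema_types = []  # "@type" value of each JSON-LD block
        self.links = []         # (href, anchor text) pairs
        self._stack = []        # open tags whose text is being captured

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "description":
            self.meta_description = attrs.get("content") or ""
        is_jsonld = tag == "script" and attrs.get("type") == "application/ld+json"
        if tag in CAPTURED_TAGS or is_jsonld:
            self._stack.append([tag, attrs, ""])

    def handle_data(self, data):
        # Text belongs to every captured tag that is currently open,
        # so a link inside a heading counts toward both.
        for frame in self._stack:
            frame[2] += data

    def handle_endtag(self, tag):
        if not self._stack or self._stack[-1][0] != tag:
            return
        _, attrs, text = self._stack.pop()
        text = text.strip()
        if tag == "title":
            self.title = text
        elif tag == "a":
            if attrs.get("href"):
                self.links.append((attrs["href"], text))
        elif tag == "script":
            try:
                block = json.loads(text)
            except json.JSONDecodeError:
                return  # skip malformed JSON-LD rather than failing the page
            for item in block if isinstance(block, list) else [block]:
                if isinstance(item, dict) and "@type" in item:
                    self.schema_types.append(item["@type"])
        else:
            self.headings.append((int(tag[1]), text))
```

Feeding a page's HTML with `extractor.feed(html)` fills the attributes, which can then be stored per URL. A production crawler would also need to handle canonical tags, robots directives, hreflang, and Open Graph data, which this sketch leaves out.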
But the real value isn't the raw extraction. It's what happens after.
Cannibalization detection
Once every page is crawled and analyzed, Mentionable generates semantic embeddings for your content and compares them. When two pages cover highly similar topics, they get flagged as a cannibalization pair.
Each pair gets a severity rating:
- High: these pages are very likely competing for the same queries and confusing LLMs about which one represents your expertise
- Medium: significant overlap that could dilute your authority on the topic
- Potential: enough similarity to watch, but may be intentional (like a product page and a related blog post)
For each pair, you see both pages side by side with their similarity scores. This makes it easy to decide: merge them, differentiate them, or redirect one to the other.
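As a rough illustration of how pairwise comparison and severity rating could work, the sketch below scores every pair of pages with cosine similarity and maps the score to a label. Both the similarity measure and the cutoff values are assumptions made for this example; the article does not state which measure or thresholds Mentionable uses.

```python
import math
from itertools import combinations

# Hypothetical cutoffs, chosen only for illustration.
SEVERITY_THRESHOLDS = [("high", 0.90), ("medium", 0.80), ("potential", 0.70)]


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def severity(score):
    """Return the first label whose cutoff the score reaches, or None."""
    for label, cutoff in SEVERITY_THRESHOLDS:
        if score >= cutoff:
            return label
    return None


def find_cannibalization_pairs(embeddings):
    """embeddings maps URL -> vector; returns flagged pairs, most similar first."""
    pairs = []
    for (url_a, vec_a), (url_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        label = severity(score)
        if label:
            pairs.append((url_a, url_b, round(score, 3), label))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)
```

Comparing every pair is quadratic in the number of pages, which is manageable at the page limits discussed in this article but would need an approximate nearest-neighbor index on much larger sites.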
Why does this matter for AI visibility? When an LLM tries to understand your expertise on a topic and finds three similar pages, it has to pick one. It might pick the wrong one. Or worse, it might conclude that none of them are authoritative enough compared to a competitor who has one definitive page on that topic.
Missing internal links
Internal links are how both search engines and LLMs discover the relationships between your pages. If your "ultimate guide to email marketing" doesn't link to your "email deliverability tips" post, neither search engines nor LLMs get a signal that the two pages are related.
After crawling your site, Mentionable analyzes the content of every page and identifies pages that cover related topics but don't link to each other. For each suggestion, you get:
- The source page (where the link should be added)
- The target page (where the link should point)
- Suggested anchor text based on the target page's content
Suggestions are ranked by content relevance, so the most impactful links appear first. You get up to 100 suggestions per crawl.
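The ranking and the 100-suggestion cap can be sketched as follows. Everything here is hypothetical: the function name, the input shapes, the `min_score` cutoff, and the use of the target page's title as anchor text are assumptions for illustration, not a description of Mentionable's implementation.

```python
def suggest_missing_links(similarities, existing_links, titles,
                          min_score=0.6, limit=100):
    """Suggest links between related pages that do not link to each other yet.

    similarities:   dict mapping (source_url, target_url) -> relevance score
    existing_links: set of (source_url, target_url) pairs already on the site
    titles:         dict mapping url -> page title, used as anchor text here
    """
    suggestions = []
    for (source, target), score in similarities.items():
        if source == target or score < min_score:
            continue  # skip self-links and weakly related pages
        if (source, target) in existing_links:
            continue  # the link already exists
        suggestions.append({
            "source": source,
            "target": target,
            "anchor_text": titles.get(target, target),
            "score": score,
        })
    # Most relevant suggestions first, capped at the per-crawl limit.
    suggestions.sort(key=lambda s: s["score"], reverse=True)
    return suggestions[:limit]
```

Each returned item carries the same three pieces of information described above: source page, target page, and suggested anchor text.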
Topic clusters
Mentionable groups your pages into semantic clusters using embeddings and DBSCAN clustering. This gives you a visual map of how your content is organized by topic.
You might discover that you have 12 pages about "project management" but only 2 about "time tracking," even though both topics matter equally for your business. Or you might find that pages you thought were about different topics actually cluster together, revealing hidden overlap.
Topic clusters also help you plan your content strategy. Gaps in your cluster map point to topics where you need more content. Dense clusters suggest areas where you're already strong and might want to consolidate.
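For readers curious about how DBSCAN groups pages, here is a compact pure-Python sketch over embedding vectors. The cosine distance metric and the `eps` and `min_samples` values are assumptions for this example; the article only states that embeddings and DBSCAN are used, not how they are configured.

```python
import math


def cosine_distance(a, b):
    """1 minus cosine similarity: 0 means same direction, larger means less alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - (dot / norm if norm else 0.0)


def dbscan(vectors, eps=0.3, min_samples=2):
    """Return one label per vector: a cluster id (0, 1, ...) or -1 for noise."""
    n = len(vectors)
    # A point's neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reachable from a core point joins it
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core points keep expanding the cluster
        cluster_id += 1
    return labels
```

One reason DBSCAN suits this job is that it does not need the number of clusters up front, and pages that fit no topic are labeled as noise (`-1`) instead of being forced into a group.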
AI crawler accessibility
After crawling your site, Mentionable checks three things:
robots.txt analysis: are you blocking any AI crawlers? Many site owners don't realize their robots.txt blocks AI user agents such as GPTBot, ClaudeBot, or Google-Extended. If you're blocking them, LLMs can't access your latest content to inform their recommendations.
llms.txt detection: do you have an llms.txt file? This emerging standard helps LLMs understand your site structure and find your most important content. Mentionable checks whether it exists and what it contains.
Sitemap coverage: what percentage of your crawled pages appear in your sitemap? A low coverage percentage means some of your content isn't being explicitly shared with crawlers.
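Two of these checks are easy to reproduce yourself. The sketch below uses Python's standard `urllib.robotparser` to test which AI user agents a robots.txt blocks, and computes sitemap coverage as a simple percentage. The function names and the short agent list are illustrative assumptions, and the list is not exhaustive.

```python
from urllib.robotparser import RobotFileParser

# Illustrative subset of AI user agents named in this article.
AI_USER_AGENTS = ["GPTBot", "Google-Extended", "ClaudeBot"]


def blocked_ai_crawlers(robots_txt, url="https://example.com/"):
    """Return the AI user agents that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_USER_AGENTS if not parser.can_fetch(agent, url)]


def sitemap_coverage(crawled_urls, sitemap_urls):
    """Percentage of crawled pages that also appear in the sitemap."""
    crawled = set(crawled_urls)
    if not crawled:
        return 0.0
    return 100.0 * len(crawled & set(sitemap_urls)) / len(crawled)
```

The `url` argument matters because robots.txt rules are path-based: a site can allow a crawler on the homepage while blocking it on `/blog/`, so checking a few representative URLs gives a more accurate picture than checking the root alone.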
Crawl limits by plan
| Plan | Price | Max pages per crawl |
|---|---|---|
| Starter | EUR 39/month | 500 pages |
| Growth | EUR 79/month | 1,000 pages |
| Pro | EUR 149/month | 1,000 pages |
| Agency | EUR 300/month | 1,000 pages |
One crawl per month per project. For most solopreneurs and small sites, 500 pages covers the core content. Larger sites with more than 500 pages benefit from the Growth, Pro, or Agency tiers, which crawl up to 1,000 pages.
How the crawl pipeline works
When you start a crawl, Mentionable runs an 11-step async pipeline. Its main stages:
- Initiates the crawl and begins fetching pages
- Extracts content, headings, schema, and links from every page
- Stores all page data and link relationships
- Scans your robots.txt, llms.txt, and sitemap
- Generates semantic embeddings for all page content
- Resolves internal link targets and computes link metrics
- Groups pages into topic clusters
- Detects cannibalization pairs
- Generates missing link suggestions
You can watch the progress in real time from the crawl dashboard. Each step shows a progress indicator so you know where the analysis stands.
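The general shape of a staged async pipeline with progress reporting can be sketched with `asyncio`. This is a simplified stand-in, not Mentionable's code: the stage names paraphrase the stages listed above, and `run_stage`, `run_pipeline`, and the `on_progress` callback are hypothetical.

```python
import asyncio

# Stage names paraphrased from the list above; the real pipeline may differ.
STAGES = [
    "fetch pages",
    "extract content",
    "store page data",
    "scan robots.txt, llms.txt, sitemap",
    "generate embeddings",
    "resolve links and metrics",
    "cluster topics",
    "detect cannibalization",
    "suggest missing links",
]


async def run_stage(name, state):
    """Placeholder for real work; yields control the way an I/O call would."""
    await asyncio.sleep(0)
    state["completed"].append(name)


async def run_pipeline(stages, on_progress=None):
    """Run stages in order, since later stages depend on earlier output."""
    state = {"completed": []}
    for index, name in enumerate(stages, start=1):
        await run_stage(name, state)
        if on_progress:
            on_progress(index, len(stages), name)  # e.g. update a dashboard
    return state


if __name__ == "__main__":
    asyncio.run(run_pipeline(
        STAGES,
        on_progress=lambda done, total, name: print(f"[{done}/{total}] {name}"),
    ))
```

The stages run sequentially here because each one consumes the previous stage's output (clustering needs embeddings, for example); the async part keeps the process free to serve other work while a stage waits on the network or a database.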
Who benefits most
Solopreneurs who've been publishing content for months or years often discover they have cannibalization problems they never knew about. Three blog posts all targeting "best invoicing practices"? That's diluting your authority instead of building it.
Consultants can use site crawl results to show clients the structural issues holding back their AI visibility. A report showing 15 cannibalization pairs and 40 missing internal links is concrete and actionable.
Content-heavy sites with 100+ pages get the most out of clustering and missing link detection. The more content you have, the harder it is to maintain a coherent internal linking structure manually.
Try it yourself
Start your 7-day free trial and run your first site crawl. See how your content clusters, where cannibalization is hurting your authority, and which internal links you're missing. No credit card required.
Related articles
- Content Opportunities - turn crawl insights into targeted content briefs.
- AI Chat Agent - ask questions about your crawl results directly in the chat.
- Multi-LLM Tracking - track whether structural improvements change your AI visibility.