Technical SEO for B2B/SaaS: Crawl, Index, and Architecture That Scales | Accord Content

Technical SEO

Technical SEO for B2B and SaaS: Crawl, Index, and Architecture That Scales

If your best content is hard to crawl or ambiguous to index, rankings stall. This guide shows how to set clean rules for bots, keep duplicates under control, publish clear language variants, handle JavaScript, and ship redirects that preserve equity.

Updated · 15 to 18 min read

Definitions

Crawl

Bots request URLs and discover new ones through links and files like sitemaps. Crawl limits depend on your site’s health and demand.

Index

What can actually appear in search results. You influence it with noindex, canonicalization, and duplication controls.

Render

How crawlers process resources and execute JavaScript before deciding what to index. Modern indexing has separate crawl and render queues.

Crawl, index, and render are separate. A page can be crawled but not indexed, or indexed later after rendering.

Why this matters

More traffic and conversions now come from mobile than desktop, so slow, heavy pages lose readers and budget. Recent global data puts mobile at roughly 60% of web usage. Pair that with the Web Almanac's finding that median mobile pages still ship multi-megabyte payloads and you can see why clean architecture pays off. Source examples: StatCounter on platform share and the HTTP Archive Web Almanac on page weight.

robots.txt essentials

robots.txt controls crawling, not indexing. It helps manage load and discovery. If you must keep a URL out of results, use noindex or access controls. See the official robots protocol and Google’s own guidance.
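To keep a page crawlable but out of results, the standard mechanism is a robots meta tag in the page head (the path here is a placeholder):

```html
<!-- In the <head> of a page you want excluded from search results -->
<meta name="robots" content="noindex">
```

For non-HTML documents such as PDFs, send an `X-Robots-Tag: noindex` HTTP response header instead. Note that a URL blocked in robots.txt cannot be crawled, so a noindex on it will never be seen; pick one mechanism per goal.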

Starter file

User-agent: *
Disallow: /wp-admin/
Disallow: /search/
Sitemap: https://example.com/sitemap_index.xml

Do not block vital resources like /wp-includes/*.js or /assets/*.css, or rendering may fail.

When to disallow

  • Endless internal search results and session parameters you do not want crawled.
  • Staging or test paths that are also password protected.
  • Explicit AI crawler opt outs if your policy requires it.
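As one sketch of an AI-crawler opt out, the user-agent tokens below are published by their respective operators, but verify current token names before relying on them, and remember that compliance is voluntary:

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```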

Sitemaps that scale

List only canonical, indexable URLs. Keep files within protocol limits: up to 50,000 URLs per sitemap and up to 50 MB uncompressed. For large sites, split by type or section and reference them in a sitemap index. You can add the index to robots.txt and submit it in Search Console.

XML snippet

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/product/analytics</loc><lastmod>2025-07-20</lastmod></url>
  <url><loc>https://example.com/blog/technical-seo-b2b-saas/</loc></url>
</urlset>

Index file

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemaps/blog.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemaps/product.xml</loc></sitemap>
</sitemapindex>
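For large sites, a small script can enforce the 50,000-URL limit by chunking a URL list into sitemap files plus an index. This is a standard-library sketch; the `example.com` base path and filenames are placeholders:

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-sitemap limit from the sitemaps protocol

def build_sitemaps(urls, base="https://example.com/sitemaps/"):
    """Chunk URLs into <=50k-entry sitemaps; return (index_xml, [sitemap_xml, ...])."""
    chunks = [urls[i:i + MAX_URLS] for i in range(0, len(urls), MAX_URLS)]
    sitemaps = []
    for chunk in chunks:
        entries = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in chunk)
        sitemaps.append(f'<urlset xmlns="{SITEMAP_NS}">{entries}</urlset>')
    refs = "".join(
        f"<sitemap><loc>{base}sitemap-{i}.xml</loc></sitemap>"
        for i in range(len(chunks))
    )
    index = f'<sitemapindex xmlns="{SITEMAP_NS}">{refs}</sitemapindex>'
    return index, sitemaps
```

Feed it only canonical, indexable URLs, write the files out, and submit the index.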

Canonicalization

Canonical URLs consolidate duplicates and signals. Use the <link rel="canonical"> element on the page or the HTTP header for non-HTML documents like PDFs. Google documents how it chooses a canonical if signals conflict, so keep internal links, sitemaps, and canonicals aligned.

On-page

<link rel="canonical" href="https://example.com/solutions/analytics/" />

HTTP header

Link: <https://example.com/whitepaper>; rel="canonical"

Use headers for documents where you cannot edit HTML.

Avoid setting a canonical to a URL that is blocked by robots.txt or marked noindex. Keep one clear, self-referential canonical per page.
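A quick way to catch canonical drift in an audit is to parse rendered HTML and compare the declared canonical to the URL you expect. A minimal standard-library sketch:

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a document."""
    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel", "").lower() == "canonical":
            self.canonicals.append(a.get("href"))

def is_self_canonical(html, page_url):
    """True if the page declares exactly one canonical and it points to itself."""
    parser = CanonicalFinder()
    parser.feed(html)
    return parser.canonicals == [page_url]
```

Run it over the rendered HTML (not just the raw source) so you catch canonicals that JavaScript rewrites.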

International SEO with hreflang

Mark language and optional region variants using ISO codes, like en-US and en-GB. Each variant must reference the others, and you can include a global x-default for fallbacks. You can add annotations via HTML or sitemaps. See Google’s localized versions docs for code rules and examples.

<link rel="alternate" href="https://example.com/pricing/" hreflang="en-US">
<link rel="alternate" href="https://example.com/fr/prix/" hreflang="fr-FR">
<link rel="alternate" href="https://example.com/" hreflang="x-default">

Pagination today

Google no longer uses rel="next" and rel="prev" for indexing. Use strong on-page pagination UX, keep each page self-canonical, link to a view-all version if it loads fast enough, and ensure important items are linked in fewer clicks.
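Concretely, the head of a paginated listing page can look like this (URLs are placeholders), with each page canonicalizing to itself rather than to page one:

```html
<!-- https://example.com/blog/page/3/ -->
<link rel="canonical" href="https://example.com/blog/page/3/">
<title>Blog – Page 3 | Example</title>
```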

Redirects and status codes

Use 301 for permanent moves and 302 for temporary routing. Return 410 for truly removed content and 451 for legally unavailable material. These codes are defined by the HTTP standard. During site moves with URL changes, set one-to-one 301s, update sitemaps, and use Google’s Change of Address tool for domain migrations.

.htaccess example

Redirect 301 /old-page https://example.com/new-page
RedirectMatch 410 ^/old-folder(/.*)?$

Nginx example

location = /old-page { return 301 https://example.com/new-page; }
location ^~ /deprecated/ { return 410; }

Large site moves can take weeks to months to settle. Keep redirects in place long term and monitor indexation and traffic.
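Chains of 301s waste crawl requests and dilute signals, so point every legacy URL directly at its final destination. Given a map of old-to-new URLs, a short pass can flatten every hop and surface loops; the mapping below is illustrative:

```python
def flatten_redirects(redirects):
    """Resolve each source in {old: new} to its final target.
    Returns (flattened_map, loops); loops lists sources caught in a cycle."""
    flattened, loops = {}, []
    for start in redirects:
        seen, url = {start}, redirects[start]
        while url in redirects:  # keep following until the target is final
            if url in seen:
                loops.append(start)
                break
            seen.add(url)
            url = redirects[url]
        else:
            flattened[start] = url
    return flattened, loops
```

Regenerate your redirect config from the flattened map so every old URL answers with a single 301.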

JavaScript SEO

Google processes JavaScript in three phases: crawl, render, and index. Do not block required JS and CSS. Prefer server-side rendering or hybrid approaches over dynamic rendering, which Google treats as a workaround. Make sure links are real anchors with href and that critical content is present in the rendered HTML.

Link pattern

<a href="/solutions/analytics/">Analytics solution</a>

Buttons without href are usually not crawlable links.

Rendering tip

  • Avoid client-only content for primary text like headers and product copy.
  • Use lazy loading for images and hydrate below the fold.
  • Check rendered HTML with Search Console’s URL Inspection.
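Native lazy loading covers most image cases without extra JavaScript; keep above-the-fold images eager so the largest paint is not delayed. File paths here are placeholders:

```html
<!-- Hero image: load eagerly so LCP is not delayed -->
<img src="/img/hero.webp" alt="Dashboard overview" width="1200" height="630">
<!-- Below the fold: defer until near the viewport -->
<img src="/img/chart.webp" alt="Usage chart" loading="lazy" width="800" height="450">
```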

Site architecture patterns

Hub and spokes

Use hub pages for problems and categories, then link to deep answers, comparisons, and product fits. Keep slugs short and consistent.

Parameters under control

Let filters like ?sort= or ?page= exist, but canonicalize back to the clean URL unless the parameter changes meaningfully unique content.
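One way to make that rule enforceable is an allowlist: keep parameters that change content, strip everything else before emitting the canonical. The allowlist below is an assumption; tune it per site:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters assumed to change content meaningfully; all others are stripped.
CONTENT_PARAMS = {"page", "lang"}

def canonical_url(url):
    """Drop non-content query parameters (sort, tracking, etc.) for the canonical."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in CONTENT_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```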

Performance first

Median mobile pages still ship multi-MB payloads. Trim unused JS, subset fonts, and compress images. Faster pages help crawling and conversion.

Monitoring and dashboards

Search Console

  • Page indexing report to triage crawl and index gaps.
  • URL Inspection to view rendered HTML and live status.
  • Sitemaps report to validate parsing and freshness.

Server and logs

  • Track crawl rate, status ratios, and redirect chains.
  • Watch for spikes in 404s and 5xx responses.
  • Alert on non-200 responses for key templates.
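A log pass as simple as the sketch below surfaces those status ratios; it assumes the common Apache/Nginx combined log format:

```python
import re
from collections import Counter

# Combined log format: ... "GET /path HTTP/1.1" 301 512 ...
LINE_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3})')

def status_summary(log_lines):
    """Count responses per status class (2xx, 3xx, 4xx, 5xx)."""
    classes = Counter()
    for line in log_lines:
        match = LINE_RE.search(line)
        if match:
            classes[match.group("status")[0] + "xx"] += 1
    return classes
```

Run it daily over bot-filtered logs and alert when the 4xx or 5xx share of a key template jumps.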

Crawl budget

If you run a very large site, optimize internal linking and remove thin or duplicative URLs to focus crawling where it matters most.

30-60-90 plan

Days 1 to 30

  • Audit robots.txt, remove accidental resource blocks, add sitemap index.
  • Split sitemaps by type and ship clean canonicals on key templates.
  • Fix the top 20 duplicate or parameter traps.

Days 31 to 60

  • Add hreflang where you localize. Validate with HTML or sitemap annotations.
  • Replace brittle client-only content with SSR or hybrid rendering.
  • Instrument 404 and 5xx alerts and create a redirect backlog.

Days 61 to 90

  • Close remaining redirect chains and update internal links to final URLs.
  • Tune image compression and font subsetting to cut page weight.
  • Publish a living tech SEO checklist in your repo or wiki.

FAQ

Should I block AI crawlers

Decide based on policy and licensing. If you opt out, disallow their user agents in robots.txt. Understand that enforcement varies across crawlers.

How many sitemaps can I submit

As many as needed within limits. Each sitemap can hold up to 50,000 URLs and 50 MB uncompressed. Use a sitemap index to organize them.

Do I need rel next or prev

No. Keep self-canonicals on paginated pages and offer view-all if it loads quickly.

How long do migrations take

Small sites may settle in weeks. Larger domain moves can take months. Maintain 301s, update all internal links, and watch Search Console.

Tip: keep canonicals, internal links, and sitemaps in agreement. When those drift, duplicates multiply fast.