The Technical SEO Audit & Site Health Checklist: A Practical Guide for Agency Collaboration
When an SEO agency begins work on a site, the first deliverable is rarely a keyword map or a link-building proposal. It is a technical audit—a methodical examination of how search engines discover, crawl, interpret, and store your pages. Without this foundation, every subsequent optimization effort rests on uncertain ground. A site with hidden crawl blocks, misconfigured canonical tags, or poor Core Web Vitals will not achieve the rankings its content deserves, regardless of how well-researched the keywords are. This guide provides an actionable checklist for briefing your agency on technical SEO, conducting a site health review, and ensuring that the technical layer supports—not sabotages—your broader strategy.
Understanding the Crawl Budget and Site Architecture
Search engines allocate a finite amount of crawling resources to any given website. This allocation, known as crawl budget, determines how many pages Googlebot will request and process within a given timeframe. For small sites with fewer than a few thousand pages, crawl budget is rarely a constraint. For large e-commerce platforms, media sites, or enterprise portals with tens of thousands of URLs, mismanagement of crawl budget can mean that critical pages—new product launches, seasonal landing pages, or high-value content—are not indexed for weeks or months.
The primary factors influencing crawl budget include site speed, server response times, the ratio of low-value to high-value pages, and the structure of your internal links. A site that returns 404 errors on 15% of its discovered URLs is signaling to Googlebot that much of its inventory is dead weight, reducing the frequency and depth of future crawls. Similarly, a site with a deep, poorly connected navigation hierarchy may cause crawlers to waste resources traversing thin content before reaching money pages.
Action item for your agency brief: Request a crawl budget analysis as part of the initial technical audit. The agency should identify all URLs that are consuming crawl resources without contributing to indexation goals—parameterized URLs, session IDs, infinite calendar archives, or paginated filter combinations that generate hundreds of near-duplicate pages.
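To make that request concrete, the classification below is the kind of first pass an agency might run over a raw crawl export. It is a minimal sketch under stated assumptions, not a finished tool: the CSV filename, the "Address" column, and the parameter and path patterns are placeholders to replace with your own crawler's output and your site's URL conventions.

```python
# Minimal sketch: bucket URLs from a crawl export by likely crawl-budget waste.
# Assumes a CSV export with a column named "Address" holding the URL; the
# parameter names and the /calendar/ pattern are illustrative assumptions.
import csv
from urllib.parse import urlparse, parse_qs

WASTEFUL_PARAMS = {"sessionid", "sid", "sort", "color", "size", "page"}  # assumed parameter names

def classify(url: str) -> str:
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    if any(p.lower() in WASTEFUL_PARAMS for p in params):
        return "parameterized"
    if "/calendar/" in parsed.path:  # infinite archive pattern (assumed)
        return "calendar-archive"
    return "ok"

with open("crawl_export.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

buckets = {}
for row in rows:
    buckets.setdefault(classify(row["Address"]), []).append(row["Address"])

# Print the largest buckets first so wasteful patterns stand out.
for label, urls in sorted(buckets.items(), key=lambda kv: -len(kv[1])):
    print(f"{label}: {len(urls)} URLs")
```

Even this crude bucketing gives the agency and the client a shared, countable view of how much of the crawl is being spent on URLs that should never rank.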
Core Web Vitals and Site Performance: Beyond the Lighthouse Score
Core Web Vitals—Largest Contentful Paint (LCP), Interaction to Next Paint (INP), and Cumulative Layout Shift (CLS)—are now direct ranking signals. But treating them as a simple checklist of "get green scores" misses the deeper performance issues that affect both user experience and crawl efficiency. A page may pass the LCP threshold of 2.5 seconds on a desktop audit but fail miserably on mobile under real-world 3G conditions. Similarly, a site that scores well in lab tests may degrade significantly when real user monitoring (RUM) data is analyzed, revealing that 40% of actual visitors experience poor INP.
The risk here is twofold. First, poor Core Web Vitals directly reduce rankings, particularly for mobile searches. Second, slow pages consume more of your crawl budget because Googlebot must wait longer for responses, effectively reducing the number of pages it can process per crawl session. A site with an average server response time of 1.2 seconds will be crawled far more thoroughly than one that averages 3.5 seconds.
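To ground these conversations in field data rather than lab scores, one option is to pull 75th-percentile metrics from the Chrome UX Report (CrUX) dataset. The sketch below is a hedged example, not a drop-in script: it assumes the publicly documented `queryRecord` endpoint, a valid API key, and enough real traffic for the URL to appear in CrUX at all; confirm the endpoint and metric names against Google's current documentation before relying on them.

```python
# Hedged sketch: query p75 field metrics for a URL from the CrUX API.
# The endpoint and response shape follow Google's public documentation for the
# Chrome UX Report API, but verify both before building reporting on top of this.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
ENDPOINT = f"https://chromeuxreport.googleapis.com/v1/records:queryRecord?key={API_KEY}"

def p75_metrics(url: str, form_factor: str = "PHONE") -> dict:
    body = {"url": url, "formFactor": form_factor}
    resp = requests.post(ENDPOINT, json=body, timeout=30)
    resp.raise_for_status()
    metrics = resp.json().get("record", {}).get("metrics", {})
    # Return metric name -> p75 value for whatever metrics CrUX has for this URL.
    return {
        name: data.get("percentiles", {}).get("p75")
        for name, data in metrics.items()
    }

if __name__ == "__main__":
    for name, p75 in p75_metrics("https://www.example.com/").items():  # placeholder URL
        print(f"{name}: p75 = {p75}")
```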
Common pitfalls to flag in your brief:
- Over-reliance on client-side JavaScript rendering that delays LCP and INP.
- Third-party scripts (analytics, chat widgets, ad networks) that block the main thread.
- Unoptimized images that exceed 1 MB per asset, especially on product pages (a quick audit sketch follows this list).
- Layout shifts caused by deferred font loading or dynamically injected banner ads.
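For the image bullet above, a lightweight check does not require a full crawler. The sketch below, assuming plain `<img src>` markup and a server that reports `Content-Length`, flags any asset on a single page that exceeds roughly 1 MB; lazy-loaded images and `srcset` variants would need extra handling.

```python
# Sketch: flag images over ~1 MB on a single page using HEAD requests.
# Assumes images are referenced via plain <img src> attributes and that the
# server returns a Content-Length header; both are assumptions to verify.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

LIMIT_BYTES = 1_000_000

def heavy_images(page_url: str):
    html = requests.get(page_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    for img in soup.find_all("img", src=True):
        src = urljoin(page_url, img["src"])
        head = requests.head(src, allow_redirects=True, timeout=30)
        size = int(head.headers.get("Content-Length", 0))
        if size > LIMIT_BYTES:
            yield src, size

if __name__ == "__main__":
    for src, size in heavy_images("https://www.example.com/product/sample"):  # placeholder URL
        print(f"{size / 1_000_000:.1f} MB  {src}")
```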
XML Sitemaps, Robots.txt, and Canonicalization: The Triad of Indexation Control
Three technical files govern how search engines discover and index your content: the XML sitemap, the robots.txt file, and the canonical tag. Misconfiguring any one of them can lead to severe indexation issues, often without any visible error message in Search Console.
XML Sitemap: This file should list only canonical, indexable URLs—not parameterized variants, paginated pages that are already linked from the main navigation, or redirect targets. A common mistake is including all product filter combinations (e.g., `?color=red&size=large`) in the sitemap, which causes Google to waste crawl budget on near-duplicate pages. The sitemap should be submitted via Google Search Console and updated dynamically whenever new content is published or old content is removed.
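A first-pass sitemap check can be scripted in a few lines. The sketch below assumes a standard `<urlset>` sitemap at a known URL (a sitemap index file would need one more loop) and flags entries that carry query parameters or do not return a 200.

```python
# Sketch: fetch a sitemap and flag parameterized or non-200 entries.
# Assumes a standard <urlset> sitemap; the sitemap URL is a placeholder.
import requests
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def audit_sitemap(sitemap_url: str) -> None:
    root = ET.fromstring(requests.get(sitemap_url, timeout=30).content)
    for loc in root.findall(".//sm:loc", NS):
        url = loc.text.strip()
        if urlparse(url).query:
            print(f"parameterized URL in sitemap: {url}")
            continue
        status = requests.head(url, allow_redirects=False, timeout=30).status_code
        if status != 200:
            print(f"non-200 ({status}) URL in sitemap: {url}")

if __name__ == "__main__":
    audit_sitemap("https://www.example.com/sitemap.xml")  # placeholder
```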

Robots.txt: This file provides crawl directives, not indexation directives. A `Disallow` rule in robots.txt prevents Googlebot from crawling a page, but it does not prevent the page from appearing in search results if it is linked from elsewhere. The dangerous pattern is using `Disallow: /wp-admin/` correctly but accidentally including `Disallow: /blog/` when your blog is a primary source of organic traffic. Always validate robots.txt changes before deploying, for example with Search Console's robots.txt report or a parser-based check such as the sketch below.
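As a minimal example of such a check, Python's standard-library `urllib.robotparser` can evaluate a proposed robots.txt against a list of URLs you cannot afford to block. The file name and the critical paths below are assumptions; substitute your own revenue and content URLs.

```python
# Sketch: confirm a proposed robots.txt still allows critical sections before deploying.
# "robots.proposed.txt" and the paths listed are placeholders for your own site.
from urllib.robotparser import RobotFileParser

CRITICAL_URLS = [
    "https://www.example.com/blog/",
    "https://www.example.com/products/",
    "https://www.example.com/assets/main.css",
]

parser = RobotFileParser()
with open("robots.proposed.txt", encoding="utf-8") as f:
    parser.parse(f.read().splitlines())

for url in CRITICAL_URLS:
    if not parser.can_fetch("Googlebot", url):
        print(f"WARNING: proposed robots.txt blocks {url}")
```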
Canonical Tags: The `rel=canonical` attribute tells search engines which version of a URL is the authoritative one. It is essential for managing duplicate content arising from HTTP vs. HTTPS, www vs. non-www, trailing slashes, and URL parameters. However, incorrect implementation can be catastrophic. A canonical tag pointing to a 404 page renders both the original and the canonical useless. On paginated series (e.g., `?page=2`), canonicalizing every page to the first page hides the items that are only linked from deeper pages; and because Google no longer uses `rel=prev/next` as an indexing signal, pagination handling rests entirely on correct canonicals and internal links. Ensure your agency audits every canonical tag for accuracy and consistency across all page templates.
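A template-level canonical audit is easy to automate in rough form. The sketch below fetches each URL, extracts the `rel=canonical` target, confirms the target returns a 200, and flags the mixed canonical-plus-noindex signal; the URL list is assumed to come from your crawler export, and a production audit would add rate limiting and error handling.

```python
# Sketch: check each page's canonical target and flag canonical + noindex conflicts.
# The URL list is a placeholder; in practice it would come from a crawl export.
import requests
from bs4 import BeautifulSoup

def check_canonical(url: str) -> None:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    canonical = soup.find("link", rel="canonical")
    robots_meta = soup.find("meta", attrs={"name": "robots"})
    if canonical is None or not canonical.get("href"):
        print(f"{url}: missing canonical")
        return
    target = canonical["href"]
    status = requests.head(target, allow_redirects=False, timeout=30).status_code
    if status != 200:
        print(f"{url}: canonical points to {target} ({status})")
    if robots_meta and "noindex" in robots_meta.get("content", "").lower():
        print(f"{url}: mixed signals (canonical + noindex)")

for url in ["https://www.example.com/", "https://www.example.com/category?page=2"]:  # placeholders
    check_canonical(url)
```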
Table: Common Technical Misconfigurations and Their Consequences
| Misconfiguration | Symptom | Consequence |
|---|---|---|
| Sitemap includes parameterized URLs | High crawl of near-duplicates | Wasted crawl budget, diluted link equity |
| Robots.txt blocks CSS/JS | Google cannot render page | Poor mobile ranking, incomplete indexation |
| Canonical tag points to 301 redirect | Link equity lost in redirect chain | Lower ranking for intended canonical |
| No canonical on paginated series | Duplicate content across pages | Possible suppression of all paginated pages |
| Mixed signals (canonical + noindex) | Google may ignore both directives | Unpredictable indexation |
On-Page Optimization and Keyword Research: Aligning Content with Intent
Technical SEO ensures that search engines can find and understand your pages. On-page optimization ensures that when they do, the page is clearly relevant to the searcher's query. This begins with keyword research—not just identifying high-volume terms, but mapping them to user intent: informational, navigational, commercial, or transactional.
A page targeting a high-volume keyword like "best running shoes" will fail if it is written as a generic product category page without comparison data, reviews, or buying guidance. The searcher's intent is commercial—they want to compare options before purchasing—not informational (e.g., "how to choose running shoes") or transactional (e.g., "buy Nike Pegasus 40"). Intent mapping is the bridge between keyword data and content strategy.
Action items for your agency brief:
- Request a keyword-to-intent mapping document that categorizes all target terms by funnel stage.
- Ensure that each piece of content has a single primary keyword and no more than three closely related secondary keywords.
- Verify that title tags, meta descriptions, H1s, and body copy align with the identified intent. A title that promises "Best Running Shoes 2025" but delivers a thin list of five products with no reviews or pricing is an intent mismatch that will hurt rankings and bounce rates.
- Ask the agency to audit existing content for keyword cannibalization—multiple pages targeting the same primary keyword, which forces Google to choose which to rank, often suppressing both. A minimal detection sketch follows this list.
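A first cut at cannibalization detection only needs the keyword map itself. The sketch below assumes the mapping document is exported as a CSV with `url` and `primary_keyword` columns (both names are placeholders) and simply lists any keyword assigned to more than one page.

```python
# Sketch: group pages by assigned primary keyword to surface cannibalization risks.
# Assumes a keyword map exported as CSV with "url" and "primary_keyword" columns.
import csv
from collections import defaultdict

pages_by_keyword = defaultdict(list)
with open("keyword_map.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        pages_by_keyword[row["primary_keyword"].strip().lower()].append(row["url"])

for keyword, urls in pages_by_keyword.items():
    if len(urls) > 1:
        print(f"cannibalization risk for '{keyword}':")
        for url in urls:
            print(f"  {url}")
```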
Link Building Strategy: Profile Quality Over Quantity
Link building remains a significant ranking factor, but the nature of valuable links has shifted. A single, contextually relevant link from a high-authority domain in your industry can move the needle more than fifty low-quality directory links. The risk of aggressive link building—particularly the use of private blog networks (PBNs), paid links, or automated outreach—is that Google's manual action team can deindex your site partially or entirely. Recovery from a manual link penalty typically requires months of disavow work and a complete overhaul of the link profile.
Your agency should present a link building strategy that prioritizes:
- Editorial links: Earning mentions through original research, data studies, or expert commentary.
- Broken link building: Identifying broken resources on relevant sites and offering your content as a replacement.
- Digital PR: Creating newsworthy assets (infographics, surveys, interactive tools) that journalists and bloggers naturally reference.
- Competitor backlink analysis: Understanding where your competitors earn links and identifying gaps in your own profile (a gap-analysis sketch follows this list).
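For the competitor analysis item, the core of the gap check is a set difference. The sketch below assumes two plain-text exports of referring domains, one domain per line, from whichever backlink tool the agency uses; the file names are placeholders.

```python
# Sketch: find referring domains that link to a competitor but not to you.
# Assumes one referring domain per line in each exported text file.
def load_domains(path: str) -> set[str]:
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

ours = load_domains("our_referring_domains.txt")
theirs = load_domains("competitor_referring_domains.txt")

gap = sorted(theirs - ours)
print(f"{len(gap)} domains link to the competitor but not to us")
for domain in gap[:25]:
    print(f"  {domain}")
```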
Red flags that should prompt a harder look at the proposal:
- The agency promises a specific number of links per month without explaining the acquisition method.
- The agency offers "guaranteed" links from high-DA domains without transparency about the source.
- The agency suggests using exact-match anchor text for every link.
- The agency refuses to provide a list of target domains or outreach templates.
Running a Technical SEO Audit: The Step-by-Step Checklist

When you brief your agency on a technical audit, provide them with this checklist as a starting point. The agency should deliver a report that addresses every item below, with clear prioritization (critical, high, medium, low) and estimated effort for remediation.
- Crawl and Indexation Analysis
  - Run a full site crawl using a tool like Screaming Frog or Sitebulb.
  - Identify all non-indexable pages (4xx, 5xx, redirect chains, blocked by robots.txt, noindex, canonical conflicts).
  - Compare the number of URLs in the sitemap vs. the number of URLs indexed in Google (via `site:` search and Search Console).
  - Document crawl budget efficiency: the ratio of crawled to indexed URLs.
- Core Web Vitals and Performance
  - Collect field data from CrUX for all page types (homepage, category, product, blog, landing pages).
  - Run lab tests on the 10 highest-traffic pages and the 10 lowest-performing pages.
  - Identify the top three performance bottlenecks per page type (e.g., render-blocking resources, large images, slow server response).
  - Recommend specific fixes with estimated impact on LCP, INP, and CLS.
- Sitemap and Robots.txt Audit
  - Validate that the sitemap contains only canonical, indexable URLs.
  - Check that the sitemap is referenced in robots.txt and submitted to Search Console.
  - Review robots.txt for accidental blocks of critical resources (CSS, JS, images, content directories).
  - Test the robots.txt against the current site structure using Search Console's robots.txt report or a parser-based check.
- Canonical and Duplicate Content Audit
  - Crawl all URLs and identify those with self-referencing canonicals vs. cross-domain canonicals.
  - Flag any pages where the canonical tag points to a non-200 response.
  - Identify duplicate content clusters (pages sharing more than 80% of body text) and recommend consolidation or canonicalization.
  - Check for mixed signals: pages with both `noindex` and a canonical tag.
- Internal Link Structure
  - Map the site's internal link graph to identify orphan pages (pages with no inbound internal links) and depth issues (pages requiring more than four clicks from the homepage); a graph-walk sketch follows this checklist.
  - Assess the distribution of link equity: which pages receive the most internal links, and are those pages strategically important?
  - Recommend a hierarchical structure that prioritizes high-value content for crawl frequency.
- Mobile Usability and Structured Data
  - Test the site on mobile devices for tap targets, viewport configuration, and font sizes.
  - Validate all structured data (JSON-LD) for errors using Google's Rich Results Test.
  - Ensure that product, article, FAQ, and review schemas are correctly implemented and aligned with visible content.
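For the internal link items above, the depth and orphan checks reduce to a graph walk over an edge list. The sketch below assumes the crawler can export `(source URL, destination URL)` pairs as CSV; it reports pages more than four clicks from the homepage and pages that never appear as a link destination, and a fuller audit would also cross-reference the sitemap for pages the crawler never reached.

```python
# Sketch: click depth from the homepage plus orphan candidates, from an edge list.
# Assumes "internal_links.csv" holds two columns per row: source URL, destination URL.
import csv
from collections import defaultdict, deque

HOME = "https://www.example.com/"  # placeholder homepage URL

outlinks = defaultdict(set)
pages, destinations = set(), set()
with open("internal_links.csv", newline="", encoding="utf-8") as f:
    for source, destination in csv.reader(f):
        outlinks[source].add(destination)
        pages.update((source, destination))
        destinations.add(destination)

# Breadth-first search from the homepage gives the minimum click depth per page.
depth = {HOME: 0}
queue = deque([HOME])
while queue:
    page = queue.popleft()
    for nxt in outlinks[page]:
        if nxt not in depth:
            depth[nxt] = depth[page] + 1
            queue.append(nxt)

orphan_candidates = pages - destinations - {HOME}
too_deep = sorted(p for p, d in depth.items() if d > 4)
print(f"{len(orphan_candidates)} orphan candidates, {len(too_deep)} pages deeper than 4 clicks")
```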
Interpreting the Audit Report and Prioritizing Fixes
A technical audit can surface dozens—sometimes hundreds—of issues. Not all are equal. Your agency should categorize findings by impact and effort, allowing you to prioritize fixes that yield the greatest return on investment.
High-impact, low-effort fixes should be addressed immediately: fixing a misconfigured robots.txt that blocks an entire section of the site, correcting a canonical tag that points to a 404 page, or removing parameterized URLs from the sitemap. High-impact, high-effort fixes—such as migrating from a client-side rendered framework to server-side rendering to improve Core Web Vitals—require a business case and cross-team coordination. Low-impact fixes (e.g., missing alt text on a few images) can be batched into regular maintenance cycles.
Table: Priority Matrix for Technical SEO Issues
| Impact | Effort | Example | Recommended Action |
|---|---|---|---|
| High | Low | robots.txt blocks CSS | Fix immediately (15 min) |
| High | High | Migrate to SSR for LCP | Plan over 2-4 weeks |
| Medium | Low | Missing meta descriptions on 10 pages | Batch into weekly optimization |
| Medium | High | Rewrite 200 duplicate product descriptions | Schedule over 1-2 months |
| Low | Low | Missing alt text on 3 images | Fix during next content update |
| Low | High | Create sitemap for legacy blog section | Defer unless traffic justifies |
Conclusion: Building a Continuous Technical SEO Process
Technical SEO is not a one-time project. As you add new content, change site architecture, or deploy new features, the technical foundation shifts. The checklist above should be revisited quarterly, or whenever a major site update occurs (redesign, platform migration, new subdomain launch). Your agency should provide a continuous monitoring dashboard that tracks core metrics: indexation rate, crawl frequency, Core Web Vitals scores, and backlink profile health.
By treating technical SEO as an ongoing operational discipline rather than a launch-day checklist, you ensure that search engines can always find, understand, and rank your best content. The agencies that excel are those that combine rigorous technical analysis with a clear prioritization framework—and that communicate risks honestly, without promising instant results or guaranteed rankings.
For further reading on how technical audits fit into a broader SEO strategy, explore our guides on site performance optimization and content strategy alignment.
