The Technical SEO & Site Health Checklist: How to Brief an Agency and Audit Your Own Performance

Search engines reward sites that load fast, render without layout shifts, and expose a clear content hierarchy. Yet many organizations treat technical SEO as a one-time fix—a crawl error cleanup, a sitemap submission—and then move on. In reality, technical SEO is a continuous process of monitoring, diagnosing, and optimizing how search bots interact with your infrastructure. This checklist is designed for two audiences: marketing leads who need to brief an SEO agency with precision, and in-house practitioners who want to run their own site health audit without relying on third-party promises. We will walk through the essential checks, from crawl budget management to Core Web Vitals, and highlight what can go wrong when shortcuts are taken.

1. Crawl Budget & Robots.txt: Controlling the Bot’s Path

Every site has a finite crawl budget—the number of URLs a search engine will crawl within a given time window. For large sites (10,000+ pages), inefficient crawling wastes resources on low-value pages (tag archives, session IDs, paginated filters) while leaving important content unindexed. The first step in any technical audit is understanding how Googlebot allocates its time across your domain.

What to check:

  • robots.txt file: Ensure it does not block critical resources (CSS, JavaScript, images) that Google needs to render the page. A common mistake is pairing a harmless `Disallow: /wp-admin/` with an accidental block of `/wp-content/themes/`, which prevents Google from rendering your pages. Verify with the robots.txt report in Google Search Console (which replaced the old robots.txt Tester).
  • Crawl rate: Google retired Search Console’s manual crawl-rate limiter in early 2024; Googlebot now adjusts crawl rate automatically based on server health. Slow responses and 5xx errors will throttle crawling and delay indexation of new content, so monitor the Crawl Stats report instead.
  • Log file analysis: If you have access to server logs, check which URLs Googlebot actually requests. If 60% of hits land on parameter-based filter pages, you have a crawl waste problem. An agency should provide a crawl budget report as part of the initial technical audit.
Risk alert: Do not use `noindex` directives inside robots.txt—they are not supported. Instead, use `meta robots noindex` or the `X-Robots-Tag` HTTP header. Also avoid the `Disallow: /` directive unless you intentionally want zero pages indexed (e.g., a staging environment).
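
As a concrete illustration of the rules above, a minimal robots.txt might look like this (the paths assume a WordPress install and are purely illustrative):

```text
# Block only true admin paths; keep rendering resources crawlable
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# Do NOT block theme assets - Google needs them to render pages:
# Disallow: /wp-content/themes/   <- common mistake, shown here only as a warning

Sitemap: https://www.example.com/sitemap_index.xml
```

To keep a URL out of the index, use the `X-Robots-Tag: noindex` HTTP header or a `meta robots noindex` tag instead; robots.txt only controls crawling, not indexing.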

2. XML Sitemaps & Index Coverage: The Blueprint for Discovery

An XML sitemap is not a magic wand for ranking, but it is the most reliable mechanism for telling search engines which URLs you consider canonical and important. A well-structured sitemap should contain only indexable URLs (200 HTTP status, no `noindex`, no canonical pointing elsewhere) and be updated whenever new content is published.

Sitemap checklist:

| Check | Action | Tool/Method |
| --- | --- | --- |
| Valid format | XML with UTF-8 encoding, no more than 50,000 URLs per file | Online validator or browser parse |
| Lastmod accuracy | `lastmod` should reflect actual content changes, not CMS timestamps | Compare with page publish dates |
| No broken URLs | Every listed URL returns a 200 status | Screaming Frog or custom script |
| Priority & changefreq | These are hints, not commands; use sparingly | Remove if uncertain |
| Sitemap index | If you have multiple sitemaps, submit a parent sitemap index | Google Search Console |

Common mistake: Including paginated pages (e.g., `/category/page/2/`) in the sitemap. These should be `noindex` or excluded, as they dilute the crawl budget and can confuse canonical signals. A competent agency will flag this during the audit.
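
The “custom script” option from the checklist can be sketched in a few lines of Python using only the standard library (function names and the example domain below are illustrative, not a standard tool):

```python
import urllib.request
import urllib.error
import xml.etree.ElementTree as ET

# Sitemaps use this XML namespace, so element lookups must include it.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def parse_sitemap(xml_text) -> list[str]:
    """Extract all <loc> URLs from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]

def check_status(url: str, timeout: int = 10) -> int:
    """Return the HTTP status code for a URL using a HEAD request."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # 4xx/5xx raise, but still carry a status code

# Usage against a live site (network access required):
#   xml_text = urllib.request.urlopen("https://www.example.com/sitemap.xml").read()
#   bad = [u for u in parse_sitemap(xml_text) if check_status(u) != 200]
```

Any URL the script flags should be fixed or removed from the sitemap before resubmitting it in Search Console.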

3. Canonical Tags & Duplicate Content: Signal the Preferred Version

Duplicate content is not a penalty—it is a confusion signal. When Google encounters identical or near-identical content on multiple URLs, it must choose which to show in search results. If it picks the wrong one, you lose traffic. Canonical tags (`rel="canonical"`) are your primary tool for telling the search engine which URL is the authoritative version.

How to audit canonical implementation:

  • Self-referencing canonicals: Every page should have a canonical tag pointing to itself unless it is a syndicated copy or a printer-friendly version. Missing self-referencing canonicals is a common oversight on e-commerce product pages.
  • Cross-domain canonicals: If you republish content on Medium or LinkedIn, use a canonical tag pointing back to your original. This passes link equity to your domain rather than splitting it.
  • Parameter handling: For URLs with tracking parameters (`?utm_source=...`), ensure the canonical tag strips them. Google retired Search Console’s URL Parameters tool in 2022, so canonical tags and consistent internal linking are now your main controls over how parameters are handled.
What can go wrong: A misconfigured canonical pointing to a 404 page or to a different product variant can cause deindexation of the original. Always test canonicals with the URL Inspection tool before deploying site-wide changes.
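
A lightweight way to spot-check canonical handling before a site-wide deploy is a small standard-library script like this (a sketch only — the class name and tracking-parameter list are illustrative):

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative list of tracking parameters to strip; extend for your stack.
TRACKING_PREFIXES = ("utm_", "gclid", "fbclid")

class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> in a page."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "canonical" and self.canonical is None:
            self.canonical = a.get("href")

def strip_tracking(url: str) -> str:
    """Drop tracking parameters so a URL matches its expected canonical form."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if not k.startswith(TRACKING_PREFIXES)]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Feeding each crawled page to `CanonicalFinder` and comparing the result against `strip_tracking(page_url)` surfaces pages whose canonical points somewhere unexpected.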

4. Core Web Vitals & Site Performance: The User Experience Gate

Core Web Vitals became ranking signals in 2021, and in March 2024 INP replaced FID as the responsiveness metric, so the trio is now LCP, INP, and CLS. Their real value, though, is in user retention: a site that loads slowly or shifts layout while the user tries to click a button will see higher bounce rates regardless of rankings. Technical SEO now includes performance optimization as a core discipline.

Key metrics to monitor:

| Metric | Target | What It Measures |
| --- | --- | --- |
| Largest Contentful Paint (LCP) | ≤ 2.5 seconds | Loading speed of the main content element |
| Interaction to Next Paint (INP) | ≤ 200 ms | Responsiveness to user clicks/taps |
| Cumulative Layout Shift (CLS) | ≤ 0.1 | Visual stability during load |
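
The targets above, together with Google’s published “needs improvement” upper bounds (4.0 s, 500 ms, 0.25), can be encoded in a small helper for bulk-classifying field data (the function name is my own):

```python
# Google's published thresholds per metric: (good_max, needs_improvement_max)
THRESHOLDS = {
    "LCP": (2.5, 4.0),    # seconds
    "INP": (200, 500),    # milliseconds
    "CLS": (0.1, 0.25),   # unitless score
}

def rate(metric: str, value: float) -> str:
    """Classify a Core Web Vitals field value as Google's tools do."""
    good_max, ni_max = THRESHOLDS[metric]
    if value <= good_max:
        return "good"
    if value <= ni_max:
        return "needs improvement"
    return "poor"
```

Running this over a CrUX export for your top templates quickly shows which page types are dragging the site below the “good” bucket.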

Practical steps for improvement:

  • Image optimization: Serve next-gen formats (WebP, AVIF), lazy-load below-the-fold images, and set explicit width/height attributes to prevent CLS.
  • Server response time: Use a CDN, enable caching, and keep Time to First Byte (TTFB) under 800 ms. For dynamic sites, consider server-side rendering or static generation.
  • JavaScript minification: Remove unused code, defer non-critical scripts, and avoid long tasks that block the main thread.
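
The image bullet above translates into markup like this (file names are illustrative; note that above-the-fold hero images should not be lazy-loaded, since that delays LCP):

```html
<!-- Explicit width/height reserve layout space, preventing CLS;
     loading="lazy" defers images that start below the fold -->
<img src="/images/gallery-2.webp"
     width="1200" height="630"
     alt="Second gallery image"
     loading="lazy">
```
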
Agency red flag: If an SEO agency proposes a performance fix without first measuring current lab data (Lighthouse) and field data (CrUX report), they are guessing. Always request a before-and-after comparison from Google Search Console’s Core Web Vitals report.

5. On-Page Optimization & Intent Mapping: Beyond Meta Tags

On-page optimization has evolved from stuffing keywords into title tags to aligning content with search intent. A page targeting "best running shoes for flat feet" must answer the user’s need to compare products, not just list features. The technical layer supports this by ensuring the page structure is crawlable and semantically clear.

On-page audit checklist:

  • Title tag & meta description: Unique for each page, within character limits, and including the primary keyword naturally. Avoid duplicate titles across product variants.
  • Heading hierarchy: One `H1` per page, followed by `H2` and `H3` for sub-sections. The `H1` should match the page’s primary topic and ideally include the target keyword.
  • Internal linking: Link to relevant pillar pages and related articles. Use descriptive anchor text (not "click here") and avoid linking to the same target multiple times on one page.
  • Schema markup: Implement structured data appropriate to the content type (Article, Product, FAQ, BreadcrumbList). Test with Google’s Rich Results Test.
Intent mapping in practice: Suppose your keyword research reveals that "how to clean suede shoes" has high informational intent. An optimized page would include a step-by-step guide, a video embed, and a table of recommended cleaning products—not a sales pitch. The technical structure should make that guide easy to scan: short paragraphs, bullet points, and clear headings.
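
To illustrate the schema point from the checklist, a minimal FAQ structured-data block for the suede-shoes example might look like the following (the wording is illustrative — always validate with the Rich Results Test before shipping):

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "How do I clean suede shoes?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Brush off dry dirt with a suede brush, lift stains with a suede eraser, then finish with a protective spray."
    }
  }]
}
```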

6. Link Building & Backlink Profile: Quality Over Quantity

Link building remains a significant ranking factor, but the landscape has shifted toward editorial relevance and away from mass directory submissions. A healthy backlink profile shows a natural distribution of link types (editorial, guest posts, resource pages, mentions) and a low ratio of toxic links.

How to brief a link building campaign:

  • Define your target audience: Ask the agency to map the types of sites your ideal customers read. For a B2B SaaS product, that might be industry blogs, comparison sites, and trade publications.
  • Set quality thresholds: Prioritize relevance over arbitrary metrics. A link from a high-authority site that is unrelated to your niche may be less valuable than a link from a smaller, relevant site. Avoid links from sites with spammy characteristics.
  • Avoid PBNs and paid links: Private Blog Networks (PBNs) are sites created solely to pass link equity. Google’s algorithms detect patterns like identical IP ranges, similar themes, and unnatural anchor text distribution. If an agency promises an unusually high volume of links in a short time, they are likely using PBNs or link farms—both can lead to a manual penalty.
Backlink profile audit:
  • Toxic link detection: Use Majestic’s Trust Flow (TF) and Citation Flow (CF) metrics, or the equivalent spam indicators in tools like Ahrefs. A large discrepancy (e.g., CF 50, TF 5) indicates spammy links.
  • Disavow file: Only submit a disavow file if you have confirmed spam links via a manual action notice in Search Console. Proactive disavow is rarely necessary and can harm good links if done incorrectly.
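
The TF/CF discrepancy check can be automated over a backlink export. A hedged sketch (the dict field names and the 4.0 ratio threshold are assumptions, not an industry standard):

```python
def flag_suspicious(domains: list[dict], ratio_threshold: float = 4.0) -> list[str]:
    """Flag referring domains whose Citation Flow dwarfs their Trust Flow.

    Expects rows like {"domain": ..., "tf": ..., "cf": ...} exported
    from a backlink tool (field names here are illustrative).
    """
    flagged = []
    for d in domains:
        tf = max(d["tf"], 1)  # avoid division by zero on TF = 0
        if d["cf"] / tf >= ratio_threshold:
            flagged.append(d["domain"])
    return flagged
```

Flagged domains are candidates for manual review, not automatic disavowal — per the note above, disavow only after confirming actual spam.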

7. Content Strategy & Duplicate Content Prevention

Content strategy sits at the intersection of keyword research, intent mapping, and technical execution. Without a plan, you risk producing pages that compete against each other (keyword cannibalization) or that add no unique value.

Content audit checklist:

  • Identify duplicate content: Use a tool like Screaming Frog to find pages with identical or near-identical meta descriptions, titles, or body text. For thin content (fewer than 300 words), consider merging into a longer, more authoritative page.
  • Consolidate weak pages: If you have 10 blog posts on "SEO tips for beginners," combine them into one comprehensive guide and 301-redirect the old URLs. This consolidates link equity and improves user experience.
  • Keyword cannibalization: Check whether multiple pages target the same primary keyword. Use a spreadsheet to map each page to its primary and secondary keywords. If two pages compete for the same term, decide which should rank and consolidate the other via a 301 redirect, a canonical tag, or by re-targeting it to a different query.
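
That spreadsheet check is easy to script. This sketch (function name and normalization are my own) groups URLs by primary keyword and surfaces collisions:

```python
from collections import defaultdict

def find_cannibalization(page_keywords: dict[str, str]) -> dict[str, list[str]]:
    """Return keywords claimed as primary by two or more pages.

    `page_keywords` maps URL -> primary keyword, e.g. loaded from
    your keyword-to-page spreadsheet.
    """
    by_keyword = defaultdict(list)
    for url, kw in page_keywords.items():
        # Normalize so "SEO Tips" and "seo tips" count as the same target.
        by_keyword[kw.strip().lower()].append(url)
    return {kw: urls for kw, urls in by_keyword.items() if len(urls) > 1}
```

Each keyword in the result is a consolidation decision waiting to be made: pick the page that should rank and redirect, canonicalize, or re-target the rest.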
Agency expectation: A good content strategy deliverable includes a keyword-to-page map, a content calendar, and a plan for updating existing content (not just creating new pages). Avoid agencies that propose content without first auditing what you already have.

8. Risk Awareness: What Can Go Wrong

Technical SEO is not without risks. Poorly executed changes can cause traffic drops, deindexation, or manual penalties. Here are the most common pitfalls and how to avoid them.

| Risk | Cause | Mitigation |
| --- | --- | --- |
| Traffic loss after site migration | Missing 301 redirects, changed URL structure | Create a redirect map before launch, test in staging |
| Manual penalty for unnatural links | Paid links, PBNs, over-optimized anchor text | Regular backlink audits, avoid link schemes |
| Deindexation after canonical error | Canonical pointing to non-existent or wrong URL | Test all canonicals with the URL Inspection tool |
| Slow Core Web Vitals after redesign | Heavy scripts, uncompressed images, no lazy loading | Run a performance budget check before go-live |
| Duplicate content from pagination | Unclear canonical signals on paginated series (Google no longer uses `rel="next"/"prev"`) | Use `view-all` pages or `noindex` paginated pages |
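
A redirect map is worth validating programmatically before a migration goes live. This sketch (function and variable names are my own) flags redirect chains and loops in an old-URL to new-URL mapping:

```python
def audit_redirect_map(redirects: dict[str, str]) -> dict[str, list]:
    """Check a 301 redirect map (old URL -> new URL) for chains and loops."""
    chains, loops = [], []
    for start in redirects:
        seen, current = [start], redirects[start]
        while current in redirects:
            if current in seen:           # revisiting a URL means a loop
                loops.append(seen + [current])
                break
            seen.append(current)
            current = redirects[current]
        else:
            if len(seen) > 1:             # more than one hop to a final URL
                chains.append(seen + [current])
    return {"chains": chains, "loops": loops}
```

Chains should be flattened so every old URL points directly at its final destination; loops must be broken before launch, since both waste crawl budget and dilute the redirected equity.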

Final word of caution: No agency can guarantee first-page rankings. Any agency that does is either lying or using black-hat techniques that will eventually catch up with you. The role of technical SEO is to remove barriers between your content and the search engine, not to manipulate the algorithm. Focus on site health, user experience, and quality content—the rankings will follow.

Russell Le

Senior SEO Analyst

Russell specializes in data-driven SEO strategy and competitive analysis. He helps businesses align search performance with business goals.
