Part of our Performance & Scalability series
A technical SEO audit answers one question: is anything about this site's infrastructure preventing content that deserves to rank from ranking? In 2026 that question spans classic crawlability, the JavaScript rendering gap, Core Web Vitals field data, and a new layer: whether AI search crawlers can retrieve your content at all.
This is the exact 47-point checklist our team works through on every client engagement, in the order we run it, with the decision rule for each check. It is sequenced by dependency: crawl problems invalidate indexation analysis, and indexation problems invalidate content analysis. Run it top to bottom, log every finding with severity (Critical / High / Medium), and fix Critical items before touching anything cosmetic.
Key Takeaways
- Audit in dependency order: crawlability → indexation → duplication → rendering → performance → enhancement layers; a Critical finding upstream changes everything downstream
- Server log analysis is the only ground truth for how Googlebot actually spends its budget — crawl simulators approximate it, logs prove it
- The most common Critical findings in 2026: faceted-navigation crawl traps, JavaScript-only content and links, hreflang reciprocity failures, and staging environments leaking into the index
- Core Web Vitals must be judged on CrUX field data per template, not Lighthouse lab runs — only field data feeds rankings
- Structured data belongs in the audit for rich-result eligibility — not as an AI-citation tactic; large-scale studies show schema does not drive LLM citations
- AI retrievability is now a standard audit section: deliberate AI-crawler policy in robots.txt and server-side rendered content that retrieval bots (which execute little JavaScript) can read
- An audit without prioritized, owner-assigned remediation tickets is a PDF, not an audit
Section A — Crawlability and Crawl Budget (Checks 1–8)
1. robots.txt is valid, accessible, and intentional. Returns 200 at the root, parses cleanly, declares the sitemap, and every Disallow line has a known reason. We diff it against the last known version — silent robots.txt changes cause more outages than any other single file.
2. XML sitemaps are clean and honest. Every URL returns 200, is canonical, and is indexable. Sitemaps containing redirects, 404s, or noindexed URLs train Google to distrust them. Large sites: segmented sitemaps per section so coverage problems can be localized.
3. Server logs show Googlebot spending budget on money pages. We pull 30 days of logs and compute the share of Googlebot hits landing on indexable, revenue-relevant URLs. Under ~70% means budget is leaking somewhere — and the logs say where.
4. Parameter and faceted-navigation traps are controlled. Filter, sort, and session parameters are either canonicalized and kept out of the crawl, or deliberately indexable with unique content. The combinatorial URL explosion on ecommerce and listing sites is the most common Critical finding we log.
5. No redirect chains or loops on important paths. Every redirect resolves in one hop. Chains waste budget, leak equity, and break silently when one link in the chain changes.
6. Internal 404s and broken links are near zero. A full crawl lists every internal link to a 4xx target. A handful is hygiene; hundreds is an architecture problem.
7. No orphan pages among pages that matter. Cross-reference the crawl (link graph) against sitemaps and analytics. Pages with traffic or revenue but no internal links get linked; orphans with neither get judged for removal.
8. Server responds fast and reliably to bots. TTFB under ~600ms at origin, 5xx rate effectively zero in logs, no bot-specific throttling or WAF rules accidentally serving 403s to Googlebot.
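Checks 3 and 5 are the easiest to script once the raw data is exported. The sketch below is a minimal illustration, not a production log pipeline. It assumes logs in the Combined Log Format and a set of money-page request targets (path plus query string) taken from your own URL inventory. It trusts any user-agent containing "Googlebot"; a real audit should also verify those hits by reverse DNS, because the user-agent is trivially spoofed. The redirect helper assumes a hypothetical `{source: target}` map exported from a crawl.

```python
import re

# Combined Log Format:
# host ident user [time] "METHOD target PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[A-Z]+ (?P<target>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def googlebot_budget_share(log_lines, money_targets):
    """Share of (claimed) Googlebot hits whose request target is a money page.

    Query strings are kept on purpose: /shoes?color=red is a different URL
    from /shoes, and parameter variants are where budget usually leaks.
    """
    total = on_money = 0
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match or "Googlebot" not in match.group("ua"):
            continue  # unparseable line, or not claiming to be Googlebot
        total += 1
        if match.group("target") in money_targets:
            on_money += 1
    return on_money / total if total else 0.0


def redirect_hops(start, redirects):
    """Follow a {source: target} map; return (final_url, hop_count, is_loop)."""
    seen = {start}
    url, hops = start, 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen:
            return url, hops, True  # the chain returned to a URL it already visited
        seen.add(url)
    return url, hops, False
```

Read the results against the decision rules above: a share below roughly 0.7 is the check 3 warning sign, and any redirect with `hop_count > 1` or `is_loop` set fails check 5.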
Section B — Indexation (Checks 9–14)
9. Indexed count roughly matches intended count. Compare the Search Console Page indexing report with your own canonical URL inventory; a large gap in either direction is the headline finding to explain.
10. noindex audit — nothing valuable is excluded. Crawl for meta robots and X-Robots-Tag headers; every noindex must be intentional. CMS template changes regularly noindex whole sections without anyone noticing.
11. Index bloat identified. Thin tag pages, internal search results, empty categories, printer views, expired listings sitting in the index dilute quality signals. Inventory them and decide: improve, consolidate, or remove.
12. "Crawled — currently not indexed" patterns diagnosed. This GSC bucket, read at scale, is Google's quality verdict by template. Clusters of one page type here point at thin or duplicative templates, not random bad luck.
13. Pagination is crawlable and consistent. Paginated series self-canonicalize (page 2 does not canonicalize to page 1), links are real anchors, and "view all" or load-more patterns have crawlable fallbacks.
14. Staging, dev, and test environments are out of the index. Check site: queries and Performance-report host data for staging subdomains. Leaked staging environments duplicate the entire site; authentication, not just noindex, is the fix.
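The noindex audit in check 10 comes down to reading two places per URL: meta robots tags and the X-Robots-Tag response header. The sketch below shows that logic for a single page, assuming the HTML and response headers were already fetched by your crawler. It is deliberately simplified: it only looks at `robots` and `googlebot` meta names, and it ignores which user-agent a header directive such as `googlebot: noindex` targets.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_noindexed(html, headers):
    """True if meta robots or the X-Robots-Tag header carries noindex (or none)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # Header names are case-insensitive; a value may hold several directives.
    header_value = next(
        (value for name, value in headers.items() if name.lower() == "x-robots-tag"), ""
    )
    # "googlebot: noindex" is reduced to "noindex" (the user-agent prefix is ignored).
    header_directives = {
        part.rsplit(":", 1)[-1].strip().lower() for part in header_value.split(",")
    }
    directives = parser.directives | header_directives
    return "noindex" in directives or "none" in directives
```

Run across a full crawl, every `True` result must map to an intentional decision; clusters on one template usually point at a CMS change.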
Section C — Canonicals and Duplication (Checks 15–19)
15. Every indexable page has a self-referencing canonical. Absolute URL, one per page, matching the served protocol and host.
16. One canonical host, one hop. http→https, www/non-www, trailing-slash and case variants all 301 to a single form — and never via chained redirects or split server configs.
17. Parameterized duplicates canonicalize to the clean URL. UTM, sort, and pagination-irrelevant parameters never produce competing indexed versions.
18. Internal links agree with canonicals. If canonicals say /products/x but the site links /collections/y/products/x, Google receives contradictory signals daily. Make internal links point at canonical forms.
19. Cross-domain and syndication duplication handled. Content republished to partners, marketplaces, or international sister sites carries canonical or noindex agreements — otherwise the bigger domain wins your rankings with your content.
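Checks 15 and 17 can be combined into one per-page test: extract the canonical, normalize the served URL by dropping parameters that never change content, and compare. This is a minimal sketch; `IGNORED_PARAMS` is a hypothetical starting list you would tune per site. The output is a list of findings to review, not verdicts: a facet URL that deliberately canonicalizes to its clean category page will also be reported.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that never change page content; tune it per site.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                  "utm_content", "gclid", "fbclid", "sort"}


class CanonicalParser(HTMLParser):
    """Collects every <link rel="canonical"> href on the page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonicals.append(attributes.get("href", ""))


def clean_url(url):
    """Lower-case the host, drop ignored parameters and the fragment."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(query), ""))


def canonical_issues(served_url, html):
    """Findings for checks 15 and 17 on one fetched page."""
    parser = CanonicalParser()
    parser.feed(html)
    if len(parser.canonicals) != 1:
        return [f"expected 1 canonical, found {len(parser.canonicals)}"]
    canonical = parser.canonicals[0]
    if not urlsplit(canonical).scheme:
        return ["canonical is not an absolute URL"]
    if canonical != clean_url(served_url):
        return [f"canonical points to {canonical}, not {clean_url(served_url)}"]
    return []
```

Because the comparison includes scheme and host, an `http://` or wrong-host canonical on an HTTPS page also surfaces here, which doubles as a spot check for check 16.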
Section D — International and hreflang (Checks 20–23)
20. hreflang annotations are reciprocal. Every alternate must link back. Non-reciprocal pairs are ignored entirely — this single failure invalidates most hreflang deployments we audit.
21. x-default is declared for every URL group, pointing at the global or language-selector version.
22. Language–region codes are valid ISO formats (en-GB, not en-UK; zh-Hans where script matters), and the annotated page actually serves that language.
23. hreflang and canonicals do not contradict. A page that canonicalizes to another URL must not appear in hreflang sets; only canonical URLs participate. Conflicts here silently disable international targeting, and on multilingual sites this section alone justifies a dedicated multilingual SEO review.
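Reciprocity failures (check 20) are hard to spot by eye but trivial to find once annotations are in a lookup table. The sketch below assumes a crawl has already extracted hreflang sets from HTML heads, HTTP headers, or sitemaps into a dict. It covers checks 20, 21, and 23, but for check 22 it only flags the common UK/GB mistake rather than validating against the full ISO code lists. An alternate URL that was never crawled is reported as not linking back.

```python
def hreflang_issues(annotations, canonical_of):
    """Flag hreflang problems from checks 20-23.

    annotations: {page_url: {hreflang_code: alternate_url}} extracted by a crawl.
    canonical_of: {url: canonical_url}; URLs missing from it count as self-canonical.
    """
    issues = []
    for page, alternates in annotations.items():
        if "x-default" not in alternates:
            issues.append((page, "missing x-default"))
        for code, alt in alternates.items():
            # Region subtags are ISO 3166-1: the United Kingdom is GB, not UK.
            if "uk" in code.lower().split("-")[1:]:
                issues.append((page, f"{code}: region code for the United Kingdom is GB"))
            if canonical_of.get(alt, alt) != alt:
                issues.append((page, f"{code} alternate {alt} is not canonical"))
            # Reciprocity: the alternate must list this page somewhere in its own set.
            if alt != page and page not in annotations.get(alt, {}).values():
                issues.append((page, f"{code} alternate {alt} does not link back"))
    return issues
```

For example, if the English page lists a German alternate but the German page only annotates itself, the English page gets a "does not link back" finding and the German page gets "missing x-default".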
Section E — Rendering and JavaScript (Checks 24–27)
24. Primary content exists in the initial HTML. Diff raw HTML against rendered DOM on key templates. Content that appears only after client-side rendering is delayed for Google and invisible to most AI retrieval crawlers.
25. All important links are real anchor tags with href. Buttons with onclick navigation, router links without href, and div-based menus are uncrawlable. The link graph must exist in HTML.
26. Lazy-loading and infinite scroll have crawlable fallbacks. Content below lazy-load boundaries must be reachable via paginated URLs, and images that matter should use the native loading attribute rather than IntersectionObserver-only patterns.
27. Rendering resources are not blocked. robots.txt does not disallow the JS/CSS bundles Google needs to render the page, and the rendered page in the URL Inspection tool matches what users see.
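The raw-versus-rendered diff in checks 24 and 25 can be approximated with the standard-library HTML parser. This sketch assumes you already have two strings: the raw HTML from a plain HTTP fetch and the rendered DOM saved from a headless browser. It matches text blocks exactly, which is crude, and it counts any element with an `onclick` attribute and no real `href` as a pseudo-link, which over-counts on pages with interactive widgets. Treat its output as a pointer to templates worth inspecting by hand.

```python
from html.parser import HTMLParser


class LinkAndTextParser(HTMLParser):
    """Collects visible text blocks, <a href> targets, and onclick-only pseudo-links."""

    def __init__(self):
        super().__init__()
        self.text, self.hrefs, self.pseudo_links = [], [], 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and attributes.get("href"):
            self.hrefs.append(attributes["href"])
        elif "onclick" in attributes:
            self.pseudo_links += 1  # <button>, <div>, or href-less <a> navigating via JS

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())


def parse_page(html):
    parser = LinkAndTextParser()
    parser.feed(html)
    return parser


def rendering_gap(raw_html, rendered_html):
    """What the rendered DOM has that the initial HTML response lacks."""
    raw, rendered = parse_page(raw_html), parse_page(rendered_html)
    raw_text = set(raw.text)
    return {
        "text_only_after_render": [t for t in rendered.text if t not in raw_text],
        "links_only_after_render": sorted(set(rendered.hrefs) - set(raw.hrefs)),
        "pseudo_links_in_raw": raw.pseudo_links,
    }
```

Anything in `text_only_after_render` is content that retrieval crawlers executing little JavaScript will not see, and anything in `links_only_after_render` is a link that does not exist in the HTML link graph.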
Section F — Core Web Vitals and Performance (Checks 28–33)
28. LCP field data (CrUX) is Good per template: at or under 2.5s at p75 on mobile, with homepage, category, and product/article templates judged separately. Lab scores diagnose; field data decides.
29. INP field data is at or under 200ms at p75. This is the 2026 problem metric: long main-thread tasks from third-party scripts are the usual cause, and a Performance panel recording of a real interaction names the offender.
30. CLS field data is at or under 0.1 at p75. The usual causes: images without reserved dimensions, late-injected banners, and font swap.
31. The actual LCP element is optimized. Identified per template, preloaded, served at responsive size, never lazy-loaded, fetchpriority set.
32. JavaScript weight and third-party scripts audited. Inventory every script with its owner and blocking cost; remove the dead, defer the deferrable. On platform sites (Shopify, WordPress) this is overwhelmingly an app/plugin audit.
33. Images sized and formatted correctly. Responsive srcset serving appropriate widths, modern formats via CDN negotiation, no multi-megabyte originals shipped to phones.
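The per-template judgment in checks 28 to 30 is a p75 compared against Google's published thresholds: Good is LCP at or under 2.5s, INP at or under 200ms, and CLS at or under 0.1; Poor starts above 4s, 500ms, and 0.25. The sketch below assumes you have your own real-user samples grouped by template (for example from a RUM setup). CrUX computes its own 28-day p75 with its own aggregation, so numbers from this sketch will not match CrUX exactly; use it to see which template fails, then confirm in CrUX.

```python
from statistics import quantiles

# (good_max, needs_improvement_max); anything above the second value is "poor".
# LCP and INP are in milliseconds, CLS is unitless.
THRESHOLDS = {"LCP": (2500, 4000), "INP": (200, 500), "CLS": (0.1, 0.25)}


def p75(samples):
    """75th percentile of the samples (the third quartile cut point)."""
    return quantiles(samples, n=4, method="inclusive")[2]


def rate(metric, value):
    good, needs_improvement = THRESHOLDS[metric]
    if value <= good:
        return "good"
    return "needs improvement" if value <= needs_improvement else "poor"


def template_report(field_samples):
    """field_samples: {template: {metric: [per-page-view values]}}."""
    return {
        template: {metric: rate(metric, p75(values)) for metric, values in metrics.items()}
        for template, metrics in field_samples.items()
    }
```

Judging p75 rather than the mean is the point of the exercise: one in four visits can be slower than the reported value, and a few very slow sessions still cannot hide behind a fast average.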
Section G — Structured Data (Checks 34–37)
34. Key templates carry valid JSON-LD appropriate to type (Product/Offer, Article, LocalBusiness, and FAQPage where an FAQ is genuinely present), validated against live URLs, not staging. Since 2023 Google shows FAQ rich results only for a small set of well-known, authoritative government and health sites, so FAQ markup alone rarely earns one.
35. Schema matches visible content. Prices, ratings, and availability in markup must equal what users see; mismatches risk manual actions and Merchant Center disapprovals.
36. Rich-result eligibility is confirmed in Search Console: enhancement reports are clean, with no items in error whose impressions are sliding toward zero.
37. Organization and breadcrumb schema present site-wide, with consistent name, logo, and sameAs profile links — this is entity hygiene, useful for knowledge panels and disambiguation. (We keep schema in scope for rich results; we do not sell it as an AI-citation lever — large-scale studies show it does not drive LLM citations.)
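Check 35 is the schema check most worth automating, because prices change daily. The sketch below pulls Product offers out of JSON-LD and compares them with the price shown to users. It assumes the visible price has already been extracted by some other, site-specific step (the `visible_price` argument is hypothetical), and it skips top-level arrays and `@graph` wrappers, which real sites often use.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects parsed <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self.blocks, self._buffer = [], None

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []  # start capturing this script's text

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._buffer = None


def schema_price_mismatches(html, visible_price):
    """Offer prices in Product JSON-LD that differ from the price users see."""
    parser = JsonLdParser()
    parser.feed(html)
    mismatches = []
    for block in parser.blocks:
        if not isinstance(block, dict) or block.get("@type") != "Product":
            continue  # arrays and @graph wrappers are not handled in this sketch
        offers = block.get("offers", [])
        for offer in offers if isinstance(offers, list) else [offers]:
            price = offer.get("price")
            if price is None or float(price) != visible_price:
                mismatches.append(price)
    return mismatches
```

An offer with no price at all is reported as `None`, since missing markup values are as much a mismatch as wrong ones.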
Section H — Security, Mobile, and Infrastructure (Checks 38–41)
38. HTTPS everywhere, zero mixed content. Mixed-content warnings on checkout or lead forms are conversion killers as much as SEO issues.
39. HSTS and a single secure redirect path. HSTS is enabled, every HTTP request reaches HTTPS through a single redirect, and certificates are valid across all hostnames, including the forgotten www or regional variants.
40. Mobile parity. Mobile-first indexing means the mobile DOM is the site: identical content, identical structured data, working viewport, tap targets usable.
41. Soft 404s and error semantics correct. Empty results pages, expired listings, and "not found" templates return the status code that matches their meaning — soft 404s pollute the index and the crawl.
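Mixed content (check 38) shows up in page source as subresources requested over `http://` from an HTTPS page. The sketch below scans one page for that pattern. The tag and attribute list is a hypothetical subset chosen for illustration; it does not cover protocol-relative URLs, `srcset`, or `url()` references inside CSS. Forms posting to `http://` are included because insecure lead and checkout forms are the conversion risk this check calls out.

```python
from html.parser import HTMLParser

# Hypothetical subset of (tag, attribute) pairs that make the browser load or submit to a URL.
SUBRESOURCE_ATTRS = {("img", "src"), ("script", "src"), ("iframe", "src"),
                     ("source", "src"), ("video", "src"), ("audio", "src"),
                     ("link", "href"), ("form", "action")}


class MixedContentParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "link" and "stylesheet" not in attributes.get("rel", "").lower():
            return  # canonical/alternate links are not fetched as subresources
        for name, value in attributes.items():
            if (tag, name) in SUBRESOURCE_ATTRS and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def mixed_content(html):
    """(tag, url) pairs on an HTTPS page that load or submit over plain HTTP."""
    parser = MixedContentParser()
    parser.feed(html)
    return parser.insecure
```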
Section I — Architecture and Internal Linking (Checks 42–45)
42. Money pages within three clicks of the homepage. Click-depth report from the crawl; revenue pages buried five levels deep get structurally re-linked.
43. Descriptive anchor text. "Click here" and bare URLs replaced with anchors that state the target's topic — internal anchors remain one of the clearest relevance signals you fully control.
44. Navigation and footer link hygiene. No 50-link footers diluting every page, no nav links to redirected or dead URLs.
45. Topical clusters are wired. Guides link to the commercial pages they support and vice versa; hub pages exist for major topics. Authority flows along links — architecture decides where it pools.
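Click depth (check 42) is a breadth-first search over the internal link graph from the homepage, which is what crawler click-depth reports compute. The sketch below assumes a hypothetical `{page: [linked pages]}` dict exported from a crawl. Pages that the search never reaches have no link path from the homepage at all, so the same pass also surfaces the orphans from check 7.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search over {page: [linked pages]}: minimum clicks from home."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first visit in BFS order is the shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def too_deep(links, home, money_pages, max_depth=3):
    """Money pages deeper than max_depth; unreachable (orphan) pages map to None."""
    depth = click_depths(links, home)
    return {p: depth.get(p) for p in money_pages
            if depth.get(p) is None or depth[p] > max_depth}
```

For example, a product reachable only through page 2 of a category and two further levels lands at depth 5 and gets flagged for structural re-linking.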
Section J — AI-Era Retrievability (Checks 46–47)
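46. AI crawler policy is deliberate. GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, and Google-Extended are each explicitly allowed or disallowed as a business decision, not by accident or by a WAF default. The tokens serve different purposes: GPTBot and ClaudeBot collect model-training data, OAI-SearchBot and PerplexityBot fetch pages for search answers, and Google-Extended only governs use of your content for Gemini models, not inclusion in Google Search or its AI features, which rely on Googlebot. Blocking retrieval bots removes you from answer engines your buyers already use.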
47. Key pages are extractable by retrieval systems. Server-side rendered, answer-first structure on commercial and FAQ content, current dates on refreshed pages. AI search rewards pages a model can quote; this check closes the audit by asking whether yours qualify.
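A quick way to confirm the robots.txt side of check 46 is to ask a parser what each AI user-agent token may fetch. The sketch below uses Python's standard `urllib.robotparser`. One caveat matters: that module applies the first matching rule in file order, while Google and RFC 9309 use the most specific (longest) matching rule, so results can differ when Allow and Disallow lines overlap. It also cannot see WAF or CDN rules that block bots before robots.txt is ever consulted; those need to be checked in logs.

```python
from urllib.robotparser import RobotFileParser

# The user-agent tokens named in check 46.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def ai_crawler_policy(robots_txt, url):
    """{agent: allowed?} for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}
```

Running it against your key commercial URLs and writing the result next to the business decision for each bot turns check 46 into a reviewable record rather than an assumption.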
From Checklist to Outcome
Two things separate an audit that changes traffic from a PDF that decorates a drive:
| Audit deliverable | Decoration version | Working version |
|---|---|---|
| Findings | Screenshot dump | Severity-rated, with evidence (logs, crawls, GSC exports) |
| Recommendations | "Improve Core Web Vitals" | Ticket-level fixes with owner, effort, and expected impact |
| Sequencing | Alphabetical | Dependency-ordered: Critical crawl/index issues first |
| Follow-through | None | Re-crawl and GSC verification 4–8 weeks post-fix |
Special cases deserve their own pass: migrations and replatforms (where redirect mapping quality decides whether you keep your organic revenue — see our SEO migration service) and multilingual builds, where Sections C and D interact in ways that quietly disable entire country targets.
Frequently Asked Questions
How long does a proper technical SEO audit take?
For a typical business site (1,000–50,000 URLs): one to two weeks including log analysis, full crawl, GSC forensics, and a prioritized remediation plan. Enterprise sites with millions of URLs run four to six weeks. Anything delivered in 48 hours is a tool export with a logo on it — the value is in the cross-referencing (logs vs crawl vs index), which cannot be automated end to end.
How often should I run a technical SEO audit?
A full audit annually, plus triggered audits after any migration, replatform, major template change, or unexplained traffic drop. Between full audits, monitor continuously: Search Console indexing and CWV reports monthly, sitemap health weekly, and robots.txt under change control so modifications are reviewed like code.
What are the most common critical issues found in technical audits?
In our 2025–2026 engagements, the recurring Critical findings are: faceted-navigation crawl traps consuming the majority of crawl budget, important content or links rendered only client-side in JavaScript, non-reciprocal hreflang silently disabling international targeting, staging environments indexed in parallel with production, and redirect chains left over from one or more previous migrations.
Do I need server log analysis, or is a crawler enough?
A crawler shows what Googlebot could do; logs show what it actually does. For sites under ~1,000 URLs, GSC crawl stats plus a crawl is usually sufficient. Above that — and always for ecommerce with facets — logs are the difference between guessing and knowing where the crawl budget goes. It is the single audit component clients most often skip and the one that most often contains the headline finding.
Do Core Web Vitals really affect rankings?
Yes, as a confirmed but modest signal — a tiebreaker among relevance-equal competitors, not a substitute for content and authority. The practical 2026 guidance: get all three metrics (LCP, INP, CLS) into the Good range on field data per template, then stop. Chasing perfect lab scores past green field thresholds has no ranking payoff.
How does AI search change what a technical audit covers?
Two additions, both now standard in our checklist: a deliberate AI-crawler access policy (the retrieval bots behind ChatGPT, Perplexity, and Google's AI features must be allowed if you want to appear in their answers), and retrievability of key content — AI crawlers execute little JavaScript, so server-side rendering and answer-first page structure determine whether your pages can be quoted. What has not changed: schema does not buy AI citations, and llms.txt is not used by major engines — audit budget belongs elsewhere.
Next Steps
Run the 47 checks in order, log severities, fix Critical items first, and verify in GSC and logs four to eight weeks later. That loop — not any individual trick — is what technical SEO is.
If you would rather have it done with evidence attached, ECOSIRE's technical SEO audit service executes this exact checklist: full crawl, server-log analysis, GSC forensics, and a dependency-ordered remediation plan your developers can ship from directly.
Request a technical audit — we will tell you which of the 47 checks your site fails before you commit to anything.
Written by
ECOSIRE Team, Technical Writing
The ECOSIRE technical writing team covers Odoo ERP, Shopify eCommerce, AI agents, Power BI analytics, GoHighLevel automation, and enterprise software best practices. Our guides help businesses make informed technology decisions.