AI Readiness Checklist: 14 Things Every Website Needs in 2026

AI readiness · checklist · llms.txt · structured data · MCP

March 19, 2026

# AI Readiness Checklist: 14 Things Every Website Needs in 2026


    
      AI agents are already deciding who gets recommended, who gets cited, and who gets
      ignored. When someone asks ChatGPT for a tool recommendation, when Perplexity
      synthesizes an answer about your industry, when a Claude-powered agent does research on
      behalf of a buyer — those systems are making decisions based on signals your
      website either has or does not have.
    


  

  
    
      The good news? Most of these signals are technical, implementable in an afternoon, and
      your competitors have not done them yet.
    


    
      This is the complete 14-point checklist. Go through it item by item, check off what you
      have, and fix what you do not.
    



    
## The Foundation Layer (Items 1–4)


    
      These are the basics. If you are missing any of these, do them first — everything
      else builds on top.
    



    
### 1. llms.txt File


      
**What it is:** A plain-text Markdown file at `yourdomain.com/llms.txt` that acts as a structured directory for AI agents — telling them what your site is, who it is for, and where your most important content lives.
      


      
**Why it matters:** AI agents are averse to noisy signals. A typical webpage has navigation, footers, cookie banners, and ads cluttering the actual content. `llms.txt` gives AI agents a clean, unambiguous map of your site.
      


      **What to check:**


      
        - File exists at `https://yourdomain.com/llms.txt`

        - Served as `text/plain` (not HTML)

        - Contains an H1 with your site name

        - Contains a `>` blockquote description (the most-read section)

        - Links to your 10–20 most important pages with descriptions
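
Putting those checks together, a minimal sketch looks like this (the site name, description, and URLs are placeholders):

```markdown
# Acme

> Acme is a real-time collaborative whiteboard for remote product teams.

## Docs

- [Getting Started](https://yourdomain.com/docs/getting-started): Setup and first-board walkthrough
- [API Reference](https://yourdomain.com/docs/api): REST and WebSocket endpoints

## Product

- [Pricing](https://yourdomain.com/pricing): Plans, limits, and feature comparison
```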

      

    

    
### 2. robots.txt — Not Blocking AI Agents


      
**What it is:** The classic `robots.txt` file that tells crawlers what they can and cannot access.
      


      
**Why it matters:** Many sites have overly broad `Disallow` rules that accidentally block AI agent crawlers. Others have added blanket blocks for GPTBot or other AI crawlers without thinking through the consequences.
      


      **What to check:**


      
        - `robots.txt` exists at `yourdomain.com/robots.txt`

        - You are not accidentally disallowing pages you want AI agents to see

        - You have made intentional decisions about which AI crawlers to allow

      

      **Common AI crawler user agents in 2026:**


      

```
User-agent: GPTBot         # OpenAI
User-agent: ClaudeBot      # Anthropic
User-agent: PerplexityBot  # Perplexity
User-agent: Googlebot      # Google (also used for AI Overviews)
User-agent: anthropic-ai   # Anthropic alternative
User-agent: cohere-ai      # Cohere
```
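
If you decide to welcome these crawlers, the rules can be a simple sketch like this (the disallowed path is a placeholder; tailor it to your site):

```
# Explicitly allow the AI crawlers you want
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

# Keep private areas off-limits for every crawler
User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
```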

### 3. XML Sitemap (Up to Date)


      
        **What it is:** A machine-readable map of all your public
        pages, submitted to search engines — and also used by AI crawlers to discover
        content.
      


      
        **Why it matters:** AI agents that crawl the web often start
        from sitemaps. An outdated or missing sitemap means they might miss your best content
        entirely.
      


      **What to check:**


      
        - Sitemap exists at `yourdomain.com/sitemap.xml`

        - Referenced in `robots.txt`

        - All important pages are included

        - `<lastmod>` dates are accurate

        - No 404 URLs included
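
For reference, a minimal valid sitemap looks like this (URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourdomain.com/pricing</loc>
    <lastmod>2026-03-01</lastmod>
  </url>
  <!-- one <url> entry per public page -->
</urlset>
```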

      

    

    
### 4. Valid, Semantic HTML Structure


      
**What it is:** Using the correct HTML elements for their intended purpose — `<nav>` for navigation, `<main>` for main content, `<article>` for articles.
      


      
**Why it matters:** AI agents often parse HTML without rendering JavaScript. They rely on semantic markup to distinguish your actual content from navigation chrome. If everything is a `<div>`, they are guessing.
      


      **What to check:**


      
        - Main page content is wrapped in `<main>`

        - Articles/posts use `<article>`

        - Navigation uses `<nav>`

        - Headings are hierarchical (one `<h1>`, then `<h2>`, then `<h3>`)

        - Lists use `<ul>` / `<ol>` — not divs styled to look like lists
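
Put together, the skeleton those checks describe looks like this:

```html
<body>
  <nav><!-- site navigation, clearly separated from content --></nav>
  <main>
    <article>
      <h1>One h1 per page</h1>
      <h2>Then h2 sections</h2>
      <h3>Then h3 subsections</h3>
      <ul>
        <li>Real list elements, not styled divs</li>
      </ul>
    </article>
  </main>
  <footer><!-- footer links --></footer>
</body>
```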

      

    

    
## The Structured Data Layer (Items 5–8)


    
      Structured data is how you communicate what kind of thing your content is. AI agents use
      this extensively.
    



    
### 5. JSON-LD Structured Data — Organization


      
**What it is:** A JSON-LD block in your `<head>` that declares your organization's identity — name, URL, logo, social profiles, contact info.
      


      
**Why it matters:** When an AI agent is asked "who is [Company]?" or "what does [Company] do?", this is the authoritative source it reaches for.
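
A minimal sketch (every value below is a placeholder to swap for your own):

```html
<!-- Placeholder values: replace with your real organization details -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Acme",
  "url": "https://yourdomain.com",
  "logo": "https://yourdomain.com/logo.png",
  "contactPoint": {
    "@type": "ContactPoint",
    "email": "hello@yourdomain.com",
    "contactType": "customer support"
  },
  "sameAs": [
    "https://x.com/acme",
    "https://www.linkedin.com/company/acme"
  ]
}
</script>
```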
      


    

    
### 6. JSON-LD Structured Data — Product or SoftwareApplication


      
**What it is:** Schema markup that tells AI agents your product's name, category, pricing, and features in a machine-readable format.
      


      
        **Why it matters:** AI shopping agents, recommendation
        engines, and research assistants specifically look for Product and
        SoftwareApplication schema to populate answers about what products are available and
        what they cost.
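
A sketch for a SaaS product (name, category, and price are placeholders):

```html
<!-- Placeholder values: replace with your real product details -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Acme",
  "applicationCategory": "BusinessApplication",
  "operatingSystem": "Web",
  "offers": {
    "@type": "Offer",
    "price": "29",
    "priceCurrency": "USD"
  }
}
</script>
```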
      


    

    
### 7. JSON-LD on Blog Posts — Article Schema


      
        **What it is:** Structured data on each blog post declaring
        the author, publish date, headline, and content type.
      


      
        **Why it matters:** When AI agents cite sources or pull
        content into answers, Article schema helps them attribute content correctly and assess
        freshness.
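
A sketch using this post's own headline and date (the author name is a placeholder):

```html
<!-- Author is a placeholder; headline and dates come from the post itself -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "AI Readiness Checklist: 14 Things Every Website Needs in 2026",
  "datePublished": "2026-03-19",
  "dateModified": "2026-03-19",
  "author": {
    "@type": "Person",
    "name": "Your Author"
  }
}
</script>
```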
      


    

    
### 8. FAQ Schema on Key Pages


      
        **What it is:** Structured FAQ markup that explicitly
        presents question-and-answer pairs from your content.
      


      
        **Why it matters:** FAQ schema maps almost directly to how AI
        assistants respond to queries. A well-structured FAQ page often gets its content pulled
        verbatim into AI-generated answers.
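
A one-question sketch (the question and answer are placeholders):

```html
<!-- Placeholder Q&A: use real questions your customers ask -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How much does Acme cost?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Paid plans start at $29 per user per month. A free plan is available."
      }
    }
  ]
}
</script>
```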
      


    

    
## The Discoverability Layer (Items 9–11)



    
### 9. OpenGraph Tags (og: meta tags)


      
**What it is:** Meta tags in your `<head>` that define how your page appears when shared or previewed — title, description, image, URL.
      


      
        **Why it matters:** Many AI agents and browser tools use OG
        tags as a fallback when parsing page metadata. Missing or incorrect OG tags mean AI
        tools may pull wrong titles or descriptions for your pages.
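
The core four tags (all values are placeholders):

```html
<!-- Placeholder values: one set per page, matching the page content -->
<meta property="og:title" content="Pricing | Acme" />
<meta property="og:description" content="Plans from $29 per user per month. Compare Free, Pro, and Enterprise." />
<meta property="og:image" content="https://yourdomain.com/og/pricing.png" />
<meta property="og:url" content="https://yourdomain.com/pricing" />
```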
      


    

    
### 10. Canonical URLs


      
**What it is:** A `<link rel="canonical">` tag on every page that declares the "official" URL for that content.
      


      
        **Why it matters:** Duplicate content confuses AI indexers
        just like it confuses Google. If your content is accessible at multiple URLs, canonical
        tags tell crawlers which version is authoritative.
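
One line per page (URL is a placeholder), so that `/pricing`, `/pricing/`, and `/pricing?ref=nav` all resolve to a single authoritative URL:

```html
<link rel="canonical" href="https://yourdomain.com/pricing" />
```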
      


    

    
### 11. Machine-Readable Pricing Page


      
        **What it is:** A pricing page that uses clean semantic HTML,
        includes specific numbers, and has structured data — not just JavaScript-rendered
        cards with vague pricing language.
      


      
**Why it matters:** AI agents asked "how much does X cost?" look for pricing pages. If yours is JS-only, missing numbers, or uses language like "contact us for pricing" where you could be specific, you are invisible to AI price comparisons.
      


      **What to check:**


      
        - Pricing is in plain HTML (not just JS-rendered)

        - Specific dollar amounts are present in the page text

        - Plan names and features are in readable list format

        - Has SoftwareApplication or PriceSpecification schema
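
A sketch that satisfies all four checks at once (plan name, features, and price are placeholders):

```html
<!-- Plain HTML pricing an AI can read without executing JavaScript -->
<section>
  <h2>Pro plan</h2>
  <p>$29 per user per month, billed monthly</p>
  <ul>
    <li>Unlimited boards</li>
    <li>Version history</li>
  </ul>
</section>
<!-- Placeholder values: mirror the visible pricing in structured data -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Offer",
  "name": "Pro plan",
  "price": "29",
  "priceCurrency": "USD"
}
</script>
```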

      

    

    
## The Content Quality Layer (Items 12–14)



    
### 12. Descriptive Image Alt Text


      
**What it is:** A meaningful `alt` attribute on every image that describes what the image shows.
      


      
**Why it matters:** Multi-modal AI agents increasingly "see" images on web pages. But even text-only AI crawlers use alt text as a signal about what is on the page. "screenshot.png" tells an AI nothing. "Screenshot of the Acme dashboard showing 3 active users with live cursor positions" is useful.
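
Side by side:

```html
<!-- Tells an AI nothing -->
<img src="screenshot.png" alt="screenshot" />

<!-- Useful to AI agents and screen readers alike -->
<img src="screenshot.png"
     alt="Acme dashboard showing 3 active users with live cursor positions" />
```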
      


    

    
### 13. Clear, Unambiguous Page Titles and Meta Descriptions


      
**What it is:** Unique, descriptive `<title>` and `<meta name="description">` tags on every page.
      


      
**Why it matters:** These are among the first signals AI agents read. Vague titles like "Home | Acme" or "Docs" leave AI crawlers to guess what the page is about. Specific titles help AI agents index and route your content correctly.
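
A sketch of the difference (product details are placeholders):

```html
<!-- Vague: AI crawlers have to guess -->
<title>Home | Acme</title>

<!-- Specific: indexable and routable -->
<title>Acme: Real-Time Collaborative Whiteboard for Remote Teams</title>
<meta name="description" content="Acme gives remote product teams a shared whiteboard with live cursors and version history. Free plan available." />
```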
      


    

    
### 14. llms-full.txt (The Content Dump)


      
**What it is:** A companion to `llms.txt` that contains the full text of your most important content, pre-processed into clean Markdown.
      


      
**Why it matters:** Some AI systems prefer to ingest a single clean document over crawling dozens of pages. For documentation sites and content-heavy sites, `llms-full.txt` can dramatically improve how well AI agents understand your content.
      


      **What to check:**


      
        - File exists at `yourdomain.com/llms-full.txt`

        - Content is clean Markdown with no HTML cruft

        - Each section is clearly labeled with its source URL

        - Updated when major content changes
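
The shape is simple: your content, concatenated into one Markdown document, with each section labeled by its source URL (everything below is placeholder content):

```markdown
# Acme: Full Content

## Getting Started
Source: https://yourdomain.com/docs/getting-started

Create a board, invite your team, and start drawing together in real time. ...

## Pricing
Source: https://yourdomain.com/pricing

The Pro plan is $29 per user per month. A free plan is available. ...
```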

      

    

    
## Your Score


    Count how many items you checked off:


    
      - **14/14** — You are fully AI-ready. You are capturing traffic others are leaving on the table.

      - **10–13** — Strong foundation. A few afternoon fixes from fully optimized.

      - **6–9** — You are being partially understood. Fix the structured data items first.

      - **0–5** — Significant opportunity. Start with llms.txt and robots.txt today.

    

  

  
    
Don't want to audit manually?
    


    
      AgentReady automates this entire checklist. Paste your URL, get a scored report in
      seconds, and see exactly which of these 14 items you are passing and failing —
      with specific fix instructions for each one. Free to scan. Takes 30 seconds.
    


[Run Your Free AI Readiness Scan](/)
  

  
[← Back to Blog](/blog)