A search engine is a tool that provides answers (search results) to questions (search queries). For example, when you type “chocolate chip cookie recipe” into Google, it shows you a page of possible answers.
Before a search engine can show results, it needs to create a master list of possible answers. This process is called “crawling”.
A crawler (also called a spider or bot) is an automated program that visits your website’s pages and:
- Scans all your content, images, and other media.
- Creates a stored copy (cache) of your site.
- Views your site much as a visitor would see it.
- Follows links to visit other pages on your site and even other websites you’ve linked to.
Note: Search engines often discover your website through links from other sites.
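To make the link-following step more concrete, here is a minimal sketch in Python of how a crawler might collect the links on a single page. It is an illustration only, not how any particular search engine works: the `LinkCollector` class, the `extract_links` function, and the sample page are all hypothetical, and a real crawler would also download pages, respect robots.txt, and avoid visiting the same address twice.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in the given HTML, as absolute URLs."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical page content, used only for illustration.
sample_html = """
<p>Try my <a href="/recipes/cookies">cookie recipe</a> or visit
<a href="https://example.org/baking">a baking site</a>.</p>
"""

links = extract_links(sample_html, "https://example.com/")
print(links)
```

The first link points to another page on the same site and the second to a different website, which mirrors the two kinds of links a crawler follows.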
After crawling your site, search engines store your site’s information in a specialized database called an index. This is like a massive digital library containing data from billions of websites.
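One common way to picture an index is as a lookup table from each word to the pages that contain it. The sketch below builds such a table in Python. It is a simplified illustration under stated assumptions: the page addresses, the page text, and the `build_index` function are hypothetical, and real search indexes store far more than individual words.

```python
from collections import defaultdict

# Hypothetical pages, used only for illustration.
pages = {
    "example.com/cookies": "chocolate chip cookie recipe",
    "example.com/tacos": "best vegan tacos recipe",
}


def build_index(pages):
    """Map each word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


index = build_index(pages)
print(sorted(index["recipe"]))  # pages that mention "recipe"
print(sorted(index["vegan"]))   # pages that mention "vegan"
```

Because the table is organized by word rather than by page, looking up a search term does not require rereading every stored page.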
When someone searches, several things happen in just a matter of seconds:
- The search engine analyzes the search query (like “best vegan tacos”).
- It determines the likely intent behind the search.
- It ranks relevant pages from its index based on many factors (a figure of more than 200 is often cited).
- It delivers a search engine results page (SERP) with the best matches.
The search engine considers many factors when deciding which pages to show, including:
- The searcher’s location.
- The device they’re using.
- The quality and relevance of website content.
- How other websites reference that content.
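The ranking step can be pictured with a toy example. The Python sketch below scores each page by how many of the query’s words it contains and returns the best matches first. This is a deliberately simple stand-in and not the method any real search engine uses: the `rank_pages` function and the sample pages are hypothetical, and the sketch ignores location, device, content quality, and links from other sites.

```python
def rank_pages(query, pages):
    """Return page addresses ordered by how many query words they contain."""
    query_words = set(query.lower().split())
    scores = {}
    for url, text in pages.items():
        page_words = set(text.lower().split())
        matches = len(query_words & page_words)
        if matches:
            scores[url] = matches
    # Highest score first; ties are broken alphabetically so the order is stable.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical pages, used only for illustration.
pages = {
    "example.com/vegan-tacos": "best vegan tacos with fresh salsa",
    "example.com/beef-tacos": "classic beef tacos",
    "example.com/cookies": "chocolate chip cookie recipe",
}

results = rank_pages("best vegan tacos", pages)
print(results)
```

The page that matches all three query words is listed first, the page that matches only “tacos” comes second, and the cookie page is left out because it shares no words with the query.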
Traditional search engines like Google aren’t the only tools crawling the web anymore. AI-powered tools — including Google’s AI Overviews, ChatGPT, Perplexity, and Claude — also need to read your content so they can cite it in answers. The way they do this varies depending on the tool.
Google uses the same crawler, Googlebot, for both its regular search results and its AI features (AI Overviews and AI Mode). There is no separate “Google AI bot.” This means that if Googlebot can already access your site, your content is automatically eligible to appear in Google’s AI features, with no extra steps needed.
AI tools outside Google send their own crawlers to read your site:
- GPTBot is used by OpenAI (ChatGPT).
- PerplexityBot is used by Perplexity.
- ClaudeBot is used by Anthropic (Claude).
For your content to appear in these tools, their bots need access to your site. The good news: WordPress.com sites allow all the main AI search crawlers by default.
Your site has a file called robots.txt that tells crawlers what they’re allowed to access. WordPress.com configures this sensibly by default. Be cautious with any plugin that offers to “block bots” or “protect your content from AI” — unless you specifically want to opt out of AI search, these settings could prevent your content from appearing in Google’s AI Overviews or being cited by tools like ChatGPT and Perplexity.
If you do want to opt out of AI crawlers, you can do so in your website’s privacy settings, where you’ll find controls for both traditional search engines and AI crawlers. Be aware that opting out of AI crawlers means your content won’t appear in AI-powered search results such as ChatGPT or Google’s AI Overviews.
If you’re unsure what your current robots.txt allows, you can check it by visiting yourgroovydomain.com/robots.txt in your browser (replace “yourgroovydomain.com” with your website’s address).
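If you are comfortable running a short script, Python’s built-in `urllib.robotparser` module can interpret robots.txt rules for you. The sketch below checks whether the crawlers named earlier would be allowed to read a site’s home page. The robots.txt contents shown are a hypothetical example that blocks GPTBot; they are not WordPress.com’s default file, and your own file will likely differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, used only for illustration.
# This example blocks GPTBot entirely and keeps all bots out of /wp-admin/.
sample_robots_txt = """
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(sample_robots_txt.splitlines())

crawlers = ["Googlebot", "GPTBot", "PerplexityBot", "ClaudeBot"]

# Ask, for each crawler, whether it may fetch the site's home page.
access = {
    name: parser.can_fetch(name, "https://yourgroovydomain.com/")
    for name in crawlers
}

for name, allowed in access.items():
    print(name, "allowed" if allowed else "blocked")
```

To check a live site instead of the sample text, you could replace the `parse` call with `parser.set_url("https://yourgroovydomain.com/robots.txt")` followed by `parser.read()`, using your own website’s address.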
Google Search Console is a free tool from Google that lets you:
- Check whether Google has found and indexed your pages.
- See which search queries are bringing visitors to your site.
- Request that Google crawl new or updated content, rather than waiting for it to be discovered.
- Identify any crawling errors on your site.
Check out our guide on how to verify your site with Google Search Console.

Consider the topic of the page or post you selected in the previous lesson. Follow these steps:
- Perform a search on Google using terms you think people would use to find information about your topic.
- Look at the search results – are they similar to your content?
- If necessary, try different search terms until you find results that better match your content.
Goal: Find the search terms that display results similar to your content. This helps you understand which words to use in your own content to appear in similar search results.