A search engine works in three stages: crawling (discovering pages on the web), indexing (storing information about those pages in a vast database), and ranking (deciding which results to show first and in what order). Google alone processes around 8.5 billion searches per day — understanding how that happens is a core part of the KS3 computing curriculum.
Stage 1: Crawling — how search engines discover web pages
Search engines use automated programs called web crawlers (also known as spiders or bots). Google's crawler is called Googlebot. A crawler starts with a list of known web addresses and visits each one, reading the page content and following every hyperlink it finds. Those new links lead to more pages, and so on — a process that eventually maps a huge proportion of the public web.
Key facts about crawling:
- Crawlers follow the rules set in a file called robots.txt at the root of each website. This file tells crawlers which pages they may or may not visit.
- A page can only be found if it is linked to from another page, or submitted directly to a search engine.
- Crawlers return to pages periodically to check for updates — a news site may be re-crawled many times per day, while a rarely updated page might be revisited only once a month.
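The robots.txt rule described above can be tried out with Python's standard library. This is a minimal sketch: the robots.txt content, the crawler name `ExampleBot`, and the example.com addresses are all made up for illustration, and real crawlers download the file from the website rather than reading it from a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site (an assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def make_parser(robots_text):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

parser = make_parser(ROBOTS_TXT)

# "ExampleBot" is an invented crawler name.
allowed_home = parser.can_fetch("ExampleBot", "https://example.com/index.html")
allowed_private = parser.can_fetch("ExampleBot", "https://example.com/private/notes.html")

print(allowed_home)
print(allowed_private)
```

A well-behaved crawler would make this check before visiting each page and skip any address the file disallows.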
The crawling process is fundamentally about networks — the World Wide Web is a network of interconnected documents, and the crawler traverses those connections.
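The link-following idea can be sketched as a breadth-first traversal of a graph. The example below uses a tiny invented web stored in a dictionary (`TOY_WEB` and its page names are assumptions), so it shows the shape of the algorithm and not how Googlebot is actually built.

```python
from collections import deque

# Hypothetical miniature web: each page maps to the pages it links to.
TOY_WEB = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["article-1", "article-2"],
    "article-1": ["home"],
    "article-2": [],
    "orphan": ["home"],  # no page links here, so the crawler never finds it
}

def crawl(seed_pages, web):
    """Return pages in the order a breadth-first crawler discovers them."""
    visited = set(seed_pages)
    queue = deque(seed_pages)
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in web.get(page, []):
            if link not in visited:  # avoid visiting the same page twice
                visited.add(link)
                queue.append(link)
    return order

discovered = crawl(["home"], TOY_WEB)
print(discovered)
```

The page called "orphan" never appears in the output because no other page links to it, which matches the point that a page must be linked to (or submitted directly) before it can be found.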
Stage 2: Indexing — storing what crawlers find
Once a crawler reads a page, the search engine processes and stores information about it in a massive database called the index. The index works like a huge reference book: for any given word, the index records which pages contain that word, how often it appears, where on the page it appears (title, heading, body), and the context surrounding it.
When a new page is indexed, the search engine also analyses:
- The title tag and headings — these signal what the page is primarily about
- The body text — what topics are covered in depth
- Links pointing to the page from other sites — these suggest the page is trusted
- Page metadata — publication date, language, authorship
This stage involves data structures: the index is essentially a massive inverted index — a data structure that maps each word to a list of documents containing it, allowing the search engine to look up "photosynthesis" and instantly retrieve millions of matching pages.
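A very small inverted index can be built in a few lines. This sketch assumes three invented pages (`PAGES`) and records only which pages contain each word and how often; a real index would also store positions, headings, and much more.

```python
from collections import defaultdict

# Hypothetical pages: page id -> page text.
PAGES = {
    "page1": "Photosynthesis happens in plant leaves",
    "page2": "Plant cells contain chloroplasts",
    "page3": "Photosynthesis needs light and water",
}

def build_inverted_index(pages):
    """Map each word to {page id: number of times the word appears}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word][page_id] = index[word].get(page_id, 0) + 1
    return index

index = build_inverted_index(PAGES)
matches = sorted(index["photosynthesis"])
print(matches)
```

Looking up a word is a single dictionary access, which is why the search engine does not need to re-read every page for each query.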
Stage 3: Ranking — deciding which results to show first
When a user types a search query, the search engine does not search the live web; it searches its index. In the fraction of a second between pressing Enter and seeing results, the ranking algorithm runs through three steps:
- The search engine retrieves all pages in the index matching the query terms
- It scores each page on hundreds of relevance signals
- It sorts the results and returns the top ten for page one
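The three steps above (retrieve, score, sort) can be sketched as follows. The index contents are invented, the score is simply the total word count, and the sketch assumes a page must contain every query term; real engines use far more signals than this.

```python
# Hypothetical inverted index: word -> {page id: word count}.
TOY_INDEX = {
    "photosynthesis": {"page1": 3, "page2": 1, "page3": 2},
    "ks3": {"page1": 1, "page3": 4},
}

def search(query, index, top_n=10):
    """Return up to top_n page ids, best match first."""
    terms = query.lower().split()
    if not terms:
        return []
    # Step 1: retrieve pages that contain every query term.
    candidate_sets = [set(index.get(term, {})) for term in terms]
    candidates = set.intersection(*candidate_sets)
    # Step 2: score each page (here: total occurrences of the query terms).
    scores = {page: sum(index[term][page] for term in terms) for page in candidates}
    # Step 3: sort by score, highest first (page id breaks ties).
    ranked = sorted(scores, key=lambda page: (-scores[page], page))
    return ranked[:top_n]

results = search("photosynthesis KS3", TOY_INDEX)
print(results)
```

Here "page2" is dropped at step 1 because it never mentions "ks3", and "page3" outranks "page1" because the query terms appear in it more often.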
The most famous ranking signal is PageRank, developed by Google's founders Larry Page and Sergey Brin. PageRank scores a page based on how many other pages link to it, and how authoritative those linking pages are. A link from the BBC website to your page counts for far more than a link from a brand-new blog. This is analogous to academic citation: a paper cited by many respected journals is treated as more authoritative than one cited by nobody.
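The idea behind PageRank can be illustrated with the simplified textbook version of the calculation, repeated until the scores settle. The link graph below is invented, the damping factor of 0.85 is the value conventionally used in teaching examples, and this is not a description of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank. `links` maps each page to the pages it links to;
    every linked page must also appear as a key."""
    pages = list(links)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links shares its rank with everyone.
            targets = outlinks if outlinks else pages
            share = damping * ranks[page] / len(targets)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks

# Hypothetical link graph: three pages link to "bbc", one links to "school".
TOY_LINKS = {
    "bbc": ["school"],
    "school": ["bbc"],
    "blog": ["bbc"],
    "new-page": ["bbc"],
}

ranks = pagerank(TOY_LINKS)
top_page = max(ranks, key=ranks.get)
print(top_page)
```

In this toy graph "school" also scores well, even though only one page links to it, because that one link comes from the highest-ranked page. That is the "authority of the linking page" effect described above.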
Other ranking signals include:
- Keyword relevance — does the page actually address what the user searched?
- Freshness — for time-sensitive queries, recently updated pages score higher
- User signals — do people click through and stay on the page, or bounce back immediately?
- Mobile-friendliness — the page should work well on a phone
- Page speed — slow-loading pages are penalised
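One way to picture how several signals combine is a weighted sum. The weights and signal values below are invented purely for illustration; search engines do not publish theirs, and the real combination is far more complex.

```python
# Hypothetical weights for each signal (an assumption, not real values).
WEIGHTS = {
    "keyword_relevance": 0.4,
    "freshness": 0.1,
    "user_signals": 0.2,
    "mobile_friendly": 0.15,
    "page_speed": 0.15,
}

def combined_score(signals, weights=WEIGHTS):
    """Weighted sum of signal values, each expected to be between 0 and 1."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())

# Two invented pages that differ only in loading speed.
fast_page = {"keyword_relevance": 0.9, "freshness": 0.5, "user_signals": 0.8,
             "mobile_friendly": 1.0, "page_speed": 0.9}
slow_page = {"keyword_relevance": 0.9, "freshness": 0.5, "user_signals": 0.8,
             "mobile_friendly": 1.0, "page_speed": 0.2}

fast_score = combined_score(fast_page)
slow_score = combined_score(slow_page)
print(fast_score > slow_score)
```

With everything else equal, the slower page ends up with the lower score, which is what "slow-loading pages are penalised" means in practice.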
The three stages: a summary table
| Stage | What happens | Computing concept |
|---|---|---|
| Crawling | Bots follow hyperlinks across the web, discovering pages | Networks, algorithms (graph traversal) |
| Indexing | Page content is parsed and stored in a searchable database | Data structures (inverted index), data representation |
| Ranking | Algorithm scores and sorts results for each query | Algorithms, sorting, weighting functions |
Worked example: searching "photosynthesis KS3"
Here is what happens when a Year 8 student types "photosynthesis KS3" into Google:
- The query is parsed. Google identifies the key terms: "photosynthesis" (the biology concept) and "KS3" (a signal that the user wants school-level content, not university-level research).
- The index is searched. In milliseconds, Google retrieves all indexed pages containing those terms, giving hundreds of thousands of candidates.
- Relevance signals are evaluated. Pages from BBC Bitesize, Seneca Learning, and similar educational sites score highly because: (a) they contain both terms prominently in headings and body text; (b) they are heavily linked-to by other educational sites; (c) previous users who searched the same query clicked those pages and stayed.
- PageRank is applied. BBC Bitesize has enormous authority (millions of external links), so its photosynthesis page ranks near the top even if a newer page has more detailed content.
- Results are returned. The top ten results are displayed, typically in under 0.5 seconds. The entire process, from keystroke to results page, involves billions of index lookups, sorting operations, and scoring calculations running on Google's distributed servers.
How this connects to the KS3 computing curriculum
The DfE's computing programmes of study for KS3 require students to understand how the internet works, how algorithms are applied to solve problems, and how data is structured and stored. Search engines are an ideal real-world case study:
- Algorithms — the ranking algorithm is a real-world example of an algorithm that scores, filters, and sorts data
- Data structures — the inverted index is a concrete example of how data can be organised for fast retrieval
- Networks — web crawling illustrates how the internet connects documents and how information flows across it
- Digital literacy — understanding ranking helps students critically evaluate what appears first and why
BBC Bitesize's KS3 Computing guide on how search engines work covers the same three-stage model and is a reliable revision resource aligned to the national curriculum.
Why understanding search engines matters beyond exams
Knowing how search engines work makes you a more critical digital citizen:
- Filter bubbles: ranking algorithms personalise results, which means two people searching the same query may see different results. Recognising this matters for media literacy.
- SEO and misinformation: a page can appear top of the results not because it is most accurate, but because it is most linked-to or most clicked. High ranking does not equal truth.
- Research skills: understanding that indexed pages must be linked to — and that not everything on the internet is indexed — helps students think more carefully about the sources they use.
Frequently Asked Questions about how search engines work
What is PageRank and why does it matter?
PageRank is an algorithm originally developed by Google's founders that scores web pages based on how many other pages link to them, and the authority of those linking pages. The idea is that a page linked to by many trusted sources is probably more reliable and relevant than one with few links. PageRank is still part of Google's ranking system today, though it is one of hundreds of signals and is no longer published as a standalone score.
Do all search engines work the same way?
All major search engines (Google, Bing, DuckDuckGo, Yahoo) rely on crawling, indexing, and ranking. However, they do not all share the same crawlers, indexes, and ranking algorithms, which is why the same search can return different top results on different search engines. Some engines also draw part of their results from another engine's index: Yahoo and DuckDuckGo, for example, take many of their results from Bing. DuckDuckGo does not personalise results based on your previous searches, making it a useful comparison when teaching about filter bubbles.
How does a search engine decide which page is the best result?
No single factor determines ranking. Google uses hundreds of signals, including keyword relevance, PageRank (link authority), page speed, mobile-friendliness, freshness, and user behaviour signals (such as how long people spend on the page after clicking). The algorithm is updated regularly — Google makes thousands of changes per year — meaning that what ranks top today may not do so tomorrow if the algorithm is updated or a better page is published.