At the beginning of 2020, I wrote my thesis on search engine optimization ranking factors. One of the parts of this thesis was an explanation of how a search engine works.
How does indexing work? How do keywords work, and how does the SERP work?
All these topics were explained briefly. In this article, you will find the results of this research, and I will show you exactly how a search engine works.
Search engine optimization can be done in different ways. Little scientific literature is available that is still relevant. Search engines are constantly changing, and with each update of the algorithm, previously written literature becomes less relevant.
Hundreds of small updates are made to the algorithm every year.
Typically, between 5 and 10 major updates are made per year that measurably affect search results (Google Algorithm Update History, 2019). These larger updates are called core algorithm updates (Sullivan, 2019).
Entrepreneurs who maintain their own sites do not have to worry about these updates. The updates are all made with one goal: To provide the best results for a particular search query. An entrepreneur who focuses on providing the user with the best possible service will, therefore, experience little inconvenience from the updates.
The entrepreneur does need to know how to make a site that is as relevant as possible; there are various theories and models about this. Understanding the algorithm is also important, so that a site can be built in such a way that a search engine can index it properly.
“If a core algorithm update can wipe out your performance, you were *always* doing it wrong / not good enough. You were just getting away with it.” (Alderson, 2019, pp. 1–3)
Indexing in a search engine
If we want to know how a website can be optimized for Google, it is useful to first know how Google works.
When a user searches for information on the web, he does not search the internet itself. Instead, he searches within Google’s index (Google, n.d.). This index is built by software programs called spiders. A spider starts with a few pages of a website and looks at what information can be found on those pages.
This information is stored in Google’s index. Then the spider looks at the links on those pages. It follows them to the referenced pages, indexes those as well, and continues with the links contained within those pages.
The spider is aided by sitemaps (Google) that website administrators place on their websites. A sitemap lists which pages a site contains and where they can be found, which helps the spider crawl the website efficiently.
The resulting index contains hundreds of billions of pages. This index is the basis of every search.
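The crawl-and-index loop described above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the "web" is a hypothetical in-memory dictionary (`TOY_WEB`), the function name `crawl` is made up for this example, and a real spider also has to handle downloading, parsing, sitemaps, and politeness rules.

```python
from collections import deque

# Hypothetical miniature "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example/home": ("welcome to the shoe shop", ["a.example/shoes", "b.example/blog"]),
    "a.example/shoes": ("buy running shoes online", ["a.example/home"]),
    "b.example/blog": ("how a search engine works", ["a.example/shoes", "b.example/missing"]),
}

def crawl(seed_urls, web):
    """Breadth-first crawl that builds an inverted index: word -> set of URLs."""
    index = {}
    seen = set(seed_urls)
    queue = deque(seed_urls)
    while queue:
        url = queue.popleft()
        if url not in web:
            # Dead link: there is nothing to index.
            continue
        text, links = web[url]
        # Store every word of the page in the index.
        for word in text.split():
            index.setdefault(word, set()).add(url)
        # Follow the links to pages that have not been visited yet.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl(["a.example/home"], TOY_WEB)
print(sorted(index["shoes"]))  # ['a.example/shoes']
```

Looking up a word in `index` immediately returns the pages that contain it, which is why a search runs against the index and not against the live web.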
Searches in a search engine
When a user types a search query into Google, the search engine looks at the words in the query and checks the index to see which pages contain those words (Shehata, S., Karray, F., & Kamel, M. S., 2016).
A search query may return thousands of pages where the word is present.
The search engine will then have to provide an overview of the most relevant pages. The search engine does this by testing the pages against the algorithm. The algorithm can be thought of as a series of questions, such as:
- How many times does the word appear on the page?
- Does the word occur in the title?
- Does the word occur in the URL?
- Are synonyms of the word used?
- Does the page come from a reliable source (domain)?
- What is the page rank of the page?
The last factor, the page rank, is determined by looking at how many websites link to the page and how important those websites are (Avrachenkov, K., & Litvak, N., 2006). In addition to these example factors, many more factors are included.
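The idea that a page is important when important pages link to it can be illustrated with the classic power-iteration form of PageRank. The sketch below is a simplified textbook version, not Google's production code: the function name `pagerank`, the toy link graph, and the number of iterations are assumptions, and 0.85 is simply a commonly used damping value.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: page -> list of pages it links to. Returns page -> score."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its own score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph: A, B and D all link to C, and C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, C ends up with the highest score because three pages link to it, and A comes second because its only incoming link is from the highly ranked C.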
Google indicates that more than 200 factors in total are considered (Luh, C. J., Yang, S. A., & Huang, T. L. D., 2016). Not every factor carries the same weight, so in search engine optimization it is advisable to focus on the most heavily weighted factors.
The ranking factors are applied to the websites that answer the initial questions. It is then determined which page is most likely to match the search query.
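To make the idea of weighted ranking factors concrete, here is a minimal sketch that scores pages on four of the example questions listed earlier. Everything in it is hypothetical: the `Page` class, the `WEIGHTS` values, and the example pages are invented for illustration, since the real factors and their weights are not public.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    title: str
    body: str
    page_rank: float  # assumed to be precomputed, between 0 and 1

# Hypothetical weights; the real weights are not public.
WEIGHTS = {"body_count": 1.0, "in_title": 3.0, "in_url": 2.0, "page_rank": 5.0}

def score(page, word):
    """Combine a few example ranking factors into one weighted score."""
    word = word.lower()
    body_count = page.body.lower().split().count(word)  # how often the word occurs
    in_title = word in page.title.lower().split()        # word in the title?
    in_url = word in page.url.lower()                    # word in the URL?
    return (WEIGHTS["body_count"] * body_count
            + WEIGHTS["in_title"] * in_title
            + WEIGHTS["in_url"] * in_url
            + WEIGHTS["page_rank"] * page.page_rank)

def rank(pages, word):
    """Return the pages ordered from highest to lowest score."""
    return sorted(pages, key=lambda p: score(p, word), reverse=True)

pages = [
    Page("blog.example/post", "My weekend", "I bought new shoes", 0.6),
    Page("shop.example/shoes", "Buy shoes online", "shoes for running and walking shoes", 0.4),
]
for page in rank(pages, "shoes"):
    print(page.url, round(score(page, "shoes"), 2))
```

Even though the blog post has the higher page rank here, the shop page wins for the query "shoes" because the word appears in its title, URL, and body. Changing the weights changes the outcome, which is why knowing the heaviest factors matters.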
Displaying the search results
The results of a search are displayed on the SERP, the search engine results page. This is the overview of pages that are relevant to the search query.
Each result on the page shows the title, URL, and a summary or snippet of the text. This lets the user determine whether a certain result is relevant to him or her. At the bottom of the search results, related queries are suggested that the user might want to try next.
Sometimes ads also appear in the search results. They are typically placed above the organic search results. Features that were added more recently are:
- Relevant images
- Featured snippet (a box that provides an immediate answer)
- Section with related questions that other users also asked
The search algorithm has an extra function: understanding the search query. Many words have more than one meaning. Google needs to understand these meanings, because otherwise irrelevant results may be displayed based on the wrong meaning.
An example of this is the Dutch word “laat”, which can mean “late”, “let”, or “leave” depending on the context. In the example that Google gives in its explanation of algorithms, it concerns the following search queries, which all contain this word in Dutch:
- What time is it in Tokyo?
- Let the sun in your heart
- Where do I leave my old bicycle?
As can be seen, the word is used with different meanings. In the first query it is part of a time indication: the user wants to know what time it is in a specific place. In the second query the word is part of a song lyric, and in the third query the word should be left out because it is not a keyword.
Once this part of the search is resolved, the query is compared against the pages in the index, so that only the relevant pages are included in the rest of the process. The keywords that remain after the intent is determined form the basis of the index search.
This is done based on the algorithm’s ranking factors discussed earlier.
The first factors to consider are the words on the page. Then interaction data is looked at (Hatch, 2018). This data is the historical data of searchers. If it has previously been found that people quickly left a page that was presented to them, the page will be less likely to be shown for the next search. This data helps estimate relevance based on user data and interaction with pages.
The algorithm also looks at what kind of content the user is looking for. A search that is clearly informational will show long articles. A search for which another form of content, such as a video or images, is a better match will yield different results.
Finally, the query’s language is looked at, and priority is given to information available in the same language.
Once it is clear which pages match the search query, these pages have to be ranked. Hundreds of factors are used to determine which page is most relevant. The algorithm looks at how old the content is (Soulo, 2019), the number of times the search query occurs in the text, and whether the page is usable for visitors, for example by looking at the bounce rate.
Finally, all pages that try to mislead the algorithm by mentioning a keyword too often, buying links to raise their page rank, or clearly containing spam are filtered from the search results.
When a user then clicks on one of the search results, the user’s interaction is measured. This information is included as part of the algorithm. The user data tells Google whether the page it recommended was relevant to the user.
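One simple signal for a page that mentions a keyword too often is keyword density: the share of all words on the page that are the keyword. The sketch below is a naive illustration only. The function names and the threshold of 20% are assumptions made for this example, and real spam detection relies on far more than a single ratio.

```python
def keyword_density(text, keyword):
    """Return the fraction of words in the text that equal the keyword."""
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

def looks_stuffed(text, keyword, threshold=0.2):
    """Flag a text when the keyword makes up more than `threshold` of it.

    The threshold is a hypothetical value chosen for this example.
    """
    return keyword_density(text, keyword) > threshold

stuffed = "painter painter cheap painter best painter"
natural = "We hired a painter to redo the kitchen walls last spring"

print(looks_stuffed(stuffed, "painter"))  # True
print(looks_stuffed(natural, "painter"))  # False
```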
How do search terms work
A search term is also called a search query. It refers to the input that users give to the search engine to arrive at certain information. A search term usually contains a word that describes what the searcher is looking for, combined with a word that indicates the search intent. Some examples are:
- Buying shoes
- How old is Donald Trump?
- How does a search engine work?
Each of these questions asked in the search engine can be asked in different ways. When we look at the search term “how old is Donald Trump,” this becomes clear.
Search queries like:
- “Donald Trump Age”
- “Trump Age”
- “Donald Trump How Old?”
are equivalent to each other.
The information required is the same. Thus, the search engine will cluster these queries, and the search results will be comparable.
Yet even in such an example, there is a difference in the pages that come up. Pages that contain the words “how old” score better on queries that use exactly those words.
Pages containing the word “age” score better on queries that contain the word “age”.
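The clustering of equivalent queries can be sketched by reducing every query to a canonical form and grouping queries that share it. The rewrite rules and stop words below are hypothetical and hand-written for this one example. A real search engine learns such equivalences from large amounts of data.

```python
import re
from collections import defaultdict

# Hypothetical rewrite rules, applied in order.
REWRITES = [
    (r"\bhow old\b", "age"),
    (r"\bdonald trump\b", "trump"),
]
STOP_WORDS = {"is", "the", "a"}

def canonical(query):
    """Reduce a query to an order-independent set of normalized keywords."""
    text = re.sub(r"[^\w\s]", "", query.lower())  # lowercase, drop punctuation
    for pattern, replacement in REWRITES:
        text = re.sub(pattern, replacement, text)
    return frozenset(w for w in text.split() if w not in STOP_WORDS)

def cluster(queries):
    """Group queries that share the same canonical form."""
    groups = defaultdict(list)
    for query in queries:
        groups[canonical(query)].append(query)
    return list(groups.values())

queries = [
    "How old is Donald Trump?",
    "Donald Trump Age",
    "Trump Age",
    "Donald Trump How Old?",
    "Buying shoes",
]
clusters = cluster(queries)
for group in clusters:
    print(group)
```

The four Trump queries end up in one cluster and “Buying shoes” in another. Because the original wording is lost in the canonical form, this sketch does not capture the preference for pages that use the exact words of the query.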
Precisely for that reason, many online marketers look at keyword search volumes. This is called keyword research: finding keywords that have sufficient search volume and little competition.
In the past, keyword research was heavily abused. A painter who wanted to rank well wrote separate texts about painter, painters, painter costs, painter price, etc. Each variant of a keyword was given its own text in which that word then appeared very often. Google noticed this and rolled out several algorithm updates to counter it.
The Panda update was rolled out in February 2011, followed by the Hummingbird update in August 2013. These updates ensured that keyword variants were clustered. They also ensured that pages written purely to fool the algorithm no longer surfaced.
Keyword research is still useful. When creating a page, it is useful to know how much traffic the page can theoretically bring in for a site owner. However, Google does not provide the exact data.
Therefore, the amount of traffic that can be brought in with a keyword can be estimated but cannot be determined exactly.
In addition to keyword research, a lot of attention is paid to the second part of many searches: the search intent. There are three types of search intent:
- Informational searches: the user is looking for specific information. This can be a review, knowledge, or comparison. Often informational searches start with who, what, where, why, how, or when.
- Navigational searches: the searcher is looking for a specific site or page. An example is the query “Facebook”.
- Transactional searches: the searcher has a purchase intention. Examples are buying shoes, renting a car, or cheap airline tickets. Searches where there is no direct purchase intention yet are also categorized as transactional searches. In this case, the search queries contain, for example, the words “review” or “compare”.
The search intent indicates what type of page a user is looking for. A page of information is not valuable to a user who simply wants to buy something. Similarly, a sales page is a nuisance for someone looking for information. So when creating a page, it’s important to understand the combination of a keyword and a search intent (White, R. W., Richardson, M., & Yih, W. T., 2015, May).
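The three search intents can be illustrated with a small rule-based classifier. This is a rough sketch under stated assumptions: the word lists, the list of known sites, and the fallback to “informational” are all hypothetical choices for this example, and an actual search engine infers intent from much richer signals.

```python
# Hypothetical word lists, chosen to match the examples in this article.
QUESTION_WORDS = {"who", "what", "where", "why", "how", "when"}
TRANSACTIONAL_WORDS = {"buy", "buying", "rent", "renting", "cheap", "review", "compare"}
KNOWN_SITES = {"facebook", "youtube", "wikipedia"}

def classify_intent(query):
    """Guess the search intent of a query with a few simple rules."""
    words = query.lower().replace("?", "").split()
    if not words:
        return "unknown"
    # Only the name of a well-known site: the user wants to go there.
    if all(word in KNOWN_SITES for word in words):
        return "navigational"
    # Queries that start with a question word ask for information.
    if words[0] in QUESTION_WORDS:
        return "informational"
    # Purchase-related words point to a transactional search.
    if any(word in TRANSACTIONAL_WORDS for word in words):
        return "transactional"
    # Assumption: anything else is treated as informational.
    return "informational"

for query in ["Facebook", "Buying shoes", "How does a search engine work?"]:
    print(query, "->", classify_intent(query))
```

A site owner can use the same reasoning in reverse: first decide which intent a keyword carries, then create the type of page that fits it.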