Definition:
A search algorithm is the set of operations a search engine follows to rank some results above others for the term or concept the user enters. For some years now, search engine operators have regularly changed their algorithms to improve the quality of search results and, above all, to prevent manipulation. As a result, optimizing web pages requires constant research and monitoring.
How do search algorithms work?
For a typical query, there are thousands, if not millions, of web pages with useful information. At its most basic, a search algorithm is the set of computer processes and formulas that take questions and turn them into answers. For example, Google has said that its algorithms rely on more than 200 unique signals or “clues” that make it possible to guess what you might actually be looking for. These signals include things like the terms on websites, the freshness of the content, your region, and PageRank.
The basic idea behind a search algorithm is to use an inverted index: for each word, the engine maintains a list of the web documents that contain that word.
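To make the inverted index concrete, here is a minimal sketch in Python. The function name `build_inverted_index` and the three sample documents are hypothetical, and splitting on whitespace is a simplifying assumption; a real engine would also normalize, stem and filter words.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to the sorted list of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Naive tokenization: lowercase and split on whitespace.
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(doc_ids) for word, doc_ids in index.items()}


# Hypothetical toy corpus, keyed by document ID.
documents = {
    1: "search engines rank web pages",
    2: "web pages link to other pages",
    3: "engines crawl the web",
}

index = build_inverted_index(documents)
print(index["web"])    # [1, 2, 3]
print(index["pages"])  # [1, 2]
```

Storing document IDs rather than the documents themselves keeps each list small and makes the lists easy to intersect at query time.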
In response to a query, the search engine works in three steps. First, it obtains the list of matching documents, essentially by intersecting the document lists of the query words. Second, it processes those documents, extracting the quality signals relevant to the query. Third, it ranks the documents using indicators of quality such as PageRank, as well as aspects of the query itself. The result is an ordered list of the best-matching documents, shown ten at a time.
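The query steps described above can be sketched as follows. This is an illustration under stated assumptions, not how any particular engine is implemented: the `search` function, the toy `index` and the single `quality` score per document are all hypothetical stand-ins for the many signals a real engine combines.

```python
def search(query, index, quality, page=0, page_size=10):
    """Return one page of document IDs that contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    # Intersect the document lists of all query words.
    matches = set(index.get(words[0], []))
    for word in words[1:]:
        matches &= set(index.get(word, []))
    # Rank by quality score, highest first (ties broken by document ID).
    ranked = sorted(matches, key=lambda doc_id: (-quality.get(doc_id, 0.0), doc_id))
    # Return the results one page at a time.
    start = page * page_size
    return ranked[start:start + page_size]


# Hypothetical inverted index and per-document quality scores.
index = {"web": [1, 2, 3], "pages": [1, 2], "engines": [1, 3]}
quality = {1: 0.2, 2: 0.7, 3: 0.5}

print(search("web pages", index, quality))                # [2, 1]
print(search("web", index, quality, page=1, page_size=2)) # [1]
```

With the default `page_size` of 10, calling `search` with `page=0`, `page=1` and so on yields the results ten at a time.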
Algorithms usually reward websites whose text is written in natural language and whose images are optimized so that navigation is fast and error-free.
Some of the best known algorithms on the internet
The algorithms used by search engines, social networks and even dating sites significantly influence the daily lives of Internet users. But which are the most famous?
- Google PageRank. Probably the best-known algorithm in the world, it is the one Google uses to help rank indexed pages in its search results. Contrary to what many people assume, its name comes from Larry Page, one of the company’s founders, not from the word “page”. It was originally based on the number of pages linking to a website and the importance of those pages, without taking other factors into account. Google stopped showing PageRank publicly on March 7, 2016, so positioning experts began to use other tools, and doubts arose about how much weight this algorithm currently carries in web positioning.
- Facebook’s EdgeRank. The social network also uses an algorithm to rank the results the user sees in the news feed. Criticism of the proliferation of ‘fake news’ has forced the company created by Mark Zuckerberg to make some modifications.
- High Frequency Trading (HFT). These algorithms are used to trade in financial markets over very short time frames. They issue massive numbers of orders in fractions of a second, but the fact that many of those orders never materialize has earned them a great deal of criticism.
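To illustrate the link-based idea behind PageRank mentioned in the list above, here is a minimal sketch of the textbook iterative formulation. It is not Google’s production algorithm: the `pagerank` function, the four-page link graph and the choice of 50 iterations are assumptions made for illustration, and the damping factor of 0.85 is simply the commonly cited default.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank scores for a small link graph by repeated iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share of the total score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "c" is linked to by every other page.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page "c" ends up with the highest score because every other page links to it, while "d", which nobody links to, ends up with the lowest.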
Algorithm optimization
Although search engines and social networks periodically change their algorithms, some techniques are commonly used to return results as efficiently as possible:
- Distribute the work across thousands of machines to speed it up.
- Keep the search index in memory.
- Cache the results of searches.
- Start with the query word that has the shortest document list.
- Keep the documents in each list in descending order of PageRank, so that the search engine can stop early once it has found enough good-quality results.
- Maintain lists of pairs of words that often go together.
- Shard the index by document ID, so that the load is evenly distributed and the intersection can be done in parallel.
- Compress messages that are sent over the network.
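Three of the techniques in the list above (caching, starting from the shortest document list, and stopping early on lists sorted by quality) can be combined in a short sketch. The names `POSTINGS`, `intersect_shortest_first` and `cached_search` are hypothetical, the data is a toy example, and Python’s standard `functools.lru_cache` stands in for whatever cache a real engine would use.

```python
from functools import lru_cache

# Hypothetical document lists, each already sorted so that the
# highest-quality documents come first.
POSTINGS = {
    "web": [4, 2, 3, 1],
    "pages": [4, 2, 1],
    "rank": [2],
}


def intersect_shortest_first(words, limit):
    """Intersect document lists, scanning only the shortest one."""
    lists = sorted((POSTINGS.get(word, []) for word in words), key=len)
    if not lists:
        return []
    shortest = lists[0]
    others = [set(doc_ids) for doc_ids in lists[1:]]
    results = []
    for doc_id in shortest:
        if all(doc_id in other for other in others):
            results.append(doc_id)
            if len(results) == limit:
                # Lists are sorted by quality, so the best matches
                # have already been found and the scan can stop early.
                break
    return results


@lru_cache(maxsize=1024)
def cached_search(query, limit=10):
    """Answer repeated queries from an in-memory cache."""
    return tuple(intersect_shortest_first(query.lower().split(), limit))


print(cached_search("web pages", 2))    # (4, 2)
print(cached_search("web pages rank"))  # (2,)
```

Scanning the shortest list bounds the work by the rarest query word, and the early stop only gives correct results because every list shares the same quality ordering.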