Currently search engines are centralised in a fairly small number of very large data centres. This means that a search engine’s index – which records the location of information on the web and assigns it a relative importance (and thus also a relevance to your search string) – is replicated within these data centres across several locations. This also means that the results you get from a search in the UK won’t differ so very much from the results received by someone carrying out the same search in another country.
Replication is inefficient
Researchers at Yahoo have proposed that web searching could become much faster and more efficient if the search index and additional data were distributed around the world across a larger number of smaller data centres. The authors suggest that this could reduce overall search engine operating costs by up to 15% without compromising the quality of the search results, and that it should also speed up searching.
However, it would also mean that searches would become more localised, so that the same search carried out in different countries would yield results that overlap less. This is because most of your results would be coming from your friendly neighbourhood data centre. To resolve this, the authors say that the data centres must be able to communicate with each other. It is more complicated than that, of course. If your search needs to go to more than one data centre, query processing will suffer from “latency”, i.e. a slight delay.
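To make the trade-off concrete, here is a minimal sketch of the idea in Python. It is not the researchers' method: the DataCentre class, the search function, the "forward if fewer than min_results local hits" rule and the latency figures are all hypothetical, chosen only to show a query being answered locally when possible and paying an extra delay when it has to go further afield.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class DataCentre:
    """A toy data centre holding a small local index (hypothetical model)."""
    name: str
    index: dict[str, list[str]]  # search term -> matching documents


def search(query, local, remotes, min_results=2,
           local_latency_ms=10, remote_latency_ms=100):
    """Answer from the local centre; forward only if local results fall short.

    The forwarding rule and the latency numbers are illustrative assumptions.
    Returns (results, simulated latency in ms, names of centres contacted).
    """
    results = list(local.index.get(query, []))
    latency = local_latency_ms
    contacted = [local.name]
    if len(results) < min_results and remotes:
        for centre in remotes:
            for doc in centre.index.get(query, []):
                if doc not in results:  # skip documents we already have
                    results.append(doc)
            contacted.append(centre.name)
        # Assume the remote centres are queried in parallel: one extra delay.
        latency += remote_latency_ms
    return results, latency, contacted


uk = DataCentre("uk", {"football": ["bbc.co.uk/football", "thefa.com"]})
us = DataCentre("us", {"football": ["nfl.com"], "baseball": ["mlb.com"]})

print(search("football", uk, [us]))  # answered locally, no extra delay
print(search("baseball", uk, [us]))  # forwarded to the US centre, slower
```

In this toy version a UK search for "football" never leaves the UK centre, which is quick but misses the US result, while "baseball" has to be forwarded and picks up the extra delay. That is the tension described above: more local answers are cheaper and faster, but less consistent from country to country.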
“Distributed architecture” for search engines has met with resistance in the past because people, on the whole, don’t like inconsistency and do like having just one place from which to search all the information on the web. Not that that happens in reality of course, it’s just an illusion (ref. deep web/invisible web).
So that’s all very well, but the researchers are concerned about the future health of the mammoth data centres. It looks as if data centres are moving towards a bursting-at-the-seams scenario.
The researchers presented their paper at the 18th ACM Conference on Information and Knowledge Management where, I gather, they were also awarded “best paper”. Well done to them. This approach won’t be implemented tomorrow: it’s far from simple, they’ve done a lot of mathematics, and they acknowledge that further work is needed. But as we’ll all notice if a data centre suddenly goes “pop!”, it’s good to know that the experts are working on some alternative options.