seascape models

Google controls who you cite

Do you use Google Scholar to search for academic papers? Then beware: the search engine is controlling, or at least strongly influencing, who you will cite.

Google Scholar has become increasingly useful and dominant in helping us find the papers we need. Among its most useful features are instant links to open access pdfs, where available, and ‘updates’ that recommend papers to you based on your own publications and the types of things you search for (see image). To get the updates you need to have a Google Scholar profile page, which most of us seem to have these days. When searching you can also click directly through to an author’s profile page to see what other work they have published.

Google Scholar recommended these two papers to me, effectively predicting the paper I am working on before it has even been submitted. (They are both great reads, by the way.)

I recently wondered how searches on Google Scholar are ordered. If your paper comes out on top for popular keyword searches, like ‘climate change’, you are much more likely to get read and therefore cited. I don’t have quantitative evidence for an effect of search rankings on cites, but there is plenty of circumstantial evidence for it.

Ordering really matters, because even diligent researchers are unlikely to look through many search pages before trying another search. Numerous studies have shown we are most likely just to click the top result on web searches. That is why there is a whole industry based around search engine optimisation.

It is not totally transparent how Google Scholar searches are ordered. On other platforms, like Web of Science, the search algorithm is transparent. The user gets to decide which fields are searched (e.g. just the title, or the keywords, or whole topics) and ordering is also user controlled - you can order by date or citations. Therefore, if you use Web of Science, you may bias your reading (and cites) to newer or more highly cited papers. But at least you know that bias exists.

Google Scholar’s algorithm is opaque, so we don’t know how it is biasing our citation practices. You can find some useful help on the search engine here. Of course they don’t tell you exactly how searches are ordered, otherwise people would game the system.

[Author’s note: after I published this blog Rich Grenyer sent me a link to this article from 2009 where some folk tried to reverse engineer how Google Scholar ranks searches. I recommend reading it if you use GS!]

The ordering of Google Scholar search results is not that obvious. This one seems to be sorted by a mix of citations and keywords.

For instance, see the image for the search ‘marine reserves and climate change’. The ordering is not consistently by citations, date or appearance of keywords in the title. The top four hits go from 2003 - 2007 - 2009 then back to 2006. Their citations vary from 2718, to 833 to 188 then back up to 1324. In fact, the ordering is determined by an algorithm that is trying to guess what you are most likely to click on.
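
The point about inconsistent ordering can be checked with a few lines of Python. This is only a rough sketch: it uses the years and citation counts quoted above for the top four hits, and the names `top_hits` and `is_sorted` are made up for illustration. It says nothing about how Google Scholar actually ranks results, only that neither simple sort explains this particular list.

```python
# Top four hits as quoted in the text, in ranked order: (year, citations).
top_hits = [(2003, 2718), (2007, 833), (2009, 188), (2006, 1324)]


def is_sorted(values):
    """Return True if values run entirely ascending or entirely descending."""
    pairs = list(zip(values, values[1:]))
    ascending = all(a <= b for a, b in pairs)
    descending = all(a >= b for a, b in pairs)
    return ascending or descending


years = [year for year, _ in top_hits]
citations = [cites for _, cites in top_hits]

# Neither the years nor the citation counts form a monotonic sequence.
sorted_by_year = is_sorted(years)
sorted_by_citations = is_sorted(citations)

print("Sorted by year:", sorted_by_year)
print("Sorted by citations:", sorted_by_citations)
```

Both checks come back false for these four results, which is consistent with the ranking mixing several signals rather than following a single user-visible sort.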

I doubt even Google’s coders could fully explain how searches are ordered. Machine learning algorithms that try to guess what a user wants are often so complex as to be opaque even to their developers. Standard Google searches rank pages based on ‘over 200 factors’. We can only assume Google Scholar does something similar. Presumably it ranks articles, in addition to relevance of your keywords, based on authors, citations, date, credibility of source, your location, articles you cite and so on.

Back to academia, do we really want citations - one of our key measures of publication ‘quality’ - to be controlled by an algorithm run by a private company?

As an example, imagine the location of your IP address is used in determining the order of searches (it seems to play a role in the recommendations at least - the two articles recommended to me are both studies done in my home state). This would bias you toward local studies and you may miss cross-pollination of important ideas from other regions.

I don’t think we want an algorithm deciding our citations. So while Google Scholar is convenient, I wouldn’t recommend you use it as your sole search engine. Ultimately, you want to have control over what you find and end up citing.

Contact: Chris Brown
