Sunday, 21 April 2013

The REF is a ranker

There have been a couple of posts I've seen over the last few days about the REF: the system by which the UK government decides on a part of the funding it gives to UK universities.  Perhaps unsurprisingly, given that I choose to read these bloggers, I largely agree with the opinions of telescoper and to the left of centre.  There are some interesting comments on telescoper's post.  The chief protagonist makes some interesting points, but ultimately, the suggestion that the panel need not review the submitted papers because they have already been through peer review makes no sense to me.  In my area (nuclear physics), the acceptance rate in Physical Review, which is surely the toughest specialist journal to be accepted in, is around 80%.  Elsewhere, in other journals, I expect it is even higher, so that extant journal peer review is utterly irrelevant when it comes to deciding whether peer-reviewed papers submitted for the REF are so-called 4*, 3*, 2* or 1*.  Peer review in journals judges correctness more than quality.  To the extent that any of the classifications make sense, all published papers, even those that have been through peer review, would still need to be re-reviewed in full by the panel to decide on those categories.  As Peter says, this can't really happen properly.  I also agree with his feeling that panel members who have been told not to take into account where papers have been published will take it into account anyway.  I have heard the opinions of enough people who sat on previous RAE panels to believe that the rules will not be adhered to in that regard.

The protagonist also pointed out something I didn't know: that the Computer Science community had asked for Google Scholar to be included in the citation information.  As things stand, only the Scopus citation data from Elsevier will be given to panels; Google Scholar was investigated but has been ruled out.  I suppose that wouldn't much matter if all the different metric websites agreed with each other.  But do they?

I spent a little time this afternoon looking at the papers I have published since 2008 (the REF period, basically) and the citation counts from the obvious websites: Google Scholar, Scopus (which the REF is using), ISI, and the metrics on the individual journal websites.  It is not necessarily the most representative sample of papers, but it is at least one relevant to the REF.  Here is the graph of total citations since publication for the 25 papers I've published since 2008.  The vertical grid lines separate the papers, with the bars showing the number of citations according to each website.

What can be said about the graph (or the data from which it came)?  In a perfect world, you would hope that, for each paper, the number of citations would be identical across the websites.  Obviously it is not, and that is not surprising, but is it worrying?  It's certainly noticeable that some papers have wildly different numbers of citations according to different sites.  Paper 7, the most highly cited, has about twice as many citations on Google Scholar as on Scopus or ISI.  Google Scholar is usually the most comprehensive, including, for example, citations from theses published on institutional websites, yet sometimes ISI comes out with more hits (e.g. paper 19, where ISI comes out top and Scopus is very low).  There are more worrying things.  Paper 14 is my third most-cited paper according to Google Scholar, and that is correct as far as I can tell, yet Scopus, the system the taxpayer is paying for to inform the judgement, is not aware of the paper at all, despite it being published in a mainstream journal (Europhysics Letters).  There are plenty of other anomalies.  It's perhaps not as clear as it might be from the figure, but there are several examples where the different websites give drastically different counts.  I'm happy to send you my more detailed breakdown if you want to check it out.
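One crude way to quantify this sort of cross-database disagreement is the relative spread of the counts for each paper: (max − min)/max across the databases, which is 0 when they all agree and approaches 1 when one database misses the paper almost entirely.  Here's a minimal Python sketch of that idea; the paper labels and citation counts below are illustrative placeholders, not my actual data:

```python
def spread(counts):
    """Relative spread of citation counts across databases:
    (max - min) / max, or 0.0 if every database reports zero."""
    counts = list(counts)
    hi, lo = max(counts), min(counts)
    return (hi - lo) / hi if hi else 0.0

# paper label -> {database: citation count}  (placeholder numbers only)
papers = {
    "paper_07": {"google_scholar": 120, "scopus": 60, "isi": 62},
    "paper_14": {"google_scholar": 45,  "scopus": 0,  "isi": 30},
    "paper_19": {"google_scholar": 20,  "scopus": 5,  "isi": 25},
}

for name, counts in sorted(papers.items()):
    print(f"{name}: spread = {spread(counts.values()):.2f}")
```

A spread near 1 (like the hypothetical paper_14, which one database misses completely) is exactly the kind of anomaly that should worry anyone basing funding decisions on a single citation source.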