[R] R Citation rates

John Maindonald john.maindonald at anu.edu.au
Tue Aug 12 07:48:50 CEST 2008


Following some discussion with Simon Blomberg, I've done
a few Web of Science citation searches.  A topic search for
R LANG* STAT* seems to turn up most of the references to
"R: A Language and Environment for Statistical Computing".
"R Development Core Team" gets transformed into an
astonishing variety of forms.  Searching for citations
of the 1996 Ihaka and Gentleman paper (most references
up to and including 2004) turns up many fewer quirks.

What other forms of reference should be investigated?

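Anyway, here are the numbers by year (there may be some
duplication).  For 1998-2003 the counts are for I&G only;
from 2004 they are RSTAT + I&G.

Year   RSTAT    I&G   Total
1998       -      4       4
1999       -     15      15
2000       -     17      17
2001       -     39      39
2002       -    119     119
2003       -    276     276
2004      68    455     523
2005     433    512     945
2006    1049    426    1475
2007    1605    410    2015
2008    1389    255    1644
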
cit <- c("1998" = 4, "1999" = 15, "2000" = 17, "2001" = 39, "2002" = 119,
         "2003" = 276, "2004" = 523, "2005" = 945, "2006" = 1475,
         "2007" = 2015, "2008" = 1644)

(~4550 references to R LANG* STAT*; ~2530 to I&G)
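
The approximate totals just quoted can be checked by rebuilding the
yearly series from its two components.  This is only a sketch: the
names ig and rstat are made up for illustration, and the R LANG* STAT*
count is taken as 0 for the years before 2004, where no figure is given.

```r
years <- as.character(1998:2008)

# Citations of the 1996 Ihaka & Gentleman paper, by year
ig <- setNames(c(4, 15, 17, 39, 119, 276, 455, 512, 426, 410, 255), years)

# Hits for the topic search R LANG* STAT*
# (assumption: 0 for 1998-2003, where no count is listed)
rstat <- setNames(c(rep(0, 6), 68, 433, 1049, 1605, 1389), years)

cit <- rstat + ig   # yearly totals, matching the 'cit' vector above
sum(rstat)          # 4544, i.e. roughly 4550
sum(ig)             # 2528, i.e. roughly 2530
```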

Scaled to a full year, the 2008 figure comes to 2691.
This does not, however, allow for growth over the course of the year.
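
The 2691 figure is consistent with simple pro-rata scaling of the
count to date.  The sketch below assumes that about 7.33 months of
2008 had elapsed when the search was run; that fraction is inferred
from the numbers and is not stated anywhere in the search results.

```r
count_2008     <- 1644   # citations found so far for 2008
months_elapsed <- 7.33   # assumption: search run about a third of the way into August

annualised <- count_2008 * 12 / months_elapsed
round(annualised)        # 2691
```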

The number of references grew by 37% from 2006 to 2007.
On current trends, the 2007-2008 increase seems likely to
be much larger than that.
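
The year-on-year growth can be checked directly from the totals.  A
minimal sketch, using the same counts as the cit vector; note that the
2008 value is a part-year count, so its growth figure is not comparable
with the others.

```r
cit <- c("1998" = 4, "1999" = 15, "2000" = 17, "2001" = 39, "2002" = 119,
         "2003" = 276, "2004" = 523, "2005" = 945, "2006" = 1475,
         "2007" = 2015, "2008" = 1644)

# Percentage change on the previous year; names are the later year
growth <- 100 * (cit[-1] / cit[-length(cit)] - 1)

round(growth["2007"])    # 37 (percent, 2006 to 2007)
```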

The figures probably underestimate the contribution from
Bioconductor-related work.  A direct search for
Bioconductor-related papers did not, however, turn up
enough papers to make much difference to the numbers.

Here are some other summary figures, for graphing using
whatever form of presentation appeals most (the second
number is for the I&G paper):

country <- c(usa=1540+903, germany=539+304, england=507+328,
                 france=468+337,  canada=345+147, australia=329+169,
                 switzerland=279+121)

subj <- c(ecology=924+349, statsANDprob=488+270, geneticsANDheredity=488+279,
          envScience=298+119, CSapplications=269+108, zoology=267+111,
          plantSciences=250+108, biochemANDmolbio=229+200,
          mathANDcompBIO=224+143, biotechANDappliedmicrobiology=223+159,
          evolutionaryBIO=210+117)
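
For graphing, one option is to keep the two counts as separate columns
rather than pre-summed, so that either the split or the total can be
shown.  The sketch below does this for the country figures and assumes
a dot chart is an acceptable presentation; the object and column names
(country_mat, RSTAT, IandG) are illustrative only.

```r
# One row per country: R LANG* STAT* hits, then I&G citations
country_mat <- rbind(usa         = c(1540, 903),
                     germany     = c(539, 304),
                     england     = c(507, 328),
                     france      = c(468, 337),
                     canada      = c(345, 147),
                     australia   = c(329, 169),
                     switzerland = c(279, 121))
colnames(country_mat) <- c("RSTAT", "IandG")

# Combined counts, sorted so the largest ends up at the top of the chart
total <- sort(rowSums(country_mat))

# Draw the chart only when run interactively
if (interactive()) {
  dotchart(total, xlab = "Citations (RSTAT + I&G)")
}
```

The same layout works for the subject-area figures; barplot(t(country_mat))
would show the split between the two sources instead of the total.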

There's a great deal more summary information that might
be extracted.  What is a good way, with readily available data,
to standardize the country data?

Ecology no doubt comes out on top because it
is a coarser grouping than many other areas.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


