[R] publication statistics from Web of Science

baptiste auguie ba208 at exeter.ac.uk
Thu Jan 15 10:45:51 CET 2009


For the record, I thought I'd share two findings:

First, the Web of Science website does seem to have some sort of API,
as discussed here:

http://scientific.thomson.com/support/faq/webservices/
It does not look trivial to set up, though.

Second, because I could not easily pass the search term in the Web of
Science address, I looked into Google Scholar instead, where a typical
search looks like:
http://scholar.google.co.uk/scholar?as_q=plasmonics&num=10&btnG=Search+Scholar&as_epq=&as_oq=&as_eq=&as_occt=any&as_sauthors=&as_publication=&as_ylo=&as_yhi=1960&as_allsubj=all&hl=en&lr=

Here it is trivial to build such a string with the desired keyword
(as_q) and dates (as_ylo, as_yhi), and to retrieve the number of
results using readLines(url) and grep.
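
A minimal sketch of that idea, in case it is useful to someone. The helper names scholar_url() and extract_count() are made up for illustration, the URL keeps only a subset of the parameters shown above, and the "of about N" wording matched by the regular expression is an assumption about the results page, not something I can vouch for. The sample page is invented so that the example runs without network access.

```r
# Build a Google Scholar search URL for one keyword and year range.
# Parameter names (as_q, as_ylo, as_yhi, ...) are copied from the example
# URL above; the empty parameters are dropped here (an assumption that
# they are optional).
scholar_url <- function(keyword, ylo = "", yhi = "") {
  paste0("http://scholar.google.co.uk/scholar?as_q=",
         URLencode(keyword, reserved = TRUE),
         "&num=10&btnG=Search+Scholar&as_ylo=", ylo,
         "&as_yhi=", yhi, "&hl=en")
}

# Pull the hit count out of the page source, given as a character vector
# of lines. The "of about N" pattern is hypothetical: check it against
# what readLines() actually returns before relying on it.
extract_count <- function(html) {
  pattern <- "of about [0-9,]+"
  hits <- grep(pattern, html, value = TRUE)
  if (length(hits) == 0) return(NA_real_)
  m <- regmatches(hits[1], regexpr(pattern, hits[1]))
  as.numeric(gsub("[^0-9]", "", m))   # strip thousands separators
}

url <- scholar_url("plasmonics", ylo = 1960, yhi = 2008)
# page <- readLines(url, warn = FALSE)  # needs network; the site may refuse
page <- c("<html>",
          "Results 1 - 10 of about 1,230 for plasmonics",  # invented sample
          "</html>")
extract_count(page)
```

Looping scholar_url() over a vector of years with the same value for ylo and yhi would give one count per year, in the same date/count layout as the tables in my original message below.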


Thanks to Phil Spector for some pointers.

Best wishes,

baptiste


On 14 Jan 2009, at 13:44, baptiste auguie wrote:

> Dear list,
>
> This is a bit of an off-topic question, but I'm hoping to get some
> advice from more experienced people. I've used the website "Web of
> Science" to manually collect publication counts matching several
> keywords as a function of date, since the 1960s.
>
> http://apps.isiknowledge.com/RAMore.do?product=UA&search_mode=&SID=P1g9lFJp9@ejA6PJHKD&qid=1&ra_mode=more&ra_name=PublicationYear&db_id=UGB&viewType=raMore
>
> This is a really long and error-prone process. Once the data was
> collected, I rearranged it in a form R could read (see the example at
> the end); this step wasn't too bad. Finally, I plotted histograms to
> show the temporal trends.
>
> I have two questions:
>
> - Is there a package or external tool to facilitate the collection of
> data from this kind of online search tool? I could not find any public
> API for this website, although some tools like Endnote clearly access
> the database somehow. I'd be very grateful for any pointer.
>
> - I feel like the display and choice of search terms are very arbitrary
> and subjective. Any general advice on how to present this data better
> is most welcome. (I should mention that I'd rather not involve any
> complicated statistical analysis; I only want to make sure that the
> presentation is not horribly biased.)
>
>
> Best regards,
>
> baptiste
>
>
> statistics <- list(list(values=read.table(textConnection("
> date count
> 2007 600
> 2006 588
> 2008 555
> 2005 430
> 2004 418
> 2003 334
> 2002 277
> 2001 239
> 2000 226
> 1997 184
> 1999 184
> 1998 182
> 1996 129
> 1995 108
> 1994 92
> 1993 67
> 1992 53
> 1991 47
> 1990 37
> 1989 14
> 1988 11
> 1983 10
> 1987 7
> 1985 6
> 1986 6
> 1981 5
> 1984 5
> 1979 4
> 1982 4
> 2009 3
> 1971 2
> 1933 1
> 1973 1
> 1974 1
> 1977 1
> 1978 1
> 1980 1"), head=T),type=1, cumSum=4833, search="photonics"),
> list(values=read.table(textConnection("
> date count
> 2008 129
> 2007 92
> 2006 50
> 2005 26
> 2004 15
> 2003 4
> 1972 1
> 2001 1
> 2002 1"), head=T),type=1, cumSum=319, search="plasmonics"),
> list(values=read.table(textConnection("
> date count
> 2008 3207
> 2007 3105
> 2006 2666
> 2005 2323
> 2004 1910
> 2003 1552
> 2002 1372
> 2001 1292
> 2000 1095
> 1999 992
> 1998 863
> 1997 771
> 1996 643
> 1995 484
> 1993 418
> 1994 407
> 1992 345
> 1991 321
> 1990 120
> 1989 91
> 1988 82
> 1987 78
> 1981 77
> 1986 73
> 1983 72
> 1978 69
> 1979 68
> 1985 66
> 1976 63
> 1975 62
> 1980 59
> 1984 54
> 1982 52
> 1973 50
> 1977 50
> 1972 46
> 1974 43
> 1971 38
> 1969 28
> 1970 28
> 2009 26
> 1968 18
> 1967 11
> 1966 8
> 1962 5
> 1963 4
> 1900 3
> 1960 3
> 1961 3
> 1948 2
> 1912 1
> 1949 1
> 1950 1
> 1953 1
> 1954 1
> 1959 1
> 1964 1
> 1965 1"), head=T),type=1, cumSum=25226, search="plasmonics+ plasmon"),
> list(values=read.table(textConnection("
> date count
> 2008 2716
> 2007 2640
> 2006 2257
> 2005 1991
> 2004 1625
> 2003 1302
> 2002 1129
> 2001 1056
> 2000 862
> 1999 814
> 1998 650
> 1997 574
> 1996 427
> 1995 338
> 1994 272
> 1993 260
> 1991 187
> 1992 176
> 1990 62
> 1989 51
> 1981 41
> 1988 41
> 1987 36
> 1986 32
> 1983 30
> 1980 29
> 1982 28
> 1984 28
> 1985 27
> 1975 25
> 1976 23
> 2009 23
> 1973 22
> 1979 22
> 1972 15
> 1974 15
> 1977 13
> 1971 10
> 1978 10
> 1970 9
> 1968 7
> 1969 7
> 1966 1  "), head=T),type=2, cumSum=19883, search="surface plasmon"),
> list(values=read.table(textConnection("
> date count
> 2008 324
> 2007 295
> 2006 248
> 2005 220
> 2004 156
> 2003 126
> 2002 113
> 2000 86
> 2001 84
> 1996 66
> 1999 59
> 1997 53
> 1998 53
> 1993 39
> 1992 34
> 1994 29
> 1995 29
> 1991 25
> 1973 2
> 1987 2
> 1970 1
> 1972 1
> 1978 1
> 1983 1
> 1984 1
> 1989 1
> 2009 1  "), head=T),type=2, cumSum=2050, search="localised or particle plasmon"),
> list(values=read.table(textConnection("
> date count
> 2007 196
> 2008 165
> 2005 141
> 2006 141
> 2003 112
> 2004 109
> 2002 83
> 2001 75
> 1999 62
> 2000 51
> 1998 38
> 1997 29
> 1995 13
> 1996 11
> 1993 6
> 1992 4
> 1994 4
> 1991 2
> 2009 2
> 1990 1"), head=T),type=2, cumSum=1245, search="SPR sensor"),
> list(values=read.table(textConnection("
> date count
> 2008 290
> 2007 225
> 2006 167
> 2005 138
> 2004 101
> 2003 79
> 2001 54
> 2002 51
> 2000 42
> 1998 31
> 1999 30
> 1997 27
> 1996 25
> 1992 20
> 1995 20
> 1991 15
> 1994 14
> 1993 10
> 1973 2
> 1984 2
> 1990 2
> 2009 2
> 1963 1
> 1972 1
> 1974 1
> 1977 1
> 1978 1
> 1982 1
> 1983 1
> 1988 1
> 1989 1"), head=T), cumSum=1356,type=1,  search="light scattering gold"))
>
> str(statistics)
>
> # Turn one list entry into a data frame tagged with its search term
> treatOne <- function(ml){
>        data.frame(ml$values, search = as.character(ml$search))
> }
> # treatOne(statistics[[1]])
>
> library(plyr)     # llply()
> library(reshape2) # melt(); must be loaded before it is called
> library(ggplot2)
>
> # Drop the third entry ("plasmonics+ plasmon") and stack the rest
> stats.list <- llply(statistics[-3], treatOne)
> stats.df <- do.call(rbind, stats.list)
>
> stats.melt <- melt(stats.df, id.vars = c("date", "search"))
> str(stats.melt)
>
> # One bar per year, one panel per search term, independent y scales
> p <- ggplot(data = subset(stats.melt, date > 1960),
>             mapping = aes(x = date, y = value)) +
>   facet_wrap(~search, ncol = 2, scales = "free_y") +
>   geom_bar(stat = "identity", colour = "grey") +
>   scale_y_continuous("number of publications")
> p
>
>
> _____________________________
>
> Baptiste Auguié
>
> School of Physics
> University of Exeter
> Stocker Road,
> Exeter, Devon,
> EX4 4QL, UK
>
> Phone: +44 1392 264187
>
> http://newton.ex.ac.uk/research/emag
>




