[Rd] CRAN Server download statistics (Was: R Usage Statistics)
hadley wickham
h.wickham at gmail.com
Mon Nov 23 15:12:37 CET 2009
Hi Ian,
I've spoken with Stefan Theussl (CRAN maintainer) about this, and he's
concerned about the privacy implications of making the Apache access
logs public. A compromise that he mentioned was having a script run
on the CRAN mirror that processed the log files and output summary
statistics. Then a central process could aggregate these and produce
a single overall summary.
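A minimal sketch of that split, in Python. It assumes each mirror keeps Apache logs in the common/combined format and serves package files under CRAN-style paths such as `/src/contrib/<pkg>_<version>.tar.gz` or `/bin/windows/contrib/<R version>/<pkg>_<version>.zip`; the function names `summarise_log`, `aggregate` and `platform_of` are hypothetical. The mirror-side step keeps only per-package, per-platform counts of successful (HTTP 200) GET requests and never stores IP addresses, so nothing identifying leaves the mirror. The central step just adds the summaries together.

```python
import re
from collections import Counter
from typing import Iterable, List

# Request line and status code of an Apache common/combined log entry,
# e.g. ... "GET /src/contrib/ggplot2_0.8.3.tar.gz HTTP/1.1" 200 ...
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[0-9.]+" (\d{3})')
# Assumed CRAN layout: .../contrib/[<R version>/]<pkg>_<version>.<ext>
PACKAGE_RE = re.compile(
    r"/contrib/(?:[0-9.]+/)?([A-Za-z][A-Za-z0-9.]*)_[0-9][0-9.\-]*\.(?:tar\.gz|zip|tgz)$"
)


def platform_of(path: str) -> str:
    """Classify a download by the CRAN subtree it came from."""
    if "/bin/windows/" in path:
        return "windows"
    if "/bin/macosx/" in path:
        return "macosx"
    if "/src/contrib/" in path:
        return "source"
    return "other"


def summarise_log(lines: Iterable[str]) -> Counter:
    """Mirror side: reduce raw log lines to (package, platform) -> count.

    Client IPs and user agents are never stored, so only these aggregate
    counts would leave the mirror.
    """
    counts: Counter = Counter()
    for line in lines:
        req = REQUEST_RE.search(line)
        if req is None or req.group(2) != "200":
            continue  # skip non-GET lines, errors and partial (206) transfers
        path = req.group(1)
        pkg = PACKAGE_RE.search(path)
        if pkg is not None:
            counts[(pkg.group(1), platform_of(path))] += 1
    return counts


def aggregate(summaries: List[Counter]) -> Counter:
    """Central side: add up the per-mirror summaries into one table."""
    total: Counter = Counter()
    for summary in summaries:
        total.update(summary)
    return total
```

Each mirror could run `summarise_log` over the previous day's log and publish only the resulting counts. Note that this counts every successful GET, so on its own it does not distinguish downloads made from within R from any other client.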
A few comments on your current site:
* Are you just including packages downloaded interactively from within R?
* I don't think the continent from which the package was downloaded is
of much interest. There's definitely no need to include it on the
main page.
* I'd be far more interested in changes over time. Sparklines of the
last month's worth of data would be a neat addition to the main page.
* More vertical whitespace or subtle zebra striping would make it
much easier to read across rows.
* I'm also not sure about displaying the number of unique IPs. R is
used a lot in the university setting, and until IPv6 comes along, many
university downloads will appear to be coming from a single IP
address.
* It's not very useful to sort by % Windows because the variance
increases as the sample size decreases, so the packages with the
highest and lowest % Windows are just the packages that aren't
downloaded very often. Maybe a shrunken estimate?
* Have you thought at all about how to take package dependencies into account?
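One simple form of shrunken estimate for the % Windows column, sketched in Python: treat each package's Windows share as drawn from a Beta prior centred on the pooled Windows share across all packages, and report the posterior mean (w + α) / (n + α + β), where w is a package's Windows downloads and n its total downloads. Rarely downloaded packages are pulled towards the pooled share while popular ones barely move, so sorting no longer just surfaces the packages with tiny samples. The function name and the `prior_strength` value (α + β = 50) are illustrative assumptions; in practice the prior could instead be fitted to the data, for example by method of moments.

```python
from typing import Dict, Tuple


def shrunken_windows_share(
    counts: Dict[str, Tuple[int, int]], prior_strength: float = 50.0
) -> Dict[str, float]:
    """Posterior-mean share of Windows downloads for each package.

    counts maps package -> (windows_downloads, total_downloads).
    prior_strength (alpha + beta) is an illustrative choice: larger values
    shrink small packages harder towards the pooled share.
    """
    total_windows = sum(w for w, _ in counts.values())
    total = sum(n for _, n in counts.values())
    if total == 0:
        raise ValueError("no downloads to estimate from")
    pooled = total_windows / total
    # Beta(alpha, beta) prior whose mean equals the pooled Windows share.
    alpha = prior_strength * pooled
    beta = prior_strength * (1.0 - pooled)
    return {pkg: (w + alpha) / (n + alpha + beta) for pkg, (w, n) in counts.items()}


if __name__ == "__main__":
    example = {"popular": (600, 1000), "rare": (2, 2)}
    for pkg, share in shrunken_windows_share(example).items():
        print(f"{pkg}: {share:.3f}")
```

With these made-up counts, the rare package's raw share of 100% becomes about 0.616, while the popular package stays at about 0.600, close to its raw 60%.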
Hadley
On Sun, Nov 22, 2009 at 6:18 PM, Fellows, Ian <ifellows at ucsd.edu> wrote:
> Hi All,
>
> It seems that the question of how many people use (or download) R and its packages is one that comes up on a fairly regular basis in a variety of forums (there was also a recent thread on the subject on Stack Overflow). A couple of students at UCLA (including myself) wanted to address the issue, so we set up a system to fetch and parse the cran.stat.ucla.edu Apache logs every night and display some basic statistics. Right now, we have a working sketch of a site based on one week of observations.
>
> http://neolab.stat.ucla.edu/cranstats/
>
> We would very much like to incorporate data from all CRAN mirrors, including cran.r-project.org. We would also like to set this up in a way that is minimally invasive for the site administrators. Internally, our administrator has set up a protected directory with the last couple of days of CRAN activity. We then pull that down using curl.
>
> What would be the best and easiest way for the CRAN mirrors to share their data? Is the contact information for the administrators available anywhere?
>
>
> Thank you,
> Ian Fellows
>
>
>
> ________________________________________
> From: r-devel-bounces at r-project.org [r-devel-bounces at r-project.org] On Behalf Of Steven McKinney [smckinney at bccrc.ca]
> Sent: Thursday, November 19, 2009 2:21 PM
> To: Kevin R. Coombes; r-devel at r-project.org
> Subject: Re: [Rd] R Usage Statistics
>
> Hi Kevin,
>
> What a surprising comment from a reviewer for BMC Bioinformatics.
>
> I just did a PubMed search for "limma" and "aroma.affymetrix",
> just two methods for which I use R software regularly.
> "limma" yields 28 hits, several of which are published
> in BMC Bioinformatics. Bengtsson's aroma.affymetrix paper
> "Estimation and assessment of raw copy numbers at the single locus level."
> is already cited by 6 others.
>
> It almost seems too easy to work up lists of usage of R packages.
>
> Spotfire is an application built around S-Plus that has widespread use
> in the biopharmaceutical industry at a minimum. Vivek Ranadive's
> TIBCO company just purchased Insightful, the S-Plus company.
> (They bought Spotfire previously.)
> Mr. Ranadive does not spend money on environments that are
> not appropriate for deploying applications.
>
> You could easily cull a list of corporation names from the
> various R email listservs as well.
>
> Press back with the reviewer. Reviewers can learn new things
> and will respond to arguments with good evidence behind them.
> Good luck!
>
>
> Steven McKinney
>
>
> ________________________________________
> From: r-devel-bounces at r-project.org [r-devel-bounces at r-project.org] On Behalf Of Kevin R. Coombes [krcoombes at mdacc.tmc.edu]
> Sent: November 19, 2009 10:47 AM
> To: r-devel at r-project.org
> Subject: [Rd] R Usage Statistics
>
> Hi,
>
> I got the following comment from the reviewer of a paper (describing an
> algorithm implemented in R) that I submitted to BMC Bioinformatics:
>
> "Finally, while useful for exploratory work and some prototyping,
> neither R nor S-Plus are appropriate environments for deploying user
> applications that would receive much use."
>
> I can certainly respond by pointing out that CRAN contains more than
> 2000 packages and Bioconductor contains more than 350. However, does
> anyone have statistics on how often R (and possibly some R packages) are
> downloaded, or on how many people actually use R?
>
> Thanks,
> Kevin
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
--
http://had.co.nz/