[Bioc-devel] Package download stats inflated? (specifically cummeRbund)

Hervé Pagès hpages at fhcrc.org
Thu May 24 22:15:56 CEST 2012


Hi Loyal,

A high ratio of downloads to unique IPs is not, by itself, a reason
to doubt that these numbers are a true representation of the
downloads. We've seen this before. See for example the stats for the
ChIPpeakAnno package:

   http://bioconductor.org/packages/stats/bioc/ChIPpeakAnno.html

The package was downloaded about 67k times in Oct/Nov 2011 from only
573 distinct IPs, so the ratio there is about 117 downloads / IP.
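
For anyone who wants to redo that arithmetic on another package, here
is a minimal sketch in Python. The function name downloads_per_ip is
only illustrative (it is not part of any Bioconductor tool), and 67000
stands in for the rounded "67k" figure quoted above.

```python
def downloads_per_ip(downloads: int, distinct_ips: int) -> float:
    """Return the mean number of downloads per distinct IP."""
    if distinct_ips <= 0:
        raise ValueError("distinct_ips must be a positive integer")
    return downloads / distinct_ips


# Figures quoted above for ChIPpeakAnno (Oct/Nov 2011).
# 67000 is an approximation of the rounded "67k" in the text.
ratio = downloads_per_ip(67000, 573)
print(round(ratio))  # 117
```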

The first time we saw massive repetitive downloads like this was for
the biomaRt package, more than a year ago. We investigated and
discovered that most downloads (> 95%) were coming from a single IP
(the IP itself was from a University somewhere in the US). We don't
know for sure why they needed to download the same package thousands
of times every day for more than 20 days in a row, but one explanation
could be that they were using some kind of dumb script to install
biomaRt on each node of a big cluster. What's strange though is that
we saw the deluge of downloads for a single package (biomaRt) and not
for a subset of Bioconductor packages (it seems to me that the people
in charge of a cluster would typically install more than 1 BioC
package). But maybe they were testing a script on 1 package, then
realized they could improve it (to download each package only once),
and then used the improved script to actually deploy Bioconductor on
their cluster. Hard to know...

Anyway, because such massive repetitive downloads are possible, maybe
we should put more emphasis on the number of distinct IPs. This number
is probably more representative of the number of users and is
therefore a better indicator of how much a package is actually used.
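
To make the comparison concrete, here is a rough sketch in Python of
how the two numbers, plus the share of downloads coming from the
single busiest IP, could be derived from a list of download records.
It assumes one IP string per download; the function name
summarize_downloads and the sample addresses are made up for
illustration, and this is not how the Bioconductor stats are actually
computed.

```python
from collections import Counter


def summarize_downloads(ips):
    """Summarize download records, given one IP string per download."""
    counts = Counter(ips)
    total = sum(counts.values())
    if total == 0:
        return {"downloads": 0, "distinct_ips": 0, "top_ip_share": 0.0}
    # most_common(1) returns [(ip, count)] for the busiest IP.
    _, top_count = counts.most_common(1)[0]
    return {
        "downloads": total,
        "distinct_ips": len(counts),
        "top_ip_share": top_count / total,
    }


# Hypothetical records: one IP accounts for 96 of 100 downloads.
records = ["10.0.0.1"] * 96 + ["10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"]
summary = summarize_downloads(records)
print(summary["downloads"], summary["distinct_ips"], summary["top_ip_share"])
```

A large top_ip_share alongside a small distinct_ips count is the
pattern described above for biomaRt: many downloads, few actual users.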

Cheers,
H.


On 05/23/2012 02:54 PM, lgoff at csail.mit.edu wrote:
> Hi Bioc-devel,
> I am the package maintainer for the cummeRbund package and since I'm not
> exactly sure to whom I should ask this question, I decided to post to
> the bioc-devel list.
>
> Since this is my first Bioc package I have been keenly interested in the
> download stats that are tracked and visible on the Bioconductor website,
> here:
>
> http://bioconductor.org/packages/stats/index.html
>
> Specifically, I'm noticing that the number of downloads for the
> cummeRbund package seems to far outpace the number of unique IP
> addresses downloading the package:
>
> http://bioconductor.org/packages/stats/bioc/cummeRbund.html
>
> For a few months there was a mean of 10-20 downloads per unique
> IP address, and for the current month this is on track to be about 36
> downloads/IP (and looks to be about 8.7% of the total BioC packages
> downloaded this month so far). Looking around at several other packages,
> this does not seem to be the case as most of the packages in the top 30
> list have a ratio of about 1.8-3 downloads / IP.
>
> As ecstatic as these numbers make me, I'm certain that there is some
> underlying reason for this inflation that is not being appropriately
> represented here, but without anything else to go on, I'm not really
> sure where this is coming from. I would obviously like to have an honest
> representation of the number of downloads for my package, and I was
> hoping that someone with access to these data could help me track down
> the cause of this download inflation (unless these numbers are a true
> representation of the downloads, and then I would also very much like to
> find out more demographics if possible as well).
>
> Any and all advice or information is appreciated! Thanks to all, and a
> special thanks to everyone that helps to keep BioC such an amazing
> project. I have enjoyed the benefits of bioconductor for the past 5+
> years and I'm very happy that I can finally start to contribute back to
> this wonderful project. (Also, I look forward to meeting some of you at
> BioC 2012 this year!)
>
> Thanks in advance!
>
> Cheers,
>
> Loyal Goff
>
> (lgoff at csail.mit.edu)
> NSF Postdoctoral Fellow
> Computer Science and Artificial Intelligence Laboratory, MIT &
> Stem Cells and Regenerative Biology Department, Harvard University &
> The Broad Institute
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


