[Rd] CRAN Server download statistics (Was: R Usage Statistics)

Gabor Grothendieck ggrothendieck at gmail.com
Mon Nov 23 15:51:11 CET 2009


On Mon, Nov 23, 2009 at 9:48 AM, hadley wickham <h.wickham at gmail.com> wrote:
>> Knowing what percentage of different OSes are being used is of
>> interest to package developers and would be obscured by the proposal
>> to massage the data.  I prefer to see the raw figure as is.
>
> I agree.  I was arguing that sorting by that value wasn't very useful.
>
>> Also the number of IPs is important and should not be removed in my
>> opinion since (1) it is a measure of clustering.  If a package is
>> mainly used by the courses of a few universities where the students
>> really have no choice then that seems a lot different than if it's used
>> by a variety of people around the world.  Only the IPs would give any
>> clue to that.  (2) it helps to diagnose intentional distortion of the
>> figures by repeat downloads to the same machine.
>
> There is no way to tease apart (1) and (2), plus many ADSL providers
> share an IP across multiple subscribers.  Number of unique IPs may
> still be useful, but it needs to be used with caution.
>
>> The one problem with sparkline graphs is that it would take a lot
>> longer for the page to load.  There already is a time series if you
>> click on the package name.
>
> Is it a time series?  It looks like a bar chart of downloads per day
> of week to me.
>

A time series is a function of time regardless of representation.


