[R] popular R packages
charpent at bacbuc.dyndns.org
Sun Mar 8 19:01:18 CET 2009
As far as I understand, you're telling us that a bit of data mining
does no harm, whatever the data. Your example of pop music charts
might support your point (although my ears disagree ...), but I think it
is bad policy to indulge in white-noise analysis without a well-reasoned
motive for doing so. It might give bad ideas to potential "statistics
patrons" (think a bit about the sorry state of financial markets :-().
More generally, I tend to be extremely wary of over-interpreting
belly grumbles as the Voice of the Spirit ... which is a very powerful
urge among many statisticians and statisticians' clients. Data mining can
be fine for exploratory musings, but a serious study needs a model, i.e.
a set of ideas and a way to test them against reality.
As far as I can see (but I might be nearsighted), there is no model linking
package downloads to package use. Data may or may not become available
with more or less effort, but I can't see the point.
On Sunday 08 March 2009 at 16:08 +0000, Barry Rowlingson wrote:
> > I think the situation is worse than messy. If a client comes in with data
> > that doesn't address the question they're interested in, I think they are
> > better served to be told that, than to be given an answer that is not
> > actually valid. They should also be told how to design a study that
> > actually does address their question.
> > You (and others) have mentioned Google Analytics as a possible way to
> > address the quality of data; that's helpful. But analyzing bad data will
> > just give bad conclusions.
> As long as we say 'package Foo is the most downloaded package on
> CRAN', and not 'package Foo is the most used package for R', we can
> leave it to the user to decide if the latter conclusion follows from
> the former. In the absence of actual usage data I would think it a
> good approximation. Not that I would risk my life on it.
> Pop music charts are now based on download counts, but I wouldn't
> believe they represent the songs that are listened to the most times.
> Nor would I go so far as to believe they represent the quality of the
> Should R have a 'Would you like to tell CRAN every time you do
> library(foo) so we can do usage counts (no personal data is
> transmitted blah blah) ?'? I don't think so....