[R] Popularity of R, SAS, SPSS, Stata...

Mon Jun 21 03:01:03 CEST 2010

On 20-Jun-10 19:49:43, Hadley Wickham wrote:
>> I've given thought in the past to the question of estimating the R
>> user base, and came to the conclusion that it is impossible to get
>> an estimate of the number of users that one could trust (or even
>> put anything like a margin of error to).
> 
> I find it hard to believe that it should be harder to estimate the
> number of whales than the number of R users. Sure there's a
> definitional problem of exactly what an R user is, but there must be
> some way to come up with some useful estimates.  What about snowball
> sampling with R-help as an initial frame?
> 
> Hadley

Whales are a different kettle of fish! They are much more directly
observable, in principle, than are R-users. For one thing, a whale
has to come to the surface to breathe every so often, and if you
are in a ship nearby you can see it happen.

There have been many research ships out in the oceans in known
whale areas looking out for just that, and planning their transects
so as to be able to scale up their observed data into population
estimates. In many cases individual whales can be recognised (by
markings or by notches on the fins), enabling a kind of passive
mark-recapture.

Also, active mark-recapture is carried out, with tags being planted
into the animals and recovered later (though this was a sounder
method prior to the moratorium on commercial whaling).
In addition, catch per unit effort (or observations per unit effort)
data can be used to estimate abundance. Data have been available on
Sex and Age. These days, responder beacons can be planted as tags,
and their numbers within visually observed whale groups determined.
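
To make the mark-recapture reasoning concrete, here is a minimal sketch
in Python of the simplest textbook estimator (Lincoln-Petersen) and its
small-sample variant (Chapman). This is only an illustration, not the
method used in any actual whale survey: the function names and the
numbers are hypothetical, and it assumes a closed population, no tag
loss, and equal catchability of marked and unmarked animals.

```python
def lincoln_petersen(marked, caught, recaptured):
    """Estimate population size N from a two-sample mark-recapture study.

    marked     -- animals tagged and released in the first sample
    caught     -- animals examined in the second sample
    recaptured -- animals in the second sample found carrying a tag

    The idea: the tagged fraction of the second sample (recaptured/caught)
    should roughly equal the tagged fraction of the population (marked/N).
    """
    if recaptured <= 0:
        raise ValueError("need at least one recapture to form an estimate")
    return marked * caught / recaptured


def chapman(marked, caught, recaptured):
    """Chapman's variant, less biased when the recapture count is small."""
    return (marked + 1) * (caught + 1) / (recaptured + 1) - 1


# Hypothetical figures, chosen only to show the arithmetic.
M, C, R = 200, 150, 30
print(lincoln_petersen(M, C, R))  # 1000.0
print(chapman(M, C, R))
```

Both estimators depend on being able to draw a second sample from the
same population, which is exactly what is hard to arrange for R-users.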

Data from such sources, and others, can be combined with analysis
of population-dynamics models, thus improving the quality of the
estimates.

R-users are not so easy to study! For one thing, they don't all
come up to breathe; they can do that in the darkest depths and
not be seen. Their population dynamics is obscure. The big problem
with any sort of survey or "sample" of R users is that the target
population is only partially visible, and seeking responses to any
kind of survey is subject to non-response (including failure to target)
bias from an intangible and therefore unknown number of users.

The idea of a "snowball sample" came up when this same topic
was discussed back in 2000. Go to

https://stat.ethz.ch/pipermail/r-help/2000-June/thread.html

and find the thread (and the various side-threads) which starts
with a message

"[R] # of users of R, and biological examples of the use of R"
from  Ramon Diaz-Uriarte (Tue Jun 20 10:21:37 CEST 2000).

Searching that month (June 2000) of archives for the word "users"
in the Subject will find them all (and nothing else).

The snowball was proposed by John Logsdon
  "[R] # of users of R" (Wed Jun 21 11:59:34 CEST 2000)

John and I discussed the snowball idea at some length off-list,
and that is when I came to the conclusion (for reasons such as
the above) that although it had some mileage, and could provide
information supplementary to other methods, the extent of its
potential reach into the unknown was, well, unknowable ...
[with acknowledgement to Donald Rumsfeld].

In response to the question from Bob Muenchen as to "How did you
get the R-help figure?" (of email addresses subscribed to R-help),
since I am one of the list moderators I can log in and access the
subscriber list.

As of today, the numbers are:

 4629 Non-digested Members of R-help
 5560 Digested Members of R-help
 (190 private members not shown)
----
10379

(A few more than the number I picked up some days ago).
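
As a quick check of the total, the three categories can be summed as
below. The dictionary keys are labels made up for this sketch; the
counts are the ones quoted above. Note that this counts subscribed
email addresses, not R users.

```python
# Subscriber counts as quoted above (key names are hypothetical labels).
subscribers = {
    "non_digest": 4629,  # Non-digested Members of R-help
    "digest": 5560,      # Digested Members of R-help
    "private": 190,      # private members not shown
}

total = sum(subscribers.values())
print(total)  # 10379
```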

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 21-Jun-10                                       Time: 02:00:45
------------------------------ XFMail ------------------------------