On 20-Jun-10 19:07:21, Muenchen, Robert A (Bob) wrote:
>>I wonder if there are any capture-recapture type methodologies for
>>estimating open-source software usage? Another idea would be to
>>combine with some other known numbers, e.g. book sales, conference
>>attendance etc. You'd need personal information to link the data sets
>>together.
>>
>>Hadley
>
> This totally cracked me up! I'm envisioning going into one of our
> computer labs, tossing a net over an unsuspecting student, and then
> tagging their ear with a code that represents which stat package
> they're using. Then release and later recapture. What percent did
> we get? That's what the profs I deal with do with animals to estimate
> populations.

I've given thought in the past to the question of estimating the R
user base, and came to the conclusion that it is impossible to get
an estimate of the number of users that one could trust (or even
attach anything like a margin of error to).

I think one could get a number that would represent a moderately
informative lower bound -- just count the number of different email
addresses that have ever posted to the R-help list. This will of
course include people who post (or have posted) from more than one
email address, and people who tried R for a while and then dropped
it, but my feeling is that these are likely to be outweighed by the
number of people who have used R but have never posted (for example
students who are getting their R help from their instructors, people
using R in a corporate context who are discouraged from posting to
public lists, etc.).
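
For concreteness, here is a minimal Python sketch of that count. It assumes the list archive has been downloaded as a single local mbox file; the file path and the function names are hypothetical. Addresses are compared case-insensitively and display names are ignored. As noted above, it does nothing to merge one person's several addresses.

```python
import mailbox
from email.utils import parseaddr
from typing import Iterable


def distinct_senders(from_headers: Iterable[str]) -> set:
    """Return the distinct sender addresses found in a sequence of From: headers.

    Display names are ignored and addresses are lower-cased (a practical
    simplification: strictly, the part before '@' may be case-sensitive).
    One person posting from several addresses is still counted several times.
    """
    senders = set()
    for header in from_headers:
        _display_name, address = parseaddr(header or "")
        if address:
            senders.add(address.strip().lower())
    return senders


def senders_in_mbox(path: str) -> set:
    """Collect distinct From: addresses from a local mbox archive (hypothetical path)."""
    box = mailbox.mbox(path)
    try:
        return distinct_senders(str(msg.get("From", "")) for msg in box)
    finally:
        box.close()


if __name__ == "__main__":
    headers = [
        "Ted Harding <Ted.Harding@example.org>",
        "ted.harding@example.org",
        "Bob <bob@example.edu>",
    ]
    print(len(distinct_senders(headers)))  # 2
```

Applied to the whole archive, len(senders_in_mbox(...)) would give the rough lower bound described above, subject to the same caveats about duplicate and lapsed addresses.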

The number of subscribers to R-help (currently about 10200) is
a definite lower bound for the number of R users, but many users
post to R-help without being subscribed.
I would expect that the total number of different email addresses
that have posted to R-help would be considerably larger than 10200.
I don't think a "Mark-Recapture" approach is feasible.
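
For reference, Hadley's idea of linking two data sets amounts to a two-list capture-recapture. Below is a minimal Python sketch of the textbook Lincoln-Petersen estimator and Chapman's bias-corrected form. The function names are illustrative, and the two lists (say, R-help posters and a hypothetical list of R book buyers) are assumptions. The estimators are only valid for a closed population in which everyone is equally likely to appear on each list and the two lists are independent.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Classic two-sample estimate N = n1 * n2 / m (undefined if m == 0)."""
    if m <= 0:
        raise ValueError("no overlap between the two lists: estimate undefined")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's less biased variant, defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def two_list_estimate(list_a: set, list_b: set) -> float:
    """Estimate total population size from two lists of identifiers.

    n1 and n2 are the list sizes and m is the number of identifiers on both.
    Assumes a closed population, equal chance of appearing on each list,
    and independence between the two lists.
    """
    n1, n2 = len(list_a), len(list_b)
    m = len(list_a & list_b)
    return chapman(n1, n2, m)


if __name__ == "__main__":
    print(lincoln_petersen(100, 80, 20))  # 400.0
    print(round(chapman(100, 80, 20), 1))  # 388.6
```

Those assumptions are hard to defend here. People who post to R-help are probably more likely than average to also buy R books or attend R conferences. That positive dependence inflates the overlap m, which biases the estimate of N downwards. Identifying the same person across the two lists also requires exactly the personal information Hadley mentions.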

Further, I don't know how one might take account of the fact that
some installations of R (e.g. on a corporate, institutional or
departmental server) may each be used by several users.

Ted.
