[R] Size of R user base

Richard A. O'Keefe ok at cs.otago.ac.nz
Wed Apr 21 03:55:40 CEST 2004


"Philippe Grosjean" <phgrosjean at sciviews.org> wrote:
	A last comment/question:  would it be possible to add some code
	in R that does the following:
	["calls home" to say that it is being used/asks for updates/&c]

There are all sorts of things the R developers might like to know about how
it is used.  There are also all sorts of reasons why they shouldn't do
anything like that.  Any habitual reader of comp.risks can think of more
reasons than I care to spend typing up.  I'll mention just one:  a number
of Microsoft users got hit with unexpectedly large phone bills a while back.
Their software was "calling home" *without* asking the user's permission or
even telling the user, and Microsoft's usual lines were out of service, so
the calls were made at full cost.

As far as informing the user that there is an update,
	
	An update is available at http://cran.r-project.org.
	
the only *really* useful information here is the URL, and that can be
displayed without calling home.  If one's R installation is more than a
couple of months old there is almost certainly an update.  It would suffice
to say

	You can check for updates by visiting http://cran.r-project.org
	or by using the check.CRAN.for.updates function.
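To make the point concrete, here is a minimal sketch (in Python, purely for illustration) of a reminder that never touches the network: it only compares the installation date with today's date. The function name update_notice, its parameters, and the 60-day threshold standing in for "a couple of months" are all assumptions of this sketch, not anything R provides.

```python
from datetime import date

CRAN_URL = "http://cran.r-project.org"


def update_notice(installed_on, today, max_age_days=60):
    """Return a reminder string if the installation is older than
    max_age_days, otherwise None.  No network access is involved;
    the 60-day default is an assumed stand-in for "a couple of months".
    """
    age_days = (today - installed_on).days
    if age_days > max_age_days:
        return "You can check for updates by visiting " + CRAN_URL
    return None
```

Called with an installation date a few months back, this returns the message; called with a recent date, it returns None and stays silent.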

Another reason for not calling home, of course, is that R already takes
quite long enough to start up, thank you very much.  (And that doesn't
count opening a graphics window, just time to first prompt.)

	Of course, this will only work with computers connected to the
	internet,... but at least, it could be one way to evaluate the
	number of R users.  Would that be an infringement of Open Source,
	or any other rule of freedom?  I don't know, but it does seem to
	be quite widespread (at least for commercial software).

Yes, and it's an unwarranted invasion of privacy there.  The fact that
some be****ed program is sending who knows what information about me to
who knows where without my say-so is one big reason why I avoid commercial
software (read: Windows software; none of the commercial software I use on
my Solaris box does this).

	so, why an Open Source software would not be able to monitor the
	number of users?
	
Because even if R *did* do the unwise and unforgivable, we STILL could not
know the number of users!  You would, to start with, only know about copies
of R on machines that were connected to the internet and allowed this kind
of traffic through their firewall.  Now I have R on two old Macs at home,
and you'd never hear about those.  Worse, here at work I have accounts on
a G3 Mac, a G4 Mac, three different UltraSPARCs, three Alphas, and a couple
of Linux boxes.  That's about 10 different accounts.  (How do I keep track
of 10 different passwords?  Easy:  every so often I ask our sysadmin to give
me new passwords on the machines I use less often because I've forgotten
them.)  How is your monitoring site to know that these 10 users are really
the same person?  And when I fire up R on a student's Linux box to demonstrate
a point (to a student who _isn't_ an R user), how is the monitoring site to
know that it's really me, not the student, so that the number of "users"
should not be incremented?
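The miscount can be sketched in a few lines of Python.  Everything here is hypothetical: the host and account names are made up, and the idea that a monitoring site would see (host, account) pairs is an assumption about what a "call home" report might contain.

```python
# Machines that can reach the monitoring site, one account on each.
ONLINE_HOSTS = [
    "g3-mac", "g4-mac",
    "sparc-1", "sparc-2", "sparc-3",
    "alpha-1", "alpha-2", "alpha-3",
    "linux-1", "linux-2",
]

# Machines at home that never report anything.
OFFLINE_HOSTS = ["old-mac-1", "old-mac-2"]

# What the (hypothetical) monitoring site receives: one report per
# online host, each under a different account name.
REPORTS = [(host, "acct%d" % i) for i, host in enumerate(ONLINE_HOSTS)]

# The people actually behind all of those reports.
PEOPLE = {"one person"}


def count_reported_users(reports):
    """Count distinct (host, account) pairs, which is all the site can see."""
    return len(set(reports))
```

Here count_reported_users(REPORTS) gives 10 while len(PEOPLE) is 1, and the two offline machines do not appear in either figure.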

In fact, the more I think about it, the more it seems to me that "the
number of users" is not a well defined concept.  For a commercial system,
you can count the number of licences sold, and that means something
pretty clear, because each licence is money in your pocket.  For a system
like R, the amount of traffic on the mailing list is reasonably well
defined and of interest because it's stuff that the maintainers have to at
least glance at, so it directly affects their lives.  If you are thinking
about popularity contests, bear in mind that a Microsoft staffer wrote an
article "Evangelism is WAR" in which he explicitly stated that other
software producers are the "enemy" and users are "pawns"; do you really
want to get into that kind of contest?  If you're concerned about mind-
share rather than market-share, I have talked a data-mining student into
at least looking at R.  She has tried it.  She's doing a literature survey
first.  Is she an "R user" yet?  If she uses it for a month, and drops it
for a year, is she still an "R user"?  I use R in bursts myself; intensely
for a couple of days, then stop and think about things and do other work for
a week or so, then come back.  When, precisely, am I an "R user", and when
would I stop being one?
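How much the answer depends on the definition can be shown with a small Python sketch.  The day numbers and the window lengths are invented for illustration, and "used R within the last N days" is only one assumed definition of "user" among many.

```python
# Days (counted from some arbitrary day 0) on which R was actually used:
# a couple of days of intense use, then about a week of other work.
USAGE_DAYS = [0, 1, 9, 10, 18, 19]


def is_user(usage_days, on_day, window_days):
    """True if R was used on on_day or within the window_days - 1 days
    before it.  The window length is an arbitrary choice, which is the point.
    """
    return any(0 <= on_day - d < window_days for d in usage_days)
```

With a 3-day window, is_user(USAGE_DAYS, 5, 3) is False and is_user(USAGE_DAYS, 10, 3) is True, so the same person drops in and out of the count.  With a 14-day window, is_user(USAGE_DAYS, 5, 14) is True and the person is a "user" throughout.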

The first rule of measurement is "Don't bother with a measurement if you
don't know what you're going to do with the answer".  If you knew the
number of "R users", however defined, how would that actually help you?

Why do bad things to make a measurement that's ill-defined, arguably
impossible to measure meaningfully, and not that much use when you have it?



