[R] R in the NY Times

Stas Kolenikov skolenik at gmail.com
Thu Jan 8 17:42:07 CET 2009


On 1/7/09, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> Here is the same number of messages/posts data
>  for each of S, SAS, R:
>  - reworked into a 3 column ts class time series
>  - with Jan 2009 removed since it's not complete
>  - leading and trailing NA rows removed

My software of choice is Stata, so here are comparable data from
Statalist (taken from
http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/):

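## Statalist traffic: monthly message counts, 2001-2008.
## Each data line below is one month (Jan, ..., Dec) holding the counts
## for 2001, ..., 2008, so the values fill the 8 x 12 matrix by column.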
stata <- structure(c(
654,574,781, 848, 714, 823,1063,1057,
701,625,909, 799, 941,1052,1013,1269,
868,690,937,1155,1040,1113,1125,1252,
640,649,899, 898,1013,1161, 991,1325,
622,697,726,1102, 818,1077,1111,1374,
684,548,651, 876, 964, 963,1125,1078,
717,588,943, 923, 885, 892, 986,1200,
728,575,605, 901,1010,1011,1224,1396,
627,605,712, 807,1098, 951, 939,1446,
844,790,970, 940,1001,1283,1231,1509,
776,644,870, 928,1094, 928, 999,1340,
603,512,670, 824, 794, 951, 739,1056
),
.Dim = c(8L, 12L),
.Dimnames = list(c("2001", "2002", "2003", "2004", "2005",
"2006", "2007", "2008"), c("Jan", "Feb", "Mar", "Apr",
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")))
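To line these counts up with the ts class series described in the
quoted message, something like the following should work. This is only
a sketch: the helper name year_month_to_ts is my own (it is not from any
package), and it assumes a matrix with years in rows, the twelve months
in columns, and the years as row names.

```r
# Hypothetical helper (not part of the original data listing):
# turn a year-by-month matrix into a monthly ts object.
year_month_to_ts <- function(m) {
  stopifnot(is.matrix(m), ncol(m) == 12L)
  start_year <- as.integer(rownames(m)[1])
  # t(m) has months in rows, so as.vector() reads Jan..Dec of the
  # first year, then Jan..Dec of the second year, and so on.
  ts(as.vector(t(m)), start = c(start_year, 1), frequency = 12)
}

# Runs only if the `stata` matrix defined above is in the workspace.
if (exists("stata") && is.matrix(stata)) {
  stata_ts <- year_month_to_ts(stata)
  plot(stata_ts, log = "y", ylab = "Statalist posts per month")
}
```

The transpose matters here: as.vector() on the matrix itself would run
through all the Januaries first, which is not time order.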

The list has existed since 1994 or 1996 or so, but the data are only
available from 2001. You'd probably be surprised to find out that,
based on the list summaries, the size of the Stata world is about half
that of SAS on the counts plot; and on the log scale, Stata shows
linear (which means exponential) growth throughout the range, while
both SAS and R have been slowing down in the last couple of years
(with an explanation already offered regarding the r-sig-* lists).
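One rough way to put a number on "linear on the log scale" is to
regress the log counts on time and read the slope as a growth rate.
The sketch below is my own illustration, not something I ran for the
plots discussed here; the function name annual_growth is made up, and
it assumes a strictly positive monthly ts object as input.

```r
# Hypothetical helper: implied annual growth rate from a log-linear trend.
annual_growth <- function(x) {
  stopifnot(is.ts(x), all(x > 0))
  t_years <- as.numeric(time(x))          # time in (fractional) years
  fit <- lm(log(as.numeric(x)) ~ t_years) # straight line on the log scale
  # The slope is the change in log count per year, so exp(slope) - 1
  # is the implied proportional growth per year.
  unname(exp(coef(fit)[2]) - 1)
}

# Runs only if the `stata` matrix from the message is in the workspace.
if (exists("stata") && is.matrix(stata)) {
  stata_monthly <- ts(as.vector(t(stata)), start = c(2001, 1), frequency = 12)
  print(annual_growth(stata_monthly))
}
```

A single slope ignores the strong seasonality in these counts (December
is always low), so treat the result as a summary of the trend only.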

Of course, that's an incorrect comparison to begin with. The
support systems for all three packages are different: most (US)
universities will have dedicated and well-certified SAS gurus
answering most semicolon questions locally, while r-help would be the
first thing on my mind if I cannot get what I need in the docs. I
would thus expect traffic on r-help to be heavier relative to the
user base.

Another measure of interest might be the number of contributed
packages. The phrase for R is this: "Currently, the CRAN package
repository features 1633 objects including 1625 packages and 8 bundles
containing 34 packages, for a total of 1659 available packages." The
phrase for Stata is this: "Statistical Software Components,
Boston College Department of Economics: There are currently 1275 items
in this series, of which 1274 are downloadable"
(http://logec.repec.org/scripts/seriesstat.pl?item=repec:boc:bocode).
So programming activity in Stata is about 3/4 of that in R at face
value (you would probably need to discount both numbers for obsolete
packages, though). Whether SAS has a unified repository of
user-contributed modules with direct counts available, I have no clue.
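For the record, the 3/4 figure is just the ratio of the two totals
quoted above; the variable names are only for illustration:

```r
cran_packages <- 1659  # "a total of 1659 available packages" (CRAN quote)
ssc_items     <- 1275  # "1275 items in this series" (SSC quote)
ratio <- ssc_items / cran_packages
round(ratio, 2)        # about 0.77, i.e. roughly 3/4
```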

A really good measure for R would be the total number of downloads of
r-base for all platforms from all CRAN mirrors (and I would expect
that number can be found in the servers' logs). Given that it is so
easy to download everything nice and clean and up to date, I doubt
anybody is distributing CD-ROMs with R install files among
friends and colleagues. SAS (and Stata, and SPSS, and Minitab, and...)
should have their (internal) numbers of licenses sold (and yes, those
come on disks initially), but those are badly blurred by network
licenses, and are commercial secrets anyway.

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.



