[R] R in the NY Times
Marc Schwartz
marc_schwartz at comcast.net
Wed Jan 7 21:07:51 CET 2009
on 01/07/2009 09:29 AM Max Kuhn wrote:
>> "You can look on the SAS message boards and see there is a proportional downturn in traffic."
>
> I think that I actually made this statement about both the SAS and
> Splus traffic...
>
> I wasn't really trying to be critical of SAS. I was trying to get
> across that SAS focused their resources on features that had nothing
> to do with *statistical analysis* (e.g. data warehousing etc.)
Presuming that the Google Groups archive of SAS-L is reasonably complete:
http://groups.google.com/group/comp.soft-sys.sas/about
The monthly posting frequency data since 1993 is:
Posts <- structure(list(Jan = c(NA, 546L, 548L, 853L, 1007L, 894L, 514L,
1720L, 1826L, 1941L, 1832L, 1636L, 2122L, 2722L, 2750L, 2305L,
357L), Feb = c(NA, 511L, 734L, 1024L, 1150L, 1068L, 493L, 1519L,
1537L, 1845L, 1846L, 1652L, 1960L, 1645L, 926L, 2255L, NA), Mar = c(NA,
658L, 963L, 805L, 1108L, 945L, 659L, 1177L, 1915L, 2010L, 1755L,
2188L, 629L, 1711L, 1728L, 2712L, NA), Apr = c(NA, 681L, 792L,
1052L, 1315L, 784L, 1077L, 1163L, 1467L, 2199L, 1757L, 1826L,
2169L, 2796L, 2766L, 2789L, NA), May = c(NA, 712L, 945L, 1163L,
1212L, 448L, 778L, 1963L, 1735L, 2373L, 1863L, 1836L, 2283L,
3147L, 2974L, 2025L, NA), Jun = c(NA, 751L, 1002L, 999L, 1127L,
813L, 540L, 1615L, 1905L, 2133L, 1701L, 2606L, 2407L, 2723L,
2691L, 2368L, NA), Jul = c(15L, 763L, 775L, 1184L, 1074L, 896L,
476L, 1572L, 2027L, 2445L, 1926L, 1843L, 2061L, 761L, 2435L,
2607L, NA), Aug = c(458L, 975L, 969L, 1053L, 692L, 823L, 612L,
1696L, 1976L, 1492L, 1689L, 2143L, 1793L, 2027L, 2592L, 2584L,
NA), Sep = c(330L, 703L, 745L, 1176L, 947L, 894L, 1351L, 1491L,
1439L, 1864L, 1646L, 1784L, 1365L, 2714L, 1868L, 2554L, NA),
Oct = c(219L, 805L, 691L, 1197L, 900L, 1129L, 1708L, 1669L,
1592L, 2133L, 1832L, 1712L, 1427L, 2983L, 2320L, 2434L, NA
), Nov = c(472L, 752L, 773L, 911L, 853L, 733L, 1720L, 1490L,
1636L, 1663L, 1545L, 1786L, 1518L, 2848L, 2112L, 1984L, NA
), Dec = c(517L, 666L, 765L, 844L, 677L, 492L, 1595L, 1298L,
1424L, 1520L, 1445L, 2148L, 1524L, 2374L, 1948L, 1921L, NA
)), .Names = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class = "data.frame",
row.names = c("1993",
"1994", "1995", "1996", "1997", "1998", "1999", "2000", "2001",
"2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009"
))
> Posts
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1993   NA   NA   NA   NA   NA   NA   15  458  330  219  472  517
1994  546  511  658  681  712  751  763  975  703  805  752  666
1995  548  734  963  792  945 1002  775  969  745  691  773  765
1996  853 1024  805 1052 1163  999 1184 1053 1176 1197  911  844
1997 1007 1150 1108 1315 1212 1127 1074  692  947  900  853  677
1998  894 1068  945  784  448  813  896  823  894 1129  733  492
1999  514  493  659 1077  778  540  476  612 1351 1708 1720 1595
2000 1720 1519 1177 1163 1963 1615 1572 1696 1491 1669 1490 1298
2001 1826 1537 1915 1467 1735 1905 2027 1976 1439 1592 1636 1424
2002 1941 1845 2010 2199 2373 2133 2445 1492 1864 2133 1663 1520
2003 1832 1846 1755 1757 1863 1701 1926 1689 1646 1832 1545 1445
2004 1636 1652 2188 1826 1836 2606 1843 2143 1784 1712 1786 2148
2005 2122 1960  629 2169 2283 2407 2061 1793 1365 1427 1518 1524
2006 2722 1645 1711 2796 3147 2723  761 2027 2714 2983 2848 2374
2007 2750  926 1728 2766 2974 2691 2435 2592 1868 2320 2112 1948
2008 2305 2255 2712 2789 2025 2368 2607 2584 2554 2434 1984 1921
2009  357   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
One can then review the annual posting frequency via:
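# Plot annual totals: sum each year's monthly counts, ignoring missing months
pdf("SAS-L.pdf", height = 4, width = 7)
mp <- barplot(rowSums(Posts, na.rm = TRUE),
              beside = TRUE,
              cex.names = 0.6, main = "SAS-L Traffic",
              cex.axis = 0.75, las = 1)
# Print each year's total beneath its bar (mp holds the bar midpoints)
mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
      line = 2, cex = 0.5)
dev.off()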
SAS-L traffic appears to have increased markedly in 2000 and again in
2006. However, it has been flat for the past 3 calendar years (2006
through 2008). No decline yet, but it will happen in due course...
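The "flat for the past 3 calendar years" reading can be checked directly from the table. What follows is only a minimal sketch: it copies the 2006 to 2008 rows from the SAS-L data above, and the object names sas, annual and change_pct are introduced here for illustration, not taken from the original analysis.

```r
# Monthly SAS-L counts for 2006-2008, copied from the Posts table above
sas <- rbind(
  "2006" = c(2722, 1645, 1711, 2796, 3147, 2723,  761, 2027, 2714, 2983, 2848, 2374),
  "2007" = c(2750,  926, 1728, 2766, 2974, 2691, 2435, 2592, 1868, 2320, 2112, 1948),
  "2008" = c(2305, 2255, 2712, 2789, 2025, 2368, 2607, 2584, 2554, 2434, 1984, 1921)
)
colnames(sas) <- month.abb

# Annual totals, one per year
annual <- rowSums(sas)

# Year-over-year change in percent (illustrative name, not in the original post)
change_pct <- round(100 * diff(annual) / head(annual, -1), 1)

print(annual)
print(change_pct)
```

Both year-over-year changes come out at roughly 5 percent in magnitude and in opposite directions, which is consistent with calling the series flat over that period.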
No comparable posting data table exists for S-News as far as I can find,
so I wrote a quick program to read the S-News archive pages here:
http://www.biostat.wustl.edu/archives/html/s-news/
and get monthly posting counts, using the 'Thread' based html pages,
where each monthly embedded post link has a URL of the form:
http://www.biostat.wustl.edu/archives/html/s-news/YYYY-MM/msgXXXXX.html
Thus, the program I used is:
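# Month labels of the form "YYYY-MM", one per archive directory
TD <- paste(rep(1998:2009, each = 12), sprintf("%02d", 1:12), sep = "-")
Posts <- numeric(length(TD))
for (i in seq_along(TD))
{
  URL <- paste("http://www.biostat.wustl.edu/archives/html/s-news/",
               TD[i], "/threads.html", sep = "")
  cat(URL, "\n")
  # A month without an archive page makes readLines() fail; record it as NA
  if (!inherits(try(con <- readLines(URL)), "try-error"))
  {
    # Count the lines that contain a link to an individual message
    Posts[i] <- length(grep("msg.*\\.html", con))
    rm(con)
  } else {
    Posts[i] <- NA
  }
}
# Reshape the month-ordered vector into a year-by-month matrix
Posts <- matrix(Posts, ncol = 12, byrow = TRUE)
rownames(Posts) <- 1998:2009
colnames(Posts) <- month.abb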
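One detail of the counting step is worth spelling out: length(grep(...)) counts matching lines, not matching links, so it equals the number of posts only if each message link sits on its own line of the page. The sketch below illustrates the difference on a made-up HTML excerpt; the page vector is hypothetical and is not copied from the real archive, and the stricter pattern msg[0-9]+ is an assumption about the link format described above.

```r
# Hypothetical excerpt of a threads.html page (invented for illustration)
page <- c(
  "<ul>",
  "<li><a href=\"msg00001.html\">First post</a></li>",
  "<li><a href=\"msg00002.html\">Reply</a> <a href=\"msg00003.html\">Another</a></li>",
  "<li><a href=\"index.html\">Date index</a></li>",
  "</ul>"
)

# grep() returns the indices of matching lines, so this counts lines
n_lines <- length(grep("msg[0-9]+\\.html", page))

# gregexpr() locates every match within each line, so this counts links
hits <- regmatches(page, gregexpr("msg[0-9]+\\.html", page))
n_links <- length(unlist(hits))

print(c(lines = n_lines, links = n_links))
```

If the two counts agree on a real archive page, the simpler line-based count used in the loop is sufficient.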
That gives you:
Posts <- structure(c(NA, 210, 264, 246, 230, 189, 197, 174, 109, 51, 48,
5, 273, 173, 313, 232, 255, 179, 230, 161, 87, 59, 63, NA, 378,
313, 285, 252, 242, 218, 257, 193, 99, 74, 58, NA, 293, 300,
264, 300, 228, 196, 151, 182, 123, 48, 47, NA, 330, 334, 306,
331, 219, 189, 164, 174, 107, 46, 31, NA, 243, 254, 247, 282,
248, 217, 175, 109, 96, 34, 27, NA, 219, 284, 245, 258, 230,
221, 154, 159, 84, 47, 40, NA, 209, 270, 302, 260, 207, 187,
187, 144, 97, 39, 28, NA, 191, 300, 204, 260, 221, 186, 195,
107, 68, 35, 41, NA, 241, 253, 251, 229, 280, 295, 150, 98, 73,
70, 30, NA, 181, 300, 261, 232, 228, 197, 176, 82, 53, 56, 27,
NA, 141, 194, 176, 194, 177, 142, 176, 84, 20, 41, 36, NA), .Dim = c(12L,
12L), .Dimnames = list(c("1998", "1999", "2000", "2001", "2002",
"2003", "2004", "2005", "2006", "2007", "2008", "2009"), c("Jan",
"Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct",
"Nov", "Dec")))
> Posts
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1998  NA 273 378 293 330 243 219 209 191 241 181 141
1999 210 173 313 300 334 254 284 270 300 253 300 194
2000 264 313 285 264 306 247 245 302 204 251 261 176
2001 246 232 252 300 331 282 258 260 260 229 232 194
2002 230 255 242 228 219 248 230 207 221 280 228 177
2003 189 179 218 196 189 217 221 187 186 295 197 142
2004 197 230 257 151 164 175 154 187 195 150 176 176
2005 174 161 193 182 174 109 159 144 107  98  82  84
2006 109  87  99 123 107  96  84  97  68  73  53  20
2007  51  59  74  48  46  34  47  39  35  70  56  41
2008  48  63  58  47  31  27  40  28  41  30  27  36
2009   5  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
Which can then be graphed by:
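# Plot annual totals: sum each year's monthly counts, ignoring missing months
pdf("S-News.pdf", height = 4, width = 7)
mp <- barplot(rowSums(Posts, na.rm = TRUE),
              beside = TRUE,
              cex.names = 0.6, main = "S-News Traffic",
              cex.axis = 0.75, las = 1)
# Print each year's total beneath its bar (mp holds the bar midpoints)
mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
      line = 2, cex = 0.5)
dev.off()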
The consistent decline in posting frequency since 1999 is notable. The
temporal association with the introduction of R is perhaps profound.
As long as I am on the subject, I figured that I would do the same for
R-Help. The downside is that readLines() (really url()) does not
support https:, so I took a somewhat different approach, using wget:
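# Month labels of the form "YYYY-Month", matching the archive directory names
TD <- paste(rep(1997:2009, each = 12), month.name, sep = "-")
Posts <- numeric(length(TD))
for (i in seq_along(TD))
{
  URL <- paste("https://stat.ethz.ch/pipermail/r-help/",
               TD[i], "/thread.html", sep = "")
  cat(URL, "\n")
  # Fetch the page with wget, which saves it as thread.html in the
  # working directory
  CMD <- paste("wget", URL)
  system(CMD)
  if (file.exists("thread.html"))
  {
    con <- readLines("thread.html")
    # Count the lines that link to a numbered message page
    Posts[i] <- length(grep("[0-9]+\\.html", con))
    rm(con)
    # Remove the file so that a failed download for the next month
    # is detected by file.exists()
    unlink("thread.html")
  } else {
    Posts[i] <- NA
  }
}
# Reshape the month-ordered vector into a year-by-month matrix
Posts <- matrix(Posts, ncol = 12, byrow = TRUE)
rownames(Posts) <- 1997:2009
colnames(Posts) <- month.abb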
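Both loops rely on the same two steps: building one label per month with paste(), which recycles the 12 month values across the years, and then folding the month-ordered result into a year-by-month matrix with byrow = TRUE. A small sketch of just those steps, using two years and placeholder counts (the names years, labels_num, labels_name, counts and m are illustrative only):

```r
years <- 1997:1998

# "YYYY-MM" labels (S-News style) and "YYYY-Month" labels (R-Help style);
# the 12 month values are recycled to match the 24 year values
labels_num  <- paste(rep(years, each = 12), sprintf("%02d", 1:12), sep = "-")
labels_name <- paste(rep(years, each = 12), month.name, sep = "-")

# Placeholder counts 1..24, standing in for the scraped monthly totals
counts <- seq_along(labels_num)

# byrow = TRUE fills one row (year) at a time, so each column is a month
m <- matrix(counts, ncol = 12, byrow = TRUE,
            dimnames = list(as.character(years), month.abb))

print(labels_num[1:3])
print(labels_name[13:15])
print(m)
```

Without byrow = TRUE the vector would be filled column by column, and the months of a single year would end up scattered across rows.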
This gives you:
Posts <- structure(c(NA, 135, 226, 205, 558, 884, 1017, 1116, 1746,
2075, 1714, 2490, 462, NA, 79, 145, 355, 583, 697, 1137, 1580, 1724,
1920, 1907, 2583, NA, NA, 114, 195, 377, 651, 880, 1203, 1946,
1703, 2270, 2191, 2740, NA, 92, 101, 189, 377, 470, 965, 1488,
1657, 2057, 1818, 2145, 2487, NA, 36, 90, 161, 504, 552, 1057,
1268, 1561, 1887, 2029, 2210, 2517, NA, 47, 105, 186, 418, 550,
926, 1319, 1714, 2056, 1811, 2307, 2774, NA, 41, 110, 184, 293,
615, 918, 1344, 1618, 1872, 1785, 2138, 3268, NA, 37, 64, 148,
356, 562, 824, 1210, 1493, 1777, 1898, 2241, 2813, NA, 40, 94,
203, 434, 678, 705, 1443, 1534, 1709, 1902, 2028, 2990, NA, 76,
96, 231, 418, 657, 1055, 1567, 1712, 1810, 2328, 2708, 3037,
NA, 61, 184, 318, 433, 825, 1038, 1605, 1895, 1907, 2127, 2594,
2730, NA, 57, 105, 221, 422, 530, 742, 1158, 1481, 1508, 1450,
2028, 2399, NA), .Dim = c(13L, 12L), .Dimnames = list(c("1997",
"1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005",
"2006", "2007", "2008", "2009"), c("Jan", "Feb", "Mar", "Apr",
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")))
> Posts
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1997   NA   NA   NA   92   36   47   41   37   40   76   61   57
1998  135   79  114  101   90  105  110   64   94   96  184  105
1999  226  145  195  189  161  186  184  148  203  231  318  221
2000  205  355  377  377  504  418  293  356  434  418  433  422
2001  558  583  651  470  552  550  615  562  678  657  825  530
2002  884  697  880  965 1057  926  918  824  705 1055 1038  742
2003 1017 1137 1203 1488 1268 1319 1344 1210 1443 1567 1605 1158
2004 1116 1580 1946 1657 1561 1714 1618 1493 1534 1712 1895 1481
2005 1746 1724 1703 2057 1887 2056 1872 1777 1709 1810 1907 1508
2006 2075 1920 2270 1818 2029 1811 1785 1898 1902 2328 2127 1450
2007 1714 1907 2191 2145 2210 2307 2138 2241 2028 2708 2594 2028
2008 2490 2583 2740 2487 2517 2774 3268 2813 2990 3037 2730 2399
2009  462   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
Which again can be graphed as:
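# Plot annual totals: sum each year's monthly counts, ignoring missing months
pdf("R-Help.pdf", height = 4, width = 7)
mp <- barplot(rowSums(Posts, na.rm = TRUE),
              beside = TRUE,
              cex.names = 0.6, main = "R-Help Traffic",
              cex.axis = 0.75, las = 1)
# Print each year's total beneath its bar (mp holds the bar midpoints)
mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
      line = 2, cex = 0.5)
dev.off()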
Now....there's a healthy growth curve.... :-)
Note that the annual traffic volume for 2008 on R-Help exceeds that on
SAS-L.
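The 2008 comparison can be reproduced from the tables alone. The following is a minimal sketch that copies the 2008 rows of the three tables above into one matrix; the names y2008 and totals are introduced here for illustration.

```r
# 2008 monthly counts, copied from the three Posts tables above
y2008 <- rbind(
  "SAS-L"  = c(2305, 2255, 2712, 2789, 2025, 2368, 2607, 2584, 2554, 2434, 1984, 1921),
  "S-News" = c(  48,   63,   58,   47,   31,   27,   40,   28,   41,   30,   27,   36),
  "R-Help" = c(2490, 2583, 2740, 2487, 2517, 2774, 3268, 2813, 2990, 3037, 2730, 2399)
)
colnames(y2008) <- month.abb

# Annual totals per list
totals <- rowSums(y2008)
print(totals)

# Ratio of R-Help to SAS-L traffic for the year
print(round(totals[["R-Help"]] / totals[["SAS-L"]], 2))

# Months in which R-Help had more posts than SAS-L
print(month.abb[y2008["R-Help", ] > y2008["SAS-L", ]])
```

On these figures R-Help totals 32828 posts for 2008 against 28538 for SAS-L, and R-Help leads in every month except April.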
For convenience, I am attaching each of the 3 plots.
Regards,
Marc Schwartz
Attachments (non-text, scrubbed by the list archive):
Name: SAS-L.pdf, Type: application/pdf, Size: 4795 bytes
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090107/5fd0ece6/attachment-0006.pdf>
Name: S-News.pdf, Type: application/pdf, Size: 4098 bytes
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090107/5fd0ece6/attachment-0007.pdf>
Name: R-Help.pdf, Type: application/pdf, Size: 4263 bytes
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090107/5fd0ece6/attachment-0008.pdf>