[R] R in the NY Times

Gabor Grothendieck ggrothendieck at gmail.com
Thu Jan 8 00:24:57 CET 2009


Here are the same monthly message/post counts
for each of S, SAS and R:
- reworked into a 3 column ts class time series
- with Jan 2009 removed since it's not complete
- with leading and trailing NA rows removed
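
As a rough sketch of that reshaping (not the exact code used to
build tt3 below): the helper to_ts and the toy matrices a and b
are made up for illustration, and only base R is assumed.

```r
# Toy year-by-month tables standing in for the real Posts tables
# (all values are made up for illustration).
a <- matrix(c(NA, NA, 3:24), nrow = 2, byrow = TRUE,
            dimnames = list(c("2007", "2008"), month.abb))
b <- matrix(c(101:112, 5, rep(NA, 11)), nrow = 2, byrow = TRUE,
            dimnames = list(c("2008", "2009"), month.abb))

# Hypothetical helper: flatten a year-by-month table into a monthly ts.
# t() puts months in rows, so c() reads the values in calendar order.
to_ts <- function(m) {
  ts(c(t(as.matrix(m))),
     start = c(as.integer(rownames(m)[1]), 1), frequency = 12)
}

# cbind() on ts objects aligns them on the union of their time ranges.
x <- cbind(A = to_ts(a), B = to_ts(b))

# Drop the incomplete final month(s): keep everything up to Dec 2008.
x <- window(x, end = c(2008, 12))

# Trim leading and trailing rows in which every column is NA.
keep <- which(rowSums(!is.na(x)) > 0)
trimmed <- window(x, start = time(x)[min(keep)], end = time(x)[max(keep)])
```

Rows in the middle that are NA in only some columns are kept, which
is why tt3 below still has NA runs in the S and R columns.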

At the end we plot the raw data as well as the time
series of totals and show loess smooths for each.

By running the code below we see that:
- the sum of the three seems to be rising at a constant rate
- S is declining
- SAS and R are rising
- R is rising the fastest, though it has completed its phase
of highest growth, which ended around 2004
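
One way to put a number on "rising at a constant rate" is to fit a
straight line against time and read off the slope in posts per
month gained per year. This is only a sketch on a made-up toy
series, not a result from the real data; for the real data one
would use the Sum column of tt4 defined further down.

```r
# Toy monthly series (made-up numbers): 100 posts in the first month,
# growing by exactly 10 posts every month for three years.
y <- ts(100 + 10 * (0:35), start = c(2006, 1), frequency = 12)

tm <- as.numeric(time(y))        # time measured in years
fit <- lm(as.numeric(y) ~ tm)    # straight-line trend

# Slope: change in monthly posts per year of calendar time.
slope_per_year <- unname(coef(fit)[2])

# Residuals that show no systematic curvature (here they are all
# essentially zero) are what "constant rate" would look like.
max_abs_resid <- max(abs(residuals(fit)))
```

For the toy series the slope is 120, i.e. 10 extra posts per month
times 12 months.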

tt3 <- structure(c(15, 458, 330, 219, 472, 517, 546, 511, 658, 681,
712, 751, 763, 975, 703, 805, 752, 666, 548, 734, 963, 792, 945,
1002, 775, 969, 745, 691, 773, 765, 853, 1024, 805, 1052, 1163,
999, 1184, 1053, 1176, 1197, 911, 844, 1007, 1150, 1108, 1315,
1212, 1127, 1074, 692, 947, 900, 853, 677, 894, 1068, 945, 784,
448, 813, 896, 823, 894, 1129, 733, 492, 514, 493, 659, 1077,
778, 540, 476, 612, 1351, 1708, 1720, 1595, 1720, 1519, 1177,
1163, 1963, 1615, 1572, 1696, 1491, 1669, 1490, 1298, 1826, 1537,
1915, 1467, 1735, 1905, 2027, 1976, 1439, 1592, 1636, 1424, 1941,
1845, 2010, 2199, 2373, 2133, 2445, 1492, 1864, 2133, 1663, 1520,
1832, 1846, 1755, 1757, 1863, 1701, 1926, 1689, 1646, 1832, 1545,
1445, 1636, 1652, 2188, 1826, 1836, 2606, 1843, 2143, 1784, 1712,
1786, 2148, 2122, 1960, 629, 2169, 2283, 2407, 2061, 1793, 1365,
1427, 1518, 1524, 2722, 1645, 1711, 2796, 3147, 2723, 761, 2027,
2714, 2983, 2848, 2374, 2750, 926, 1728, 2766, 2974, 2691, 2435,
2592, 1868, 2320, 2112, 1948, 2305, 2255, 2712, 2789, 2025, 2368,
2607, 2584, 2554, 2434, 1984, 1921, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
273, 378, 293, 330, 243, 219, 209, 191, 241, 181, 141, 210, 173,
313, 300, 334, 254, 284, 270, 300, 253, 300, 194, 264, 313, 285,
264, 306, 247, 245, 302, 204, 251, 261, 176, 246, 232, 252, 300,
331, 282, 258, 260, 260, 229, 232, 194, 230, 255, 242, 228, 219,
248, 230, 207, 221, 280, 228, 177, 189, 179, 218, 196, 189, 217,
221, 187, 186, 295, 197, 142, 197, 230, 257, 151, 164, 175, 154,
187, 195, 150, 176, 176, 174, 161, 193, 182, 174, 109, 159, 144,
107, 98, 82, 84, 109, 87, 99, 123, 107, 96, 84, 97, 68, 73, 53,
20, 51, 59, 74, 48, 46, 34, 47, 39, 35, 70, 56, 41, 48, 63, 58,
47, 31, 27, 40, 28, 41, 30, 27, 36, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 92, 36, 47, 41, 37, 40, 76, 61, 57, 135,
79, 114, 101, 90, 105, 110, 64, 94, 96, 184, 105, 226, 145, 195,
189, 161, 186, 184, 148, 203, 231, 318, 221, 205, 355, 377, 377,
504, 418, 293, 356, 434, 418, 433, 422, 558, 583, 651, 470, 552,
550, 615, 562, 678, 657, 825, 530, 884, 697, 880, 965, 1057,
926, 918, 824, 705, 1055, 1038, 742, 1017, 1137, 1203, 1488,
1268, 1319, 1344, 1210, 1443, 1567, 1605, 1158, 1116, 1580, 1946,
1657, 1561, 1714, 1618, 1493, 1534, 1712, 1895, 1481, 1746, 1724,
1703, 2057, 1887, 2056, 1872, 1777, 1709, 1810, 1907, 1508, 2075,
1920, 2270, 1818, 2029, 1811, 1785, 1898, 1902, 2328, 2127, 1450,
1714, 1907, 2191, 2145, 2210, 2307, 2138, 2241, 2028, 2708, 2594,
2028, 2490, 2583, 2740, 2487, 2517, 2774, 3268, 2813, 2990, 3037,
2730, 2399), .Dim = c(186L, 3L), .Dimnames = list(NULL, c("SAS",
"S", "R")), .Tsp = c(1993.5, 2008.91666666667, 12), class = c("mts",
"ts"))

tt4 <- cbind(tt3, rowSums(tt3))
colnames(tt4) <- c(colnames(tt3), "Sum")
ts.plot(tt4, col = 1:4)
grid()
legend("topleft", colnames(tt4), lty = 1, col = 1:4)

library(dyn)
for(i in 1:4) lines(fitted(dyn$loess(tt4[, i] ~ time(tt4))), col = i)
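
If the dyn package is not available, similar smooths can probably
be drawn with base R alone. A sketch, where smooth_col is a made-up
helper name and the toy series is invented: it fits loess() to the
non-NA part of a series and pads the result back with NAs so that
it lines up with the original time axis. The curves should be close
to, but are not guaranteed identical to, the dyn$loess ones.

```r
# Hypothetical helper: loess smooth of one ts column, NA-padded so the
# result has the same time axis as the input.
smooth_col <- function(x, ...) {
  tm <- as.numeric(time(x))
  y <- as.numeric(x)
  ok <- !is.na(y)
  out <- rep(NA_real_, length(y))
  out[ok] <- fitted(loess(y ~ tm, subset = ok, ...))
  ts(out, start = start(x), frequency = frequency(x))
}

# Toy series with leading NAs, like the S and R columns (made-up data).
set.seed(1)
y <- ts(cumsum(rnorm(60)), start = c(2004, 1), frequency = 12)
y[1:6] <- NA
s <- smooth_col(y)

# With the real data, after the ts.plot() call above, one could use:
#   for (i in 1:4) lines(smooth_col(tt4[, i]), col = i)
```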


On Wed, Jan 7, 2009 at 3:07 PM, Marc Schwartz <marc_schwartz at comcast.net> wrote:
> on 01/07/2009 09:29 AM Max Kuhn wrote:
>>> "You can look on the SAS message boards and see there is a proportional downturn in traffic."
>>
>> I think that I actually made this statement about both the SAS and
>> Splus traffic...
>>
>> I wasn't really trying to be critical of SAS. I was trying to get
>> across that SAS focused their resources on features that had nothing
>> to do with *statistical analysis* (e.g. data warehousing etc.)
>
>
> Presuming that the Google Groups archive of SAS-L is reasonably complete:
>
>  http://groups.google.com/group/comp.soft-sys.sas/about
>
> The monthly posting frequency data since 1993 is:
>
> Posts <- structure(list(Jan = c(NA, 546L, 548L, 853L, 1007L, 894L, 514L,
> 1720L, 1826L, 1941L, 1832L, 1636L, 2122L, 2722L, 2750L, 2305L,
> 357L), Feb = c(NA, 511L, 734L, 1024L, 1150L, 1068L, 493L, 1519L,
> 1537L, 1845L, 1846L, 1652L, 1960L, 1645L, 926L, 2255L, NA), Mar = c(NA,
> 658L, 963L, 805L, 1108L, 945L, 659L, 1177L, 1915L, 2010L, 1755L,
> 2188L, 629L, 1711L, 1728L, 2712L, NA), Apr = c(NA, 681L, 792L,
> 1052L, 1315L, 784L, 1077L, 1163L, 1467L, 2199L, 1757L, 1826L,
> 2169L, 2796L, 2766L, 2789L, NA), May = c(NA, 712L, 945L, 1163L,
> 1212L, 448L, 778L, 1963L, 1735L, 2373L, 1863L, 1836L, 2283L,
> 3147L, 2974L, 2025L, NA), Jun = c(NA, 751L, 1002L, 999L, 1127L,
> 813L, 540L, 1615L, 1905L, 2133L, 1701L, 2606L, 2407L, 2723L,
> 2691L, 2368L, NA), Jul = c(15L, 763L, 775L, 1184L, 1074L, 896L,
> 476L, 1572L, 2027L, 2445L, 1926L, 1843L, 2061L, 761L, 2435L,
> 2607L, NA), Aug = c(458L, 975L, 969L, 1053L, 692L, 823L, 612L,
> 1696L, 1976L, 1492L, 1689L, 2143L, 1793L, 2027L, 2592L, 2584L,
> NA), Sep = c(330L, 703L, 745L, 1176L, 947L, 894L, 1351L, 1491L,
> 1439L, 1864L, 1646L, 1784L, 1365L, 2714L, 1868L, 2554L, NA),
>    Oct = c(219L, 805L, 691L, 1197L, 900L, 1129L, 1708L, 1669L,
>    1592L, 2133L, 1832L, 1712L, 1427L, 2983L, 2320L, 2434L, NA
>    ), Nov = c(472L, 752L, 773L, 911L, 853L, 733L, 1720L, 1490L,
>    1636L, 1663L, 1545L, 1786L, 1518L, 2848L, 2112L, 1984L, NA
>    ), Dec = c(517L, 666L, 765L, 844L, 677L, 492L, 1595L, 1298L,
>    1424L, 1520L, 1445L, 2148L, 1524L, 2374L, 1948L, 1921L, NA
>    )), .Names = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
> "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class = "data.frame",
> row.names = c("1993",
> "1994", "1995", "1996", "1997", "1998", "1999", "2000", "2001",
> "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009"
> ))
>
>
>
>> Posts
>      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
> 1993   NA   NA   NA   NA   NA   NA   15  458  330  219  472  517
> 1994  546  511  658  681  712  751  763  975  703  805  752  666
> 1995  548  734  963  792  945 1002  775  969  745  691  773  765
> 1996  853 1024  805 1052 1163  999 1184 1053 1176 1197  911  844
> 1997 1007 1150 1108 1315 1212 1127 1074  692  947  900  853  677
> 1998  894 1068  945  784  448  813  896  823  894 1129  733  492
> 1999  514  493  659 1077  778  540  476  612 1351 1708 1720 1595
> 2000 1720 1519 1177 1163 1963 1615 1572 1696 1491 1669 1490 1298
> 2001 1826 1537 1915 1467 1735 1905 2027 1976 1439 1592 1636 1424
> 2002 1941 1845 2010 2199 2373 2133 2445 1492 1864 2133 1663 1520
> 2003 1832 1846 1755 1757 1863 1701 1926 1689 1646 1832 1545 1445
> 2004 1636 1652 2188 1826 1836 2606 1843 2143 1784 1712 1786 2148
> 2005 2122 1960  629 2169 2283 2407 2061 1793 1365 1427 1518 1524
> 2006 2722 1645 1711 2796 3147 2723  761 2027 2714 2983 2848 2374
> 2007 2750  926 1728 2766 2974 2691 2435 2592 1868 2320 2112 1948
> 2008 2305 2255 2712 2789 2025 2368 2607 2584 2554 2434 1984 1921
> 2009  357   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
>
>
> One can then review the annual posting frequency via:
>
> pdf("SAS-L.pdf", height = 4, width = 7)
>
> mp <- barplot(rowSums(Posts, na.rm = TRUE),
>              beside = TRUE,
>              cex.names = 0.6, main = "SAS-L Traffic",
>              cex.axis = 0.75, las = 1)
>
> mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
>      line = 2, cex = 0.5)
>
> dev.off()
>
>
> There would appear to be marked increases in 2000 and again in 2006.
> However, it has been flat for the past 3 calendar years. No decline yet,
> but it will happen in due course...
>
>
>
> No comparable posting data table exists for S-News as far as I can find,
> so I wrote a quick program to read the S-News archive pages here:
>
>  http://www.biostat.wustl.edu/archives/html/s-news/
>
> and get monthly posting counts, using the 'Thread' based html pages,
> where each monthly embedded post link has a URL of the form:
>
> http://www.biostat.wustl.edu/archives/html/s-news/YYYY-MM/msgXXXXX.html
>
>
> Thus, the program I used is:
>
> TD <- paste(rep(1998:2009, each = 12), sprintf("%02d", 1:12), sep = "-")
> Posts <- numeric(length(TD))
>
> for (i in seq(along = TD))
> {
>  URL <- paste("http://www.biostat.wustl.edu/archives/html/s-news/",
>               TD[i], "/threads.html", sep = "")
>
>  cat(URL, "\n")
>
>  if (!inherits(try(con <- readLines(URL)), "try-error"))
>  {
>    Posts[i] <- length(grep("msg.*\\.html", con))
>    rm(con)
>  } else {
>    Posts[i] <- NA
>  }
> }
>
>
> Posts <- matrix(Posts, ncol = 12, byrow = TRUE)
> rownames(Posts) <- 1998:2009
> colnames(Posts) <- month.abb
>
> That gives you:
>
> Posts <- structure(c(NA, 210, 264, 246, 230, 189, 197, 174, 109, 51, 48,
> 5, 273, 173, 313, 232, 255, 179, 230, 161, 87, 59, 63, NA, 378,
> 313, 285, 252, 242, 218, 257, 193, 99, 74, 58, NA, 293, 300,
> 264, 300, 228, 196, 151, 182, 123, 48, 47, NA, 330, 334, 306,
> 331, 219, 189, 164, 174, 107, 46, 31, NA, 243, 254, 247, 282,
> 248, 217, 175, 109, 96, 34, 27, NA, 219, 284, 245, 258, 230,
> 221, 154, 159, 84, 47, 40, NA, 209, 270, 302, 260, 207, 187,
> 187, 144, 97, 39, 28, NA, 191, 300, 204, 260, 221, 186, 195,
> 107, 68, 35, 41, NA, 241, 253, 251, 229, 280, 295, 150, 98, 73,
> 70, 30, NA, 181, 300, 261, 232, 228, 197, 176, 82, 53, 56, 27,
> NA, 141, 194, 176, 194, 177, 142, 176, 84, 20, 41, 36, NA), .Dim = c(12L,
> 12L), .Dimnames = list(c("1998", "1999", "2000", "2001", "2002",
> "2003", "2004", "2005", "2006", "2007", "2008", "2009"), c("Jan",
> "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct",
> "Nov", "Dec")))
>
>
>> Posts
>     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
> 1998  NA 273 378 293 330 243 219 209 191 241 181 141
> 1999 210 173 313 300 334 254 284 270 300 253 300 194
> 2000 264 313 285 264 306 247 245 302 204 251 261 176
> 2001 246 232 252 300 331 282 258 260 260 229 232 194
> 2002 230 255 242 228 219 248 230 207 221 280 228 177
> 2003 189 179 218 196 189 217 221 187 186 295 197 142
> 2004 197 230 257 151 164 175 154 187 195 150 176 176
> 2005 174 161 193 182 174 109 159 144 107  98  82  84
> 2006 109  87  99 123 107  96  84  97  68  73  53  20
> 2007  51  59  74  48  46  34  47  39  35  70  56  41
> 2008  48  63  58  47  31  27  40  28  41  30  27  36
> 2009   5  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
>
>
> Which can then be graphed by:
>
> pdf("S-News.pdf", height = 4, width = 7)
>
> mp <- barplot(rowSums(Posts, na.rm = TRUE),
>              beside = TRUE,
>              cex.names = 0.6, main = "S-News Traffic",
>              cex.axis = 0.75, las = 1)
>
> mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
>      line = 2, cex = 0.5)
>
> dev.off()
>
>
>
> The consistent decline in posting frequency since 1999 is notable. The
> temporal association with the introduction of R is perhaps profound.
>
>
>
> As long as I am on the subject, I figured that I would do the same for
> R-Help. The downside is that readLines() (really url() ) does not
> support https:, so I took a somewhat different approach, using wget:
>
>
> TD <- paste(rep(1997:2009, each = 12), month.name, sep = "-")
> Posts <- numeric(length(TD))
>
> for (i in seq(along = TD))
> {
>  URL <- paste("https://stat.ethz.ch/pipermail/r-help/",
>               TD[i], "/thread.html", sep = "")
>
>  cat(URL, "\n")
>
>  CMD <- paste("wget", URL)
>  system(CMD)
>
>  if (file.exists("thread.html"))
>  {
>    con <- readLines("thread.html")
>    Posts[i] <- length(grep("[0-9]+\\.html", con))
>    rm(con)
>    unlink("thread.html")
>  } else {
>    Posts[i] <- NA
>  }
> }
>
> Posts <- matrix(Posts, ncol = 12, byrow = TRUE)
> rownames(Posts) <- 1997:2009
> colnames(Posts) <- month.abb
>
>
> This gives you:
>
> Posts <- structure(c(NA, 135, 226, 205, 558, 884, 1017, 1116, 1746,
> 2075, 1714, 2490, 462, NA, 79, 145, 355, 583, 697, 1137, 1580, 1724,
> 1920, 1907, 2583, NA, NA, 114, 195, 377, 651, 880, 1203, 1946,
> 1703, 2270, 2191, 2740, NA, 92, 101, 189, 377, 470, 965, 1488,
> 1657, 2057, 1818, 2145, 2487, NA, 36, 90, 161, 504, 552, 1057,
> 1268, 1561, 1887, 2029, 2210, 2517, NA, 47, 105, 186, 418, 550,
> 926, 1319, 1714, 2056, 1811, 2307, 2774, NA, 41, 110, 184, 293,
> 615, 918, 1344, 1618, 1872, 1785, 2138, 3268, NA, 37, 64, 148,
> 356, 562, 824, 1210, 1493, 1777, 1898, 2241, 2813, NA, 40, 94,
> 203, 434, 678, 705, 1443, 1534, 1709, 1902, 2028, 2990, NA, 76,
> 96, 231, 418, 657, 1055, 1567, 1712, 1810, 2328, 2708, 3037,
> NA, 61, 184, 318, 433, 825, 1038, 1605, 1895, 1907, 2127, 2594,
> 2730, NA, 57, 105, 221, 422, 530, 742, 1158, 1481, 1508, 1450,
> 2028, 2399, NA), .Dim = c(13L, 12L), .Dimnames = list(c("1997",
> "1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005",
> "2006", "2007", "2008", "2009"), c("Jan", "Feb", "Mar", "Apr",
> "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")))
>
>
>> Posts
>      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
> 1997   NA   NA   NA   92   36   47   41   37   40   76   61   57
> 1998  135   79  114  101   90  105  110   64   94   96  184  105
> 1999  226  145  195  189  161  186  184  148  203  231  318  221
> 2000  205  355  377  377  504  418  293  356  434  418  433  422
> 2001  558  583  651  470  552  550  615  562  678  657  825  530
> 2002  884  697  880  965 1057  926  918  824  705 1055 1038  742
> 2003 1017 1137 1203 1488 1268 1319 1344 1210 1443 1567 1605 1158
> 2004 1116 1580 1946 1657 1561 1714 1618 1493 1534 1712 1895 1481
> 2005 1746 1724 1703 2057 1887 2056 1872 1777 1709 1810 1907 1508
> 2006 2075 1920 2270 1818 2029 1811 1785 1898 1902 2328 2127 1450
> 2007 1714 1907 2191 2145 2210 2307 2138 2241 2028 2708 2594 2028
> 2008 2490 2583 2740 2487 2517 2774 3268 2813 2990 3037 2730 2399
> 2009  462   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
>
>
> Which again can be graphed as:
>
> pdf("R-Help.pdf", height = 4, width = 7)
>
> mp <- barplot(rowSums(Posts, na.rm = TRUE),
>              beside = TRUE,
>              cex.names = 0.6, main = "R-Help Traffic",
>              cex.axis = 0.75, las = 1)
>
> mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
>      line = 2, cex = 0.5)
>
> dev.off()
>
>
> Now....there's a healthy growth curve....  :-)
>
> Note that the annual traffic volume for 2008 on R-Help exceeds that on
> SAS-L.
>
> For convenience, I am attaching each of the 3 plots.
>
> Regards,
>
> Marc Schwartz
>
>
>
>



