[R] R in the NY Times
Spencer Graves
spencer.graves at prodsyse.com
Thu Jan 8 00:53:03 CET 2009
Thanks, Gabor, Marc, Max:
The image is even more striking (and more accurately reflects
reality, I believe) if you add "log='y'" to "ts.plot".
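Spencer's point can be made quantitative. On a log-scaled y axis, a series that changes by a constant percentage per year plots as a straight line, so the slope of log(count) against year estimates the average growth rate. The sketch below deliberately uses a small input: only the January counts for R-Help (1998 to 2008) and S-News (1999 to 2008), copied from Marc's tables further down. The helper log_growth() and the output file name are illustrative assumptions, not code from the thread.

```r
# January post counts copied from the R-Help (1998-2008) and S-News
# (1999-2008) tables in this thread; January-only is a simplification.
rhelp_jan <- ts(c(135, 226, 205, 558, 884, 1017, 1116, 1746, 2075, 1714, 2490),
                start = 1998)
snews_jan <- ts(c(210, 264, 246, 230, 189, 197, 174, 109, 51, 48),
                start = 1999)

# Slope of log(count) regressed on year = average log growth per year
log_growth <- function(x) {
  yr <- as.numeric(time(x))
  unname(coef(lm(log(as.numeric(x)) ~ yr))[2])
}

# exp(slope) - 1 turns the log slope into an average annual proportion
rate_r <- exp(log_growth(rhelp_jan)) - 1
rate_s <- exp(log_growth(snews_jan)) - 1
print(round(100 * c(R_help = rate_r, S_news = rate_s), 1))

# Both series on one plot with log = "y", as suggested for ts.plot()
out_file <- file.path(tempdir(), "january-log.pdf")   # hypothetical file
pdf(out_file, height = 4, width = 7)
ts.plot(rhelp_jan, snews_jan, col = 1:2, log = "y",
        ylab = "January posts (log scale)")
legend("topleft", c("R-Help", "S-News"), lty = 1, col = 1:2)
invisible(dev.off())
```

A positive rate for R-Help and a negative one for S-News correspond to the rising and falling lines on the log plot.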
Best Wishes,
Spencer
Gabor Grothendieck wrote:
> Here is the same monthly messages/posts data
> for each of S, SAS and R:
> - reworked into a 3-column ts class time series
> - with Jan 2009 removed since it's not complete
> - leading and trailing NA rows removed
>
> At the end we plot the raw data as well as the time
> series of totals, and show loess smooths for each.
>
> By running the code below we see that:
> - the sum of the three seems to be rising at a constant rate
> - S is declining
> - SAS and R are rising
> - R is rising the fastest, though it has completed its phase
> of highest growth, which ended around 2004
>
> tt3 <- structure(c(15, 458, 330, 219, 472, 517, 546, 511, 658, 681,
> 712, 751, 763, 975, 703, 805, 752, 666, 548, 734, 963, 792, 945,
> 1002, 775, 969, 745, 691, 773, 765, 853, 1024, 805, 1052, 1163,
> 999, 1184, 1053, 1176, 1197, 911, 844, 1007, 1150, 1108, 1315,
> 1212, 1127, 1074, 692, 947, 900, 853, 677, 894, 1068, 945, 784,
> 448, 813, 896, 823, 894, 1129, 733, 492, 514, 493, 659, 1077,
> 778, 540, 476, 612, 1351, 1708, 1720, 1595, 1720, 1519, 1177,
> 1163, 1963, 1615, 1572, 1696, 1491, 1669, 1490, 1298, 1826, 1537,
> 1915, 1467, 1735, 1905, 2027, 1976, 1439, 1592, 1636, 1424, 1941,
> 1845, 2010, 2199, 2373, 2133, 2445, 1492, 1864, 2133, 1663, 1520,
> 1832, 1846, 1755, 1757, 1863, 1701, 1926, 1689, 1646, 1832, 1545,
> 1445, 1636, 1652, 2188, 1826, 1836, 2606, 1843, 2143, 1784, 1712,
> 1786, 2148, 2122, 1960, 629, 2169, 2283, 2407, 2061, 1793, 1365,
> 1427, 1518, 1524, 2722, 1645, 1711, 2796, 3147, 2723, 761, 2027,
> 2714, 2983, 2848, 2374, 2750, 926, 1728, 2766, 2974, 2691, 2435,
> 2592, 1868, 2320, 2112, 1948, 2305, 2255, 2712, 2789, 2025, 2368,
> 2607, 2584, 2554, 2434, 1984, 1921, NA, NA, NA, NA, NA, NA, NA,
> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> 273, 378, 293, 330, 243, 219, 209, 191, 241, 181, 141, 210, 173,
> 313, 300, 334, 254, 284, 270, 300, 253, 300, 194, 264, 313, 285,
> 264, 306, 247, 245, 302, 204, 251, 261, 176, 246, 232, 252, 300,
> 331, 282, 258, 260, 260, 229, 232, 194, 230, 255, 242, 228, 219,
> 248, 230, 207, 221, 280, 228, 177, 189, 179, 218, 196, 189, 217,
> 221, 187, 186, 295, 197, 142, 197, 230, 257, 151, 164, 175, 154,
> 187, 195, 150, 176, 176, 174, 161, 193, 182, 174, 109, 159, 144,
> 107, 98, 82, 84, 109, 87, 99, 123, 107, 96, 84, 97, 68, 73, 53,
> 20, 51, 59, 74, 48, 46, 34, 47, 39, 35, 70, 56, 41, 48, 63, 58,
> 47, 31, 27, 40, 28, 41, 30, 27, 36, NA, NA, NA, NA, NA, NA, NA,
> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> NA, NA, NA, NA, NA, NA, 92, 36, 47, 41, 37, 40, 76, 61, 57, 135,
> 79, 114, 101, 90, 105, 110, 64, 94, 96, 184, 105, 226, 145, 195,
> 189, 161, 186, 184, 148, 203, 231, 318, 221, 205, 355, 377, 377,
> 504, 418, 293, 356, 434, 418, 433, 422, 558, 583, 651, 470, 552,
> 550, 615, 562, 678, 657, 825, 530, 884, 697, 880, 965, 1057,
> 926, 918, 824, 705, 1055, 1038, 742, 1017, 1137, 1203, 1488,
> 1268, 1319, 1344, 1210, 1443, 1567, 1605, 1158, 1116, 1580, 1946,
> 1657, 1561, 1714, 1618, 1493, 1534, 1712, 1895, 1481, 1746, 1724,
> 1703, 2057, 1887, 2056, 1872, 1777, 1709, 1810, 1907, 1508, 2075,
> 1920, 2270, 1818, 2029, 1811, 1785, 1898, 1902, 2328, 2127, 1450,
> 1714, 1907, 2191, 2145, 2210, 2307, 2138, 2241, 2028, 2708, 2594,
> 2028, 2490, 2583, 2740, 2487, 2517, 2774, 3268, 2813, 2990, 3037,
> 2730, 2399), .Dim = c(186L, 3L), .Dimnames = list(NULL, c("SAS",
> "S", "R")), .Tsp = c(1993.5, 2008.91666666667, 12), class = c("mts",
> "ts"))
>
> tt4 <- cbind(tt3, rowSums(tt3))
> colnames(tt4) <- c(colnames(tt3), "Sum")
> ts.plot(tt4, col = 1:4)
> grid()
> legend("topleft", colnames(tt4), lty = 1, col = 1:4)
>
> library(dyn)
> for(i in 1:4) lines(fitted(dyn$loess(tt4[, i] ~ time(tt4))), col = i)
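The loop above relies on the dyn package so that loess() can take ts arguments. A base-R alternative is sketched below, assuming you would rather not depend on dyn. It fits stats::loess() to each column against numeric time with na.action = na.exclude. That setting pads the fitted values with NA wherever the input is missing, such as the Sum column before S-News data begin, so each smooth stays aligned with the original series. smooth_columns() is a hypothetical helper name.

```r
# Base-R sketch: one loess smooth per column of a multivariate "ts",
# fitted against numeric time. smooth_columns() is a hypothetical helper.
smooth_columns <- function(x, span = 0.75) {
  tm <- as.numeric(time(x))
  m <- as.matrix(x)
  fits <- sapply(seq_len(NCOL(m)), function(i) {
    y <- as.numeric(m[, i])
    # na.exclude keeps NA placeholders in fitted(), so each smooth
    # lines up with time(x) even where the input series is missing
    fit <- loess(y ~ tm, span = span, na.action = na.exclude)
    fitted(fit)
  })
  ts(fits, start = start(x), frequency = frequency(x), names = colnames(x))
}

# Usage with Gabor's tt4 from above (after ts.plot() has opened the plot):
# sm <- smooth_columns(tt4)
# for (i in seq_len(ncol(sm))) lines(sm[, i], col = i)
```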
>
>
> On Wed, Jan 7, 2009 at 3:07 PM, Marc Schwartz <marc_schwartz at comcast.net> wrote:
>
>> on 01/07/2009 09:29 AM Max Kuhn wrote:
>>
>>>> "You can look on the SAS message boards and see there is a proportional downturn in traffic."
>>>>
>>> I think that I actually made this statement about both the SAS and
>>> Splus traffic...
>>>
>>> I wasn't really trying to be critical of SAS. I was trying to get
>>> across that SAS focused their resources on features that had nothing
>>> to do with *statistical analysis* (e.g. data warehousing).
>>>
>> Presuming that the Google Groups archive of SAS-L is reasonably complete:
>>
>> http://groups.google.com/group/comp.soft-sys.sas/about
>>
>> The monthly posting frequency data since 1993 is:
>>
>> Posts <- structure(list(Jan = c(NA, 546L, 548L, 853L, 1007L, 894L, 514L,
>> 1720L, 1826L, 1941L, 1832L, 1636L, 2122L, 2722L, 2750L, 2305L,
>> 357L), Feb = c(NA, 511L, 734L, 1024L, 1150L, 1068L, 493L, 1519L,
>> 1537L, 1845L, 1846L, 1652L, 1960L, 1645L, 926L, 2255L, NA), Mar = c(NA,
>> 658L, 963L, 805L, 1108L, 945L, 659L, 1177L, 1915L, 2010L, 1755L,
>> 2188L, 629L, 1711L, 1728L, 2712L, NA), Apr = c(NA, 681L, 792L,
>> 1052L, 1315L, 784L, 1077L, 1163L, 1467L, 2199L, 1757L, 1826L,
>> 2169L, 2796L, 2766L, 2789L, NA), May = c(NA, 712L, 945L, 1163L,
>> 1212L, 448L, 778L, 1963L, 1735L, 2373L, 1863L, 1836L, 2283L,
>> 3147L, 2974L, 2025L, NA), Jun = c(NA, 751L, 1002L, 999L, 1127L,
>> 813L, 540L, 1615L, 1905L, 2133L, 1701L, 2606L, 2407L, 2723L,
>> 2691L, 2368L, NA), Jul = c(15L, 763L, 775L, 1184L, 1074L, 896L,
>> 476L, 1572L, 2027L, 2445L, 1926L, 1843L, 2061L, 761L, 2435L,
>> 2607L, NA), Aug = c(458L, 975L, 969L, 1053L, 692L, 823L, 612L,
>> 1696L, 1976L, 1492L, 1689L, 2143L, 1793L, 2027L, 2592L, 2584L,
>> NA), Sep = c(330L, 703L, 745L, 1176L, 947L, 894L, 1351L, 1491L,
>> 1439L, 1864L, 1646L, 1784L, 1365L, 2714L, 1868L, 2554L, NA),
>> Oct = c(219L, 805L, 691L, 1197L, 900L, 1129L, 1708L, 1669L,
>> 1592L, 2133L, 1832L, 1712L, 1427L, 2983L, 2320L, 2434L, NA
>> ), Nov = c(472L, 752L, 773L, 911L, 853L, 733L, 1720L, 1490L,
>> 1636L, 1663L, 1545L, 1786L, 1518L, 2848L, 2112L, 1984L, NA
>> ), Dec = c(517L, 666L, 765L, 844L, 677L, 492L, 1595L, 1298L,
>> 1424L, 1520L, 1445L, 2148L, 1524L, 2374L, 1948L, 1921L, NA
>> )), .Names = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
>> "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class = "data.frame",
>> row.names = c("1993",
>> "1994", "1995", "1996", "1997", "1998", "1999", "2000", "2001",
>> "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009"
>> ))
>>
>>
>>
>>
>>> Posts
>>>
>>       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
>> 1993   NA   NA   NA   NA   NA   NA   15  458  330  219  472  517
>> 1994  546  511  658  681  712  751  763  975  703  805  752  666
>> 1995  548  734  963  792  945 1002  775  969  745  691  773  765
>> 1996  853 1024  805 1052 1163  999 1184 1053 1176 1197  911  844
>> 1997 1007 1150 1108 1315 1212 1127 1074  692  947  900  853  677
>> 1998  894 1068  945  784  448  813  896  823  894 1129  733  492
>> 1999  514  493  659 1077  778  540  476  612 1351 1708 1720 1595
>> 2000 1720 1519 1177 1163 1963 1615 1572 1696 1491 1669 1490 1298
>> 2001 1826 1537 1915 1467 1735 1905 2027 1976 1439 1592 1636 1424
>> 2002 1941 1845 2010 2199 2373 2133 2445 1492 1864 2133 1663 1520
>> 2003 1832 1846 1755 1757 1863 1701 1926 1689 1646 1832 1545 1445
>> 2004 1636 1652 2188 1826 1836 2606 1843 2143 1784 1712 1786 2148
>> 2005 2122 1960  629 2169 2283 2407 2061 1793 1365 1427 1518 1524
>> 2006 2722 1645 1711 2796 3147 2723  761 2027 2714 2983 2848 2374
>> 2007 2750  926 1728 2766 2974 2691 2435 2592 1868 2320 2112 1948
>> 2008 2305 2255 2712 2789 2025 2368 2607 2584 2554 2434 1984 1921
>> 2009  357   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
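Gabor's message earlier in the thread reworks tables like this one into a monthly ts. In doing so he drops the leading and trailing NAs and the incomplete January 2009. His exact steps were not posted, so what follows is only one possible base-R sketch. table_to_ts() is a hypothetical helper, and it assumes rows are named by year with the columns running Jan to Dec, as in Posts.

```r
# Sketch: reshape a year x month table (rows named by year, columns
# Jan..Dec, like Posts) into a monthly "ts". table_to_ts() is a
# hypothetical helper, not Gabor's code.
table_to_ts <- function(posts, end = NULL) {
  m <- as.matrix(posts)
  first_year <- as.integer(rownames(m)[1])
  # t() puts the 12 months of each year next to each other before
  # flattening, since as.vector() reads a matrix column by column
  x <- ts(as.vector(t(m)), start = c(first_year, 1), frequency = 12)
  # optionally cut off an incomplete final stretch (e.g. January 2009)
  if (!is.null(end)) x <- window(x, end = end)
  # na.contiguous() keeps the longest NA-free run; with no interior
  # gaps that is the same as trimming leading and trailing NAs
  na.contiguous(x)
}

# Usage with the SAS-L data frame above:
# sasl <- table_to_ts(Posts, end = c(2008, 12))
```

With this data, the result should run from July 1993 to December 2008, the same span as the SAS column of tt3.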
>>
>>
>> One can then review the annual posting frequency via:
>>
>> pdf("SAS-L.pdf", height = 4, width = 7)
>>
>> mp <- barplot(rowSums(Posts, na.rm = TRUE),
>>               beside = TRUE,
>>               cex.names = 0.6, main = "SAS-L Traffic",
>>               cex.axis = 0.75, las = 1)
>>
>> mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
>>       line = 2, cex = 0.5)
>>
>> dev.off()
>>
>>
>> There appear to be marked increases in 2000 and again in 2006.
>> However, traffic has been flat for the past 3 calendar years (2006 to
>> 2008). No decline yet, but it will happen in due course...
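The barplot includes the partial years 1993 (July onward) and 2009 (January only), which look like slumps but are not. The sketch below puts numbers on "marked increases" and "flat" by computing annual totals and year-over-year percentage change for complete years only. annual_change() is a hypothetical helper. It assumes the complete years are consecutive, as they are in this table.

```r
# Sketch: annual totals and year-over-year % change from a year x month
# table such as Posts. annual_change() is a hypothetical helper.
annual_change <- function(posts) {
  m <- as.matrix(posts)
  # keep only years with all 12 months present
  complete <- rowSums(is.na(m)) == 0
  totals <- rowSums(m[complete, , drop = FALSE])
  pct <- 100 * diff(totals) / head(totals, -1)
  data.frame(year = as.integer(names(totals)),
             total = unname(totals),
             pct_change = c(NA, round(unname(pct), 1)))
}

# Usage with the SAS-L data frame above:
# annual_change(Posts)
```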
>>
>>
>>
>> No comparable posting data table exists for S-News as far as I can find,
>> so I wrote a quick program to read the S-News archive pages here:
>>
>> http://www.biostat.wustl.edu/archives/html/s-news/
>>
>> and get monthly posting counts from the thread-based HTML index pages,
>> where each post link embedded in a month's page has a URL of the form:
>>
>> http://www.biostat.wustl.edu/archives/html/s-news/YYYY-MM/msgXXXXX.html
>>
>>
>> The program I used is:
>>
>> TD <- paste(rep(1998:2009, each = 12), sprintf("%02d", 1:12), sep = "-")
>> Posts <- numeric(length(TD))
>>
>> for (i in seq_along(TD))
>> {
>>   URL <- paste("http://www.biostat.wustl.edu/archives/html/s-news/",
>>                TD[i], "/threads.html", sep = "")
>>
>>   cat(URL, "\n")
>>
>>   if (!inherits(try(con <- readLines(URL)), "try-error"))
>>   {
>>     Posts[i] <- length(grep("msg.*\\.html", con))
>>     rm(con)
>>   } else {
>>     Posts[i] <- NA
>>   }
>> }
>>
>>
>> Posts <- matrix(Posts, ncol = 12, byrow = TRUE)
>> rownames(Posts) <- 1998:2009
>> colnames(Posts) <- month.abb
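grep() counts matching lines. That equals the number of posts only if each message link sits on its own line and appears once per page. A slightly more defensive sketch counts distinct msgNNNNN.html targets instead. The msg-plus-digits naming is taken from the URL pattern above, and count_msg_links() is a hypothetical helper that has not been run against the real archive.

```r
# Sketch: count distinct per-message links on one archive index page.
# count_msg_links() is a hypothetical helper; the default pattern assumes
# links named like msg00123.html, as in the S-News archive URLs.
count_msg_links <- function(lines, pattern = "msg[0-9]+\\.html") {
  # gregexpr() finds every match on every line, not just one per line
  hits <- regmatches(lines, gregexpr(pattern, lines))
  # a message linked twice on the page is counted once
  length(unique(unlist(hits)))
}

# Inside the loop above, this would replace the grep() line:
# Posts[i] <- count_msg_links(con)
```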
>>
>> That gives you:
>>
>> Posts <- structure(c(NA, 210, 264, 246, 230, 189, 197, 174, 109, 51, 48,
>> 5, 273, 173, 313, 232, 255, 179, 230, 161, 87, 59, 63, NA, 378,
>> 313, 285, 252, 242, 218, 257, 193, 99, 74, 58, NA, 293, 300,
>> 264, 300, 228, 196, 151, 182, 123, 48, 47, NA, 330, 334, 306,
>> 331, 219, 189, 164, 174, 107, 46, 31, NA, 243, 254, 247, 282,
>> 248, 217, 175, 109, 96, 34, 27, NA, 219, 284, 245, 258, 230,
>> 221, 154, 159, 84, 47, 40, NA, 209, 270, 302, 260, 207, 187,
>> 187, 144, 97, 39, 28, NA, 191, 300, 204, 260, 221, 186, 195,
>> 107, 68, 35, 41, NA, 241, 253, 251, 229, 280, 295, 150, 98, 73,
>> 70, 30, NA, 181, 300, 261, 232, 228, 197, 176, 82, 53, 56, 27,
>> NA, 141, 194, 176, 194, 177, 142, 176, 84, 20, 41, 36, NA), .Dim = c(12L,
>> 12L), .Dimnames = list(c("1998", "1999", "2000", "2001", "2002",
>> "2003", "2004", "2005", "2006", "2007", "2008", "2009"), c("Jan",
>> "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct",
>> "Nov", "Dec")))
>>
>>
>>
>>> Posts
>>>
>>      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
>> 1998  NA 273 378 293 330 243 219 209 191 241 181 141
>> 1999 210 173 313 300 334 254 284 270 300 253 300 194
>> 2000 264 313 285 264 306 247 245 302 204 251 261 176
>> 2001 246 232 252 300 331 282 258 260 260 229 232 194
>> 2002 230 255 242 228 219 248 230 207 221 280 228 177
>> 2003 189 179 218 196 189 217 221 187 186 295 197 142
>> 2004 197 230 257 151 164 175 154 187 195 150 176 176
>> 2005 174 161 193 182 174 109 159 144 107  98  82  84
>> 2006 109  87  99 123 107  96  84  97  68  73  53  20
>> 2007  51  59  74  48  46  34  47  39  35  70  56  41
>> 2008  48  63  58  47  31  27  40  28  41  30  27  36
>> 2009   5  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
>>
>>
>> Which can then be graphed by:
>>
>> pdf("S-News.pdf", height = 4, width = 7)
>>
>> mp <- barplot(rowSums(Posts, na.rm = TRUE),
>>               beside = TRUE,
>>               cex.names = 0.6, main = "S-News Traffic",
>>               cex.axis = 0.75, las = 1)
>>
>> mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
>>       line = 2, cex = 0.5)
>>
>> dev.off()
>>
>>
>>
>> The consistent decline in posting frequency since 1999 is notable. The
>> temporal association with the introduction of R is perhaps profound.
>>
>>
>>
>> As long as I am on the subject, I figured that I would do the same for
>> R-Help. The catch is that readLines() (really url()) does not
>> support https: URLs, so I took a somewhat different approach, using wget:
>>
>>
>> TD <- paste(rep(1997:2009, each = 12), month.name, sep = "-")
>> Posts <- numeric(length(TD))
>>
>> for (i in seq_along(TD))
>> {
>>   URL <- paste("https://stat.ethz.ch/pipermail/r-help/",
>>                TD[i], "/thread.html", sep = "")
>>
>>   cat(URL, "\n")
>>
>>   CMD <- paste("wget", URL)
>>   system(CMD)
>>
>>   if (file.exists("thread.html"))
>>   {
>>     con <- readLines("thread.html")
>>     Posts[i] <- length(grep("[0-9]+\\.html", con))
>>     rm(con)
>>     unlink("thread.html")
>>   } else {
>>     Posts[i] <- NA
>>   }
>> }
>>
>> Posts <- matrix(Posts, ncol = 12, byrow = TRUE)
>> rownames(Posts) <- 1997:2009
>> colnames(Posts) <- month.abb
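Later R versions built with libcurl support can fetch https: URLs directly, which makes the external wget call unnecessary there. The sketch below works under that assumption. It uses download.file() into a temporary file, so no thread.html is left in the working directory. Whether https works depends on how R was built. rhelp_urls() and count_month() are hypothetical helper names.

```r
# Sketch: count R-Help posts per month without calling wget, assuming an
# R build whose download.file() supports https (libcurl-based builds of
# later R versions do). rhelp_urls() and count_month() are hypothetical.
rhelp_urls <- function(years) {
  months <- paste(rep(years, each = 12), month.name, sep = "-")
  paste0("https://stat.ethz.ch/pipermail/r-help/", months, "/thread.html")
}

count_month <- function(url, pattern = "[0-9]+\\.html") {
  tf <- tempfile(fileext = ".html")   # scratch file, removed on exit
  on.exit(unlink(tf))
  ok <- tryCatch(download.file(url, tf, quiet = TRUE) == 0,
                 error = function(e) FALSE,
                 warning = function(w) FALSE)
  if (!ok) return(NA_integer_)
  # same line-counting rule as the wget loop above
  length(grep(pattern, readLines(tf, warn = FALSE)))
}

# Usage (needs network access):
# urls  <- rhelp_urls(1997:2009)
# Posts <- matrix(vapply(urls, count_month, integer(1)), ncol = 12,
#                 byrow = TRUE, dimnames = list(1997:2009, month.abb))
```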
>>
>>
>> This gives you:
>>
>> Posts <- structure(c(NA, 135, 226, 205, 558, 884, 1017, 1116, 1746,
>> 2075, 1714, 2490, 462, NA, 79, 145, 355, 583, 697, 1137, 1580, 1724,
>> 1920, 1907, 2583, NA, NA, 114, 195, 377, 651, 880, 1203, 1946,
>> 1703, 2270, 2191, 2740, NA, 92, 101, 189, 377, 470, 965, 1488,
>> 1657, 2057, 1818, 2145, 2487, NA, 36, 90, 161, 504, 552, 1057,
>> 1268, 1561, 1887, 2029, 2210, 2517, NA, 47, 105, 186, 418, 550,
>> 926, 1319, 1714, 2056, 1811, 2307, 2774, NA, 41, 110, 184, 293,
>> 615, 918, 1344, 1618, 1872, 1785, 2138, 3268, NA, 37, 64, 148,
>> 356, 562, 824, 1210, 1493, 1777, 1898, 2241, 2813, NA, 40, 94,
>> 203, 434, 678, 705, 1443, 1534, 1709, 1902, 2028, 2990, NA, 76,
>> 96, 231, 418, 657, 1055, 1567, 1712, 1810, 2328, 2708, 3037,
>> NA, 61, 184, 318, 433, 825, 1038, 1605, 1895, 1907, 2127, 2594,
>> 2730, NA, 57, 105, 221, 422, 530, 742, 1158, 1481, 1508, 1450,
>> 2028, 2399, NA), .Dim = c(13L, 12L), .Dimnames = list(c("1997",
>> "1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005",
>> "2006", "2007", "2008", "2009"), c("Jan", "Feb", "Mar", "Apr",
>> "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")))
>>
>>
>>
>>> Posts
>>>
>>       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
>> 1997   NA   NA   NA   92   36   47   41   37   40   76   61   57
>> 1998  135   79  114  101   90  105  110   64   94   96  184  105
>> 1999  226  145  195  189  161  186  184  148  203  231  318  221
>> 2000  205  355  377  377  504  418  293  356  434  418  433  422
>> 2001  558  583  651  470  552  550  615  562  678  657  825  530
>> 2002  884  697  880  965 1057  926  918  824  705 1055 1038  742
>> 2003 1017 1137 1203 1488 1268 1319 1344 1210 1443 1567 1605 1158
>> 2004 1116 1580 1946 1657 1561 1714 1618 1493 1534 1712 1895 1481
>> 2005 1746 1724 1703 2057 1887 2056 1872 1777 1709 1810 1907 1508
>> 2006 2075 1920 2270 1818 2029 1811 1785 1898 1902 2328 2127 1450
>> 2007 1714 1907 2191 2145 2210 2307 2138 2241 2028 2708 2594 2028
>> 2008 2490 2583 2740 2487 2517 2774 3268 2813 2990 3037 2730 2399
>> 2009  462   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
>>
>>
>> Which again can be graphed as:
>>
>> pdf("R-Help.pdf", height = 4, width = 7)
>>
>> mp <- barplot(rowSums(Posts, na.rm = TRUE),
>>               beside = TRUE,
>>               cex.names = 0.6, main = "R-Help Traffic",
>>               cex.axis = 0.75, las = 1)
>>
>> mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
>>       line = 2, cex = 0.5)
>>
>> dev.off()
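The same barplot-plus-totals code is used three times in this message, once per list. A minimal sketch wraps it in one function. traffic_barplot() is a hypothetical helper, and it assumes a year x month table whose row names are the years. beside = TRUE is dropped because it has no effect when the bars come from a plain vector of totals.

```r
# Sketch: the per-list barplot code wrapped in one hypothetical helper.
# 'posts' is a year x month matrix or data frame with years as row names.
traffic_barplot <- function(posts, main, file, height = 4, width = 7) {
  totals <- rowSums(posts, na.rm = TRUE)
  pdf(file, height = height, width = width)
  on.exit(dev.off())   # close the pdf device even if plotting fails
  mp <- barplot(totals, cex.names = 0.6, main = main,
                cex.axis = 0.75, las = 1)
  # print each year's total under its bar, as in the original code
  mtext(text = totals, at = mp, side = 1, line = 2, cex = 0.5)
  invisible(totals)
}

# Usage, e.g. with the R-Help matrix above:
# traffic_barplot(Posts, "R-Help Traffic", "R-Help.pdf")
```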
>>
>>
>> Now....there's a healthy growth curve.... :-)
>>
>> Note that the annual traffic volume for 2008 on R-Help exceeds that on
>> SAS-L.
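The comparison can be checked directly from the 2008 rows of the SAS-L and R-Help tables above. The vectors below are copied from those rows. Beyond the annual totals, the sketch also counts how many individual months of 2008 had more R-Help posts than SAS-L posts.

```r
# 2008 monthly counts copied from the SAS-L and R-Help tables above
sasl_2008  <- c(2305, 2255, 2712, 2789, 2025, 2368, 2607, 2584, 2554, 2434, 1984, 1921)
rhelp_2008 <- c(2490, 2583, 2740, 2487, 2517, 2774, 3268, 2813, 2990, 3037, 2730, 2399)

totals <- c(SAS_L = sum(sasl_2008), R_help = sum(rhelp_2008))
print(totals)                            # SAS_L 28538, R_help 32828

# number of months in which R-Help was busier than SAS-L: 11 of 12
ahead <- sum(rhelp_2008 > sasl_2008)
print(ahead)

# R-Help's 2008 volume relative to SAS-L, as a percentage excess
print(round(100 * (totals[["R_help"]] / totals[["SAS_L"]] - 1), 1))
```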
>>
>> For convenience, I am attaching each of the 3 plots.
>>
>> Regards,
>>
>> Marc Schwartz
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>