[R] R in the NY Times
Gabor Grothendieck
ggrothendieck at gmail.com
Thu Jan 8 01:52:17 CET 2009
I did try the log version as well prior to posting. Although it
would seem to exaggerate the difference, to me the insights from
plotting the raw data with loess (i.e. the constant rate of growth of
the total, and the piecewise constant growth of R) come through best.
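
For anyone who wants to compare the two views side by side, here is a
minimal sketch. It assumes base R's loess() in place of dyn$loess, and it
uses only the 2005-2008 R-help monthly counts from the table quoted further
down; the variable names (r_posts, fit_raw, fit_log) are made up for this
illustration.

```r
# R-help monthly post counts for 2005-2008, copied from the quoted table
r_posts <- c(1746, 1724, 1703, 2057, 1887, 2056, 1872, 1777, 1709, 1810, 1907, 1508,
             2075, 1920, 2270, 1818, 2029, 1811, 1785, 1898, 1902, 2328, 2127, 1450,
             1714, 1907, 2191, 2145, 2210, 2307, 2138, 2241, 2028, 2708, 2594, 2028,
             2490, 2583, 2740, 2487, 2517, 2774, 3268, 2813, 2990, 3037, 2730, 2399)
r_ts <- ts(r_posts, start = c(2005, 1), frequency = 12)

tm <- as.numeric(time(r_ts))   # time index as plain numbers
y  <- as.numeric(r_ts)

fit_raw <- loess(y ~ tm)        # smooth on the raw scale
fit_log <- loess(log(y) ~ tm)   # smooth on the log scale

op <- par(mfrow = c(1, 2))
plot(r_ts, main = "Raw scale", ylab = "Posts")
lines(tm, fitted(fit_raw), col = 2)
plot(r_ts, log = "y", main = "Log scale", ylab = "Posts")
lines(tm, exp(fitted(fit_log)), col = 2)   # back-transform the log-scale fit
par(op)
```

On the raw scale equal vertical distances are equal numbers of posts; on
the log scale they are equal ratios, which is why the two plots can leave
different impressions of the same series.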
On Wed, Jan 7, 2009 at 6:53 PM, Spencer Graves
<spencer.graves at prodsyse.com> wrote:
> Thanks, Gabor, Marc, Max:
> The image is even more striking (and more accurately reflects reality, I
> believe) if you add "log='y'" to "ts.plot".
> Best Wishes,
> Spencer
>
> Gabor Grothendieck wrote:
>>
>> Here is the same number of messages/posts data
>> for each of S, SAS, R:
>> - reworked into a 3 column ts class time series
>> - with Jan 2009 removed since it's not complete
>> - leading and trailing NA rows removed
>>
>> At the end we plot the raw data as well as the time
>> series of totals and show loess smooths for each.
>>
>> By running the code below we see that:
>> - the sum of the three seems to be rising at a constant rate
>> - S is declining
>> - SAS and R are rising
>> - R is rising the fastest, though it has completed its phase
>> of highest growth, which ended around 2004
>>
>> tt3 <- structure(c(15, 458, 330, 219, 472, 517, 546, 511, 658, 681,
>> 712, 751, 763, 975, 703, 805, 752, 666, 548, 734, 963, 792, 945,
>> 1002, 775, 969, 745, 691, 773, 765, 853, 1024, 805, 1052, 1163,
>> 999, 1184, 1053, 1176, 1197, 911, 844, 1007, 1150, 1108, 1315,
>> 1212, 1127, 1074, 692, 947, 900, 853, 677, 894, 1068, 945, 784,
>> 448, 813, 896, 823, 894, 1129, 733, 492, 514, 493, 659, 1077,
>> 778, 540, 476, 612, 1351, 1708, 1720, 1595, 1720, 1519, 1177,
>> 1163, 1963, 1615, 1572, 1696, 1491, 1669, 1490, 1298, 1826, 1537,
>> 1915, 1467, 1735, 1905, 2027, 1976, 1439, 1592, 1636, 1424, 1941,
>> 1845, 2010, 2199, 2373, 2133, 2445, 1492, 1864, 2133, 1663, 1520,
>> 1832, 1846, 1755, 1757, 1863, 1701, 1926, 1689, 1646, 1832, 1545,
>> 1445, 1636, 1652, 2188, 1826, 1836, 2606, 1843, 2143, 1784, 1712,
>> 1786, 2148, 2122, 1960, 629, 2169, 2283, 2407, 2061, 1793, 1365,
>> 1427, 1518, 1524, 2722, 1645, 1711, 2796, 3147, 2723, 761, 2027,
>> 2714, 2983, 2848, 2374, 2750, 926, 1728, 2766, 2974, 2691, 2435,
>> 2592, 1868, 2320, 2112, 1948, 2305, 2255, 2712, 2789, 2025, 2368,
>> 2607, 2584, 2554, 2434, 1984, 1921, NA, NA, NA, NA, NA, NA, NA,
>> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> 273, 378, 293, 330, 243, 219, 209, 191, 241, 181, 141, 210, 173,
>> 313, 300, 334, 254, 284, 270, 300, 253, 300, 194, 264, 313, 285,
>> 264, 306, 247, 245, 302, 204, 251, 261, 176, 246, 232, 252, 300,
>> 331, 282, 258, 260, 260, 229, 232, 194, 230, 255, 242, 228, 219,
>> 248, 230, 207, 221, 280, 228, 177, 189, 179, 218, 196, 189, 217,
>> 221, 187, 186, 295, 197, 142, 197, 230, 257, 151, 164, 175, 154,
>> 187, 195, 150, 176, 176, 174, 161, 193, 182, 174, 109, 159, 144,
>> 107, 98, 82, 84, 109, 87, 99, 123, 107, 96, 84, 97, 68, 73, 53,
>> 20, 51, 59, 74, 48, 46, 34, 47, 39, 35, 70, 56, 41, 48, 63, 58,
>> 47, 31, 27, 40, 28, 41, 30, 27, 36, NA, NA, NA, NA, NA, NA, NA,
>> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> NA, NA, NA, NA, NA, NA, 92, 36, 47, 41, 37, 40, 76, 61, 57, 135,
>> 79, 114, 101, 90, 105, 110, 64, 94, 96, 184, 105, 226, 145, 195,
>> 189, 161, 186, 184, 148, 203, 231, 318, 221, 205, 355, 377, 377,
>> 504, 418, 293, 356, 434, 418, 433, 422, 558, 583, 651, 470, 552,
>> 550, 615, 562, 678, 657, 825, 530, 884, 697, 880, 965, 1057,
>> 926, 918, 824, 705, 1055, 1038, 742, 1017, 1137, 1203, 1488,
>> 1268, 1319, 1344, 1210, 1443, 1567, 1605, 1158, 1116, 1580, 1946,
>> 1657, 1561, 1714, 1618, 1493, 1534, 1712, 1895, 1481, 1746, 1724,
>> 1703, 2057, 1887, 2056, 1872, 1777, 1709, 1810, 1907, 1508, 2075,
>> 1920, 2270, 1818, 2029, 1811, 1785, 1898, 1902, 2328, 2127, 1450,
>> 1714, 1907, 2191, 2145, 2210, 2307, 2138, 2241, 2028, 2708, 2594,
>> 2028, 2490, 2583, 2740, 2487, 2517, 2774, 3268, 2813, 2990, 3037,
>> 2730, 2399), .Dim = c(186L, 3L), .Dimnames = list(NULL, c("SAS",
>> "S", "R")), .Tsp = c(1993.5, 2008.91666666667, 12), class = c("mts",
>> "ts"))
>>
>> tt4 <- cbind(tt3, rowSums(tt3))
>> colnames(tt4) <- c(colnames(tt3), "Sum")
>> ts.plot(tt4, col = 1:4)
>> grid()
>> legend("topleft", colnames(tt4), lty = 1, col = 1:4)
>>
>> library(dyn)
>> for(i in 1:4) lines(fitted(dyn$loess(tt4[, i] ~ time(tt4))), col = i)
>>
>>
>> On Wed, Jan 7, 2009 at 3:07 PM, Marc Schwartz <marc_schwartz at comcast.net>
>> wrote:
>>
>>>
>>> on 01/07/2009 09:29 AM Max Kuhn wrote:
>>>
>>>>>
>>>>> "You can look on the SAS message boards and see there is a proportional
>>>>> downturn in traffic."
>>>>>
>>>>
>>>> I think that I actually made this statement about both the SAS and
>>>> Splus traffic...
>>>>
>>>> I wasn't really trying to be critical of SAS. I was trying to get
>>>> across that SAS focused their resources on features that had nothing
>>>> to do with *statistical analysis* (e.g. data warehousing etc.)
>>>>
>>>
>>> Presuming that the Google Groups archive of SAS-L is reasonably complete:
>>>
>>> http://groups.google.com/group/comp.soft-sys.sas/about
>>>
>>> The monthly posting frequency data since 1993 is:
>>>
>>> Posts <- structure(list(Jan = c(NA, 546L, 548L, 853L, 1007L, 894L, 514L,
>>> 1720L, 1826L, 1941L, 1832L, 1636L, 2122L, 2722L, 2750L, 2305L,
>>> 357L), Feb = c(NA, 511L, 734L, 1024L, 1150L, 1068L, 493L, 1519L,
>>> 1537L, 1845L, 1846L, 1652L, 1960L, 1645L, 926L, 2255L, NA), Mar = c(NA,
>>> 658L, 963L, 805L, 1108L, 945L, 659L, 1177L, 1915L, 2010L, 1755L,
>>> 2188L, 629L, 1711L, 1728L, 2712L, NA), Apr = c(NA, 681L, 792L,
>>> 1052L, 1315L, 784L, 1077L, 1163L, 1467L, 2199L, 1757L, 1826L,
>>> 2169L, 2796L, 2766L, 2789L, NA), May = c(NA, 712L, 945L, 1163L,
>>> 1212L, 448L, 778L, 1963L, 1735L, 2373L, 1863L, 1836L, 2283L,
>>> 3147L, 2974L, 2025L, NA), Jun = c(NA, 751L, 1002L, 999L, 1127L,
>>> 813L, 540L, 1615L, 1905L, 2133L, 1701L, 2606L, 2407L, 2723L,
>>> 2691L, 2368L, NA), Jul = c(15L, 763L, 775L, 1184L, 1074L, 896L,
>>> 476L, 1572L, 2027L, 2445L, 1926L, 1843L, 2061L, 761L, 2435L,
>>> 2607L, NA), Aug = c(458L, 975L, 969L, 1053L, 692L, 823L, 612L,
>>> 1696L, 1976L, 1492L, 1689L, 2143L, 1793L, 2027L, 2592L, 2584L,
>>> NA), Sep = c(330L, 703L, 745L, 1176L, 947L, 894L, 1351L, 1491L,
>>> 1439L, 1864L, 1646L, 1784L, 1365L, 2714L, 1868L, 2554L, NA),
>>> Oct = c(219L, 805L, 691L, 1197L, 900L, 1129L, 1708L, 1669L,
>>> 1592L, 2133L, 1832L, 1712L, 1427L, 2983L, 2320L, 2434L, NA
>>> ), Nov = c(472L, 752L, 773L, 911L, 853L, 733L, 1720L, 1490L,
>>> 1636L, 1663L, 1545L, 1786L, 1518L, 2848L, 2112L, 1984L, NA
>>> ), Dec = c(517L, 666L, 765L, 844L, 677L, 492L, 1595L, 1298L,
>>> 1424L, 1520L, 1445L, 2148L, 1524L, 2374L, 1948L, 1921L, NA
>>> )), .Names = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
>>> "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class = "data.frame",
>>> row.names = c("1993",
>>> "1994", "1995", "1996", "1997", "1998", "1999", "2000", "2001",
>>> "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009"
>>> ))
>>>
>>>
>>>
>>>
>>> > Posts
>>>       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
>>> 1993   NA   NA   NA   NA   NA   NA   15  458  330  219  472  517
>>> 1994  546  511  658  681  712  751  763  975  703  805  752  666
>>> 1995  548  734  963  792  945 1002  775  969  745  691  773  765
>>> 1996  853 1024  805 1052 1163  999 1184 1053 1176 1197  911  844
>>> 1997 1007 1150 1108 1315 1212 1127 1074  692  947  900  853  677
>>> 1998  894 1068  945  784  448  813  896  823  894 1129  733  492
>>> 1999  514  493  659 1077  778  540  476  612 1351 1708 1720 1595
>>> 2000 1720 1519 1177 1163 1963 1615 1572 1696 1491 1669 1490 1298
>>> 2001 1826 1537 1915 1467 1735 1905 2027 1976 1439 1592 1636 1424
>>> 2002 1941 1845 2010 2199 2373 2133 2445 1492 1864 2133 1663 1520
>>> 2003 1832 1846 1755 1757 1863 1701 1926 1689 1646 1832 1545 1445
>>> 2004 1636 1652 2188 1826 1836 2606 1843 2143 1784 1712 1786 2148
>>> 2005 2122 1960  629 2169 2283 2407 2061 1793 1365 1427 1518 1524
>>> 2006 2722 1645 1711 2796 3147 2723  761 2027 2714 2983 2848 2374
>>> 2007 2750  926 1728 2766 2974 2691 2435 2592 1868 2320 2112 1948
>>> 2008 2305 2255 2712 2789 2025 2368 2607 2584 2554 2434 1984 1921
>>> 2009  357   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
>>>
>>>
>>> One can then review the annual posting frequency via:
>>>
>>> pdf("SAS-L.pdf", height = 4, width = 7)
>>>
>>> mp <- barplot(rowSums(Posts, na.rm = TRUE),
>>>               beside = TRUE,
>>>               cex.names = 0.6, main = "SAS-L Traffic",
>>>               cex.axis = 0.75, las = 1)
>>>
>>> mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
>>>       line = 2, cex = 0.5)
>>>
>>> dev.off()
>>>
>>>
>>> There would appear to be marked increases in 2000 and again in 2006.
>>> However, it has been flat for the past 3 calendar years. No decline yet,
>>> but it will happen in due course...
>>>
>>>
>>>
>>> No comparable posting data table exists for S-News as far as I can find,
>>> so I wrote a quick program to read the S-News archive pages here:
>>>
>>> http://www.biostat.wustl.edu/archives/html/s-news/
>>>
>>> and get monthly posting counts, using the 'Thread' based html pages,
>>> where each monthly embedded post link has a URL of the form:
>>>
>>> http://www.biostat.wustl.edu/archives/html/s-news/YYYY-MM/msgXXXXX.html
>>>
>>>
>>> Thus, the program I used is:
>>>
>>> TD <- paste(rep(1998:2009, each = 12), sprintf("%02d", 1:12), sep = "-")
>>> Posts <- numeric(length(TD))
>>>
>>> for (i in seq(along = TD))
>>> {
>>>   URL <- paste("http://www.biostat.wustl.edu/archives/html/s-news/",
>>>                TD[i], "/threads.html", sep = "")
>>>
>>>   cat(URL, "\n")
>>>
>>>   if (!inherits(try(con <- readLines(URL)), "try-error"))
>>>   {
>>>     Posts[i] <- length(grep("msg.*\\.html", con))
>>>     rm(con)
>>>   } else {
>>>     Posts[i] <- NA
>>>   }
>>> }
>>>
>>>
>>> Posts <- matrix(Posts, ncol = 12, byrow = TRUE)
>>> rownames(Posts) <- 1998:2009
>>> colnames(Posts) <- month.abb
>>>
>>> That gives you:
>>>
>>> Posts <- structure(c(NA, 210, 264, 246, 230, 189, 197, 174, 109, 51, 48,
>>> 5, 273, 173, 313, 232, 255, 179, 230, 161, 87, 59, 63, NA, 378,
>>> 313, 285, 252, 242, 218, 257, 193, 99, 74, 58, NA, 293, 300,
>>> 264, 300, 228, 196, 151, 182, 123, 48, 47, NA, 330, 334, 306,
>>> 331, 219, 189, 164, 174, 107, 46, 31, NA, 243, 254, 247, 282,
>>> 248, 217, 175, 109, 96, 34, 27, NA, 219, 284, 245, 258, 230,
>>> 221, 154, 159, 84, 47, 40, NA, 209, 270, 302, 260, 207, 187,
>>> 187, 144, 97, 39, 28, NA, 191, 300, 204, 260, 221, 186, 195,
>>> 107, 68, 35, 41, NA, 241, 253, 251, 229, 280, 295, 150, 98, 73,
>>> 70, 30, NA, 181, 300, 261, 232, 228, 197, 176, 82, 53, 56, 27,
>>> NA, 141, 194, 176, 194, 177, 142, 176, 84, 20, 41, 36, NA), .Dim = c(12L,
>>> 12L), .Dimnames = list(c("1998", "1999", "2000", "2001", "2002",
>>> "2003", "2004", "2005", "2006", "2007", "2008", "2009"), c("Jan",
>>> "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct",
>>> "Nov", "Dec")))
>>>
>>>
>>>
>>> > Posts
>>>      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
>>> 1998  NA 273 378 293 330 243 219 209 191 241 181 141
>>> 1999 210 173 313 300 334 254 284 270 300 253 300 194
>>> 2000 264 313 285 264 306 247 245 302 204 251 261 176
>>> 2001 246 232 252 300 331 282 258 260 260 229 232 194
>>> 2002 230 255 242 228 219 248 230 207 221 280 228 177
>>> 2003 189 179 218 196 189 217 221 187 186 295 197 142
>>> 2004 197 230 257 151 164 175 154 187 195 150 176 176
>>> 2005 174 161 193 182 174 109 159 144 107  98  82  84
>>> 2006 109  87  99 123 107  96  84  97  68  73  53  20
>>> 2007  51  59  74  48  46  34  47  39  35  70  56  41
>>> 2008  48  63  58  47  31  27  40  28  41  30  27  36
>>> 2009   5  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
>>>
>>>
>>> Which can then be graphed by:
>>>
>>> pdf("S-News.pdf", height = 4, width = 7)
>>>
>>> mp <- barplot(rowSums(Posts, na.rm = TRUE),
>>>               beside = TRUE,
>>>               cex.names = 0.6, main = "S-News Traffic",
>>>               cex.axis = 0.75, las = 1)
>>>
>>> mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
>>>       line = 2, cex = 0.5)
>>>
>>> dev.off()
>>>
>>>
>>>
>>> The consistent decline in posting frequency since 1999 is notable. The
>>> temporal association with the introduction of R is perhaps profound.
>>>
>>>
>>>
>>> As long as I am on the subject, I figured that I would do the same for
>>> R-Help. The downside is that readLines() (really url() ) does not
>>> support https:, so I took a somewhat different approach, using wget:
>>>
>>>
>>> TD <- paste(rep(1997:2009, each = 12), month.name, sep = "-")
>>> Posts <- numeric(length(TD))
>>>
>>> for (i in seq(along = TD))
>>> {
>>>   URL <- paste("https://stat.ethz.ch/pipermail/r-help/",
>>>                TD[i], "/thread.html", sep = "")
>>>
>>>   cat(URL, "\n")
>>>
>>>   CMD <- paste("wget", URL)
>>>   system(CMD)
>>>
>>>   if (file.exists("thread.html"))
>>>   {
>>>     con <- readLines("thread.html")
>>>     Posts[i] <- length(grep("[0-9]+\\.html", con))
>>>     rm(con)
>>>     unlink("thread.html")
>>>   } else {
>>>     Posts[i] <- NA
>>>   }
>>> }
>>>
>>> Posts <- matrix(Posts, ncol = 12, byrow = TRUE)
>>> rownames(Posts) <- 1997:2009
>>> colnames(Posts) <- month.abb
>>>
>>>
>>> This gives you:
>>>
>>> Posts <- structure(c(NA, 135, 226, 205, 558, 884, 1017, 1116, 1746,
>>> 2075, 1714, 2490, 462, NA, 79, 145, 355, 583, 697, 1137, 1580, 1724,
>>> 1920, 1907, 2583, NA, NA, 114, 195, 377, 651, 880, 1203, 1946,
>>> 1703, 2270, 2191, 2740, NA, 92, 101, 189, 377, 470, 965, 1488,
>>> 1657, 2057, 1818, 2145, 2487, NA, 36, 90, 161, 504, 552, 1057,
>>> 1268, 1561, 1887, 2029, 2210, 2517, NA, 47, 105, 186, 418, 550,
>>> 926, 1319, 1714, 2056, 1811, 2307, 2774, NA, 41, 110, 184, 293,
>>> 615, 918, 1344, 1618, 1872, 1785, 2138, 3268, NA, 37, 64, 148,
>>> 356, 562, 824, 1210, 1493, 1777, 1898, 2241, 2813, NA, 40, 94,
>>> 203, 434, 678, 705, 1443, 1534, 1709, 1902, 2028, 2990, NA, 76,
>>> 96, 231, 418, 657, 1055, 1567, 1712, 1810, 2328, 2708, 3037,
>>> NA, 61, 184, 318, 433, 825, 1038, 1605, 1895, 1907, 2127, 2594,
>>> 2730, NA, 57, 105, 221, 422, 530, 742, 1158, 1481, 1508, 1450,
>>> 2028, 2399, NA), .Dim = c(13L, 12L), .Dimnames = list(c("1997",
>>> "1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005",
>>> "2006", "2007", "2008", "2009"), c("Jan", "Feb", "Mar", "Apr",
>>> "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")))
>>>
>>>
>>>
>>> > Posts
>>>       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
>>> 1997   NA   NA   NA   92   36   47   41   37   40   76   61   57
>>> 1998  135   79  114  101   90  105  110   64   94   96  184  105
>>> 1999  226  145  195  189  161  186  184  148  203  231  318  221
>>> 2000  205  355  377  377  504  418  293  356  434  418  433  422
>>> 2001  558  583  651  470  552  550  615  562  678  657  825  530
>>> 2002  884  697  880  965 1057  926  918  824  705 1055 1038  742
>>> 2003 1017 1137 1203 1488 1268 1319 1344 1210 1443 1567 1605 1158
>>> 2004 1116 1580 1946 1657 1561 1714 1618 1493 1534 1712 1895 1481
>>> 2005 1746 1724 1703 2057 1887 2056 1872 1777 1709 1810 1907 1508
>>> 2006 2075 1920 2270 1818 2029 1811 1785 1898 1902 2328 2127 1450
>>> 2007 1714 1907 2191 2145 2210 2307 2138 2241 2028 2708 2594 2028
>>> 2008 2490 2583 2740 2487 2517 2774 3268 2813 2990 3037 2730 2399
>>> 2009  462   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
>>>
>>>
>>> Which again can be graphed as:
>>>
>>> pdf("R-Help.pdf", height = 4, width = 7)
>>>
>>> mp <- barplot(rowSums(Posts, na.rm = TRUE),
>>>               beside = TRUE,
>>>               cex.names = 0.6, main = "R-Help Traffic",
>>>               cex.axis = 0.75, las = 1)
>>>
>>> mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
>>>       line = 2, cex = 0.5)
>>>
>>> dev.off()
>>>
>>>
>>> Now....there's a healthy growth curve.... :-)
>>>
>>> Note that the annual traffic volume for 2008 on R-Help exceeds that on
>>> SAS-L.
>>>
>>> For convenience, I am attaching each of the 3 plots.
>>>
>>> Regards,
>>>
>>> Marc Schwartz
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>>
>>
>>
>
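
A side note on the archive-scraping loops quoted above: length(grep(...))
counts the lines that contain a message link, not the links themselves.
The two numbers agree only if each post sits on its own line of the index
page, which I have not checked for either archive. The sketch below shows
the difference on a few made-up HTML lines; the page vector and the names
n_lines and n_links are hypothetical and only stand in for what
readLines() would return.

```r
# Hypothetical lines standing in for readLines() on a thread index page
page <- c(
  "<ul>",
  "<li><a href=\"msg00001.html\">First question</a></li>",
  "<li><a href=\"msg00002.html\">Re: First question</a></li>",
  "<li><a href=\"msg00003.html\">A</a> <a href=\"msg00004.html\">B</a></li>",
  "</ul>"
)

pattern <- "msg[0-9]+\\.html"

# Number of lines with at least one match (what the quoted loops compute)
n_lines <- length(grep(pattern, page))

# Number of individual matches, allowing several links on one line
n_links <- sum(lengths(regmatches(page, gregexpr(pattern, page))))

n_lines   # 3
n_links   # 4
```

If the real index pages do put one link per line, the simpler grep() count
used in the quoted code gives the same answer.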