[R] Growth of CRAN?
Spencer Graves
spencer.graves at structuremonitoring.com
Mon Apr 14 06:20:47 CEST 2014
On 4/13/2014 7:41 PM, Gabor Grothendieck wrote:
> On Sun, Apr 13, 2014 at 1:26 PM, John Fox <jfox at mcmaster.ca> wrote:
>> I've attached the most recent data I have, which are from mid-2012. My
>> package counts came from
>> https://svn.r-project.org/R/branches/R-*-branch/tests/internet.Rout.save
>> (where the * is the R version).
>>
>
> It seems that the growth is exponential but at a lower slope (of the
> log curve) after 2008 than before. A linear fit to the log curve is
> shown in blue before 2008 and in red after 2008. What happened to
> result in two such distinct regimes?

I got a great fit using a 4-parameter log-logistic model for
log(Packages), fitted with drm() from the drc package; see below. This
model suggests that CRAN will approach an asymptote of roughly 60,000
packages, with a 95% confidence interval ranging from 31 to 117 thousand.

Obviously, the confidence interval for the asymptote assumes the
4-parameter log-logistic model is accurate. That's probably not
realistic, but it is more accurate than assuming continued exponential
growth. If I had time to develop more accurate predictions and
confidence intervals, I'd try Bayesian Model Averaging with several
different models.

Thanks for the question and comments.

Spencer

# Wait until "Build status: Current" at rev. 178 on Ecfun on R-Forge, then:
install.packages("Ecfun", repos="http://R-Forge.R-project.org")
library(Ecfun) # provides the CRANpackages data set used below
(day1 <- min(CRANpackages$Date)) # 2001-06-21
str(ddate <- CRANpackages$Date-day1)
CRANpackages$CRANdays <- as.numeric(ddate)
library(drc)
CRANlogLogis4. <- drm(log(Packages)~CRANdays, data=CRANpackages, fct=LL.4())
plot(CRANlogLogis4., log='y') # best I've found so far.
plot(resid(CRANlogLogis4.))
CRANlogLogis4.
# log(Packages) = c + (d-c)/(1 + (t/t0)^b)
# where
# b = -1.36
# c = 4.73
# d = 11.0 = log(60152)
# t0 = 3309 days since 2001-06-21
(ci4 <- confint(CRANlogLogis4.))
#      2.5%   97.5%
# b   -1.49   -1.24  # power of time = rate at which t^b -> 0
# c    4.67    4.80  #
# d   10.34   11.67  # asymptote of log(Packages)
# t0   2800    3818  # reference number of days
# Asymptotic number of CRAN packages
exp(ci4[3, ])
#  2.5 %  97.5 %
# roughly c(31, 117)*1000
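
To make the fitted curve concrete, the formula in the comments above can be evaluated directly in base R, without drc. This is only a sketch: the parameter values are the rounded estimates quoted above (nothing is re-fitted here), and the helper name ll4 is made up for illustration. Because d is rounded to 11.0, the asymptote computed this way comes out slightly below 60,000.

```r
# Four-parameter log-logistic curve in the parameterization quoted above:
#   log(Packages) = c + (d - c) / (1 + (t / t0)^b)
# 'll4' is a hypothetical helper name, not a drc function.
ll4 <- function(t, b, c, d, t0) c + (d - c) / (1 + (t / t0)^b)

# Rounded estimates copied from the fit above (assumed, not re-estimated)
est <- list(b = -1.36, c = 4.73, d = 11.0, t0 = 3309)

# Days since 2001-06-21 for a few illustrative dates
day1 <- as.Date("2001-06-21")
when <- as.Date(c("2005-06-18", "2012-07-07", "2020-01-01"))
days <- as.numeric(when - day1)

# Fitted package counts, back-transformed from the log scale
fitted_pkgs <- exp(ll4(days, est$b, est$c, est$d, est$t0))
print(data.frame(date = when, days = days, packages = round(fitted_pkgs)))

# Implied asymptote: because b < 0, (t/t0)^b -> 0 as t grows,
# so log(Packages) -> d and Packages -> exp(d)
asymptote <- exp(est$d)
print(asymptote)
```

At t = t0 the curve sits exactly halfway between c and d on the log scale, which is one way to read the "reference number of days".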
>
> Lines <- "version date packages
> 1.3 2001-06-21 110
> 1.4 2001-12-17 129
> 1.5 2002-05-29 162
> #1.6 2002-10-01 163
> 1.7 2003-05-27 219
> 1.8 2003-11-16 273
> 1.9 2004-06-05 357
> 2.0 2004-10-12 406
> 2.1 2005-06-18 548
> 2.2 2005-12-16 647
> 2.3 2006-05-31 739
> 2.4 2006-12-12 911
> 2.5 2007-04-12 1000
> 2.6 2007-11-16 1300
> 2.7 2008-03-18 1427
> 2.8 2008-10-18 1614 # updated
> 2.9 2009-04-17 1952
> 2.10 2009-10-26 2088
> 2.11 2010-04-22 2445
> 2.12 2010-10-15 2837
> 2.13 2011-04-13 3286
> 2.14 2011-06-20 3618
> 2.15 2012-07-07 4000
> "
> library(zoo)
> zz <- read.zoo(text = Lines, header = TRUE, index = 2)[, 2]
> plot(log(zz))
> d <- as.Date("2008-01-01")
> abline(v = d)
> pre <- time(zz) < d
> fo <- log(zz) ~ time(zz)
> abline(lm(fo, subset = pre), col = "blue")
> abline(lm(fo, subset = !pre), col = "red")
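
The two regimes in the quoted plot can also be summarized numerically. The following base-R sketch (no zoo needed) refits the same two straight lines to log(packages) and reports each slope per year with its implied doubling time. The data are copied from the quoted table, leaving out the commented-out 1.6 row; the column names and the choice of years since 2008-01-01 as the time scale are assumptions made for illustration.

```r
# Package counts per R release, copied from the quoted table
# (the commented-out 1.6 row is omitted, as in the original).
cran <- data.frame(
  date = as.Date(c(
    "2001-06-21", "2001-12-17", "2002-05-29", "2003-05-27", "2003-11-16",
    "2004-06-05", "2004-10-12", "2005-06-18", "2005-12-16", "2006-05-31",
    "2006-12-12", "2007-04-12", "2007-11-16", "2008-03-18", "2008-10-18",
    "2009-04-17", "2009-10-26", "2010-04-22", "2010-10-15", "2011-04-13",
    "2011-06-20", "2012-07-07")),
  packages = c(110, 129, 162, 219, 273, 357, 406, 548, 647, 739, 911,
               1000, 1300, 1427, 1614, 1952, 2088, 2445, 2837, 3286,
               3618, 4000)
)

# Time in years relative to the 2008-01-01 break used in the quoted code
cran$years <- as.numeric(cran$date - as.Date("2008-01-01")) / 365.25
cran$pre <- cran$years < 0

# Separate log-linear fits before and after the break
fit_pre  <- lm(log(packages) ~ years, data = cran, subset = pre)
fit_post <- lm(log(packages) ~ years, data = cran, subset = !pre)

# Slope of log(packages) per year, and the doubling time it implies
slopes <- c(pre2008  = unname(coef(fit_pre)["years"]),
            post2008 = unname(coef(fit_post)["years"]))
doubling_years <- log(2) / slopes
print(rbind(slope_per_year = slopes, doubling_years = doubling_years))
```

A smaller slope after 2008 means a longer doubling time, which is the numerical counterpart of the flatter red line in the plot.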