[Rd] Milestone: 5000 packages on CRAN
Spencer Graves
spencer.graves at prodsyse.com
Sat Nov 9 03:13:49 CET 2013
CRAN size has grown almost exponentially at least since 2001. R
history was discussed by John Fox (2009) Aspects of the Social
Organization and Trajectory of the R Project, R Journal
(http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Fox.pdf).
Below please find his data plus 5 additional points I added and R script
I used to fit a models.
I won't defend the models fit in the script below. However,
unless CRAN management changes dramatically in the next 5 years, it
seems likely that CRAN will have 10,000 packages some time in 2018.
By the way, if you don't already use the sos package routinely, I
encourage you to consider it. For me, it's by far the fastest
literature search for anything statistical. In a very few minutes, I
get an Excel file with a summary by package of the matches to almost any
combination of search terms. (Shameless plug by the lead author of the
package ;-)
Best Wishes,
Spencer Graves
date packages
2001-06-21 110
2001-12-17 129
2002-06-12 162
2003-05-27 219
2003-11-16 273
2004-06-05 357
2004-10-12 406
2005-06-18 548
2005-12-16 647
2006-05-31 739
2006-12-12 911
2007-04-12 1000
2007-11-16 1300
2008-03-18 1427
2008-10-18 1614
2009-09-17 1952
2012-06-12 3786
2012-11-01 4082
2012-12-14 4210
2013-10-28 4960
2013-11-08 5000
library(gdata)
(CRANfile <- dir(pattern='s\\.xls$'))
#readLines(CRANfile)
str(CRANhist. <- read.xls(CRANfile, stringsAsFactors=FALSE,
header=TRUE))
tail(CRANhist., 11)
CRANhist <- CRANhist.[1:20, 1:2]
(dt. <- as.Date(CRANhist$date))
CRANhist$date <- dt.
(day1 <- min(CRANhist$date)) # 2001-06-21
str(ddate <- CRANhist$date-day1)
# difftime in days
CRANhist$CRANdays <- as.numeric(ddate)
(growth <- lm(log(packages)~CRANdays, CRANhist))
CRANhist$pred <- exp(predict(growth))
plot(packages~date, CRANhist, log='y')
lines(pred~date, CRANhist, pch='.')
fitLogLogis <- nls(log(packages) ~ a+b*CRANdays + log(1+exp(d+b*CRANdays)),
CRANhist, start=c(a=4.9, b=0.0009, d=0))
# Error ... singular gradient
library(drc)
CRANlogLogis <- drm(packages~CRANdays, data=CRANhist, fct=LL.3())
plot(CRANlogLogis, log='y') # very poor through 2005
CRANlogLogis. <- drm(log(packages)~CRANdays, data=CRANhist, fct=LL.3())
plot(CRANlogLogis., log='y') # terrible: far worse than CRANlogLogis
CRANlogLogis4 <- drm(packages~CRANdays, data=CRANhist, fct=LL.4())
plot(CRANlogLogis4, log='y') # poor for 2001 but great otherwise
CRANlogLogis4. <- drm(log(packages)~CRANdays, data=CRANhist, fct=LL.4())
plot(CRANlogLogis4., log='y') # best I've found so far.
abline(h=c(4200, 8400))
sapply(CRANhist, range)
pred.dTimes <- seq(0, 6000, 100)
CRANpred <- predict(CRANlogLogis4., data.frame(CRANdays=pred.dTimes))
data.frame(Date=as.Date(day1+pred.dTimes), nPkgs=exp(CRANpred))
plot(day1+pred.dTimes, exp(CRANpred), type='l', log='y')
points(packages~date, CRANhist)
pred.dTimes <- seq(0, 10000, 100)
CRANpred <- predict(CRANlogLogis4., data.frame(CRANdays=pred.dTimes))
plot(day1+pred.dTimes, exp(CRANpred), type='l', log='y')
points(packages~date, CRANhist)
abline(h=c(4200, 8400))
abline(v=as.Date('2012-12-14'))
abline(v=as.Date('2017-09-30'))
#########################
abline(h=20000)
abline(h=70000)
pred.dTimes <- seq(0, 1000000, 10000)
CRANpred <- predict(CRANlogLogis4., data.frame(CRANdays=pred.dTimes))
plot(day1+pred.dTimes, exp(CRANpred), type='l', log='y')
points(packages~date, CRANhist)
On 11/8/2013 4:43 PM, William Dunlap wrote:
>> "Currently, the CRAN package repository features 5001 available packages."
>>
>> Going from 4000 to 5000 packages took 14.5 months - that's one new package
>> every 10.5 hours. Behind every package there are real people. These
>> user-contributed packages are maintained by ~2900 people [2] - that's 350
>> new maintainers and many more contributors. More people to thank than ever
>> before - don't forget about them, e.g. cite properly when publishing.
> Congratulations!
>
> I have often wondered about the natural history of R packages: how often they
> are created and shared, how long they are used, how many people use them,
> how long they are maintained, etc. The usage numbers are hard to get, but the
> "Last modified" dates in the CRAN archives do give some information on how
> often new packages are shared and how long they are maintained.
>
> Here are some summaries of derived from those dates. The code to get the
> data and calculate (and plot) the summaries follows.
>
>> newPkgsByYear
> 1997-01-01 1998-01-01 1999-01-01 2000-01-01
> 2 12 56 41
> 2001-01-01 2002-01-01 2003-01-01 2004-01-01
> 65 66 101 144
> 2005-01-01 2006-01-01 2007-01-01 2008-01-01
> 209 280 329 374
> 2009-01-01 2010-01-01 2011-01-01 2012-01-01
> 502 546 702 809
> 2013-01-01
> 439
>> table(nUpdatesSinceSep2011) # number of recent updates (not including original submission)
> nUpdatesSinceSep2011
> 0 1 2 3 4 5 6 7 8 9
> 2079 963 528 332 238 166 75 79 50 43
> 10 11 12 13 14 15 16 17 18 19
> 23 22 13 14 8 9 12 5 4 1
> 20 21 22 24 26 27 31 32 34
> 1 3 1 2 1 2 1 1 1
>
> The code I used is:
>
> library(XML)
> getArchiveList <- function(site = "http://cran.r-project.org/src/contrib/Archive/") {
> retval <- readHTMLTable(site, stringsAsFactors=FALSE)[[1]]
> retval <- retval[!is.na(retval$Name) & grepl("/$", retval$Name), ]
> retval$Name <- gsub("/$", "", retval$Name)
> retval$"Last modified" <- as.Date(retval$"Last modified", format="%d-%b-%Y")
> retval
> }
> getArchiveEntry <- function(Name, site = "http://cran.r-project.org/src/contrib/Archive/") {
> retval <- readHTMLTable(paste0(site, Name), stringsAsFactors=FALSE)[[1]]
> retval <- retval[!is.na(retval$Name) & retval$Name != "Parent Directory", ]
> retval$"Last modified" <- as.Date(retval$"Last modified", format="%d-%b-%Y")
> retval
> }
>
> al <- getArchiveList()
> # The next may bog down the CRAN archive server - do not do it often
> # ae <- lapply(structure(al$Name, names=al$Name),
> # function(Name)tryCatch(getArchiveEntry(Name),
> # error=function(e)data.frame(Name=character(), "Last Modified" = as.Date(character()))))
>
> initialSubmissionDate <- as.Date(vapply(ae, function(e)min(e[["Last modified"]]), 0), origin=as.Date("1970-01-01"))
> lastSubmissionDate <- as.Date(vapply(ae, function(e)max(e[["Last modified"]]), 0), origin=as.Date("1970-01-01"))
>
> mths <- seq(as.Date("1997-10-01"), as.Date("2014-01-01"), by="months")
> yrs <- seq(as.Date("1997-01-01"), as.Date("2014-01-01"), by="years")
>
> par(ask=TRUE)
>
> newPkgsByMonth <- table(cut(initialSubmissionDate, mths))
> newPkgsByYear <- table(cut(initialSubmissionDate, yrs))
> plot(mths[-1], newPkgsByMonth, log="y", ylab="# New Pkgs", main="New packages by month") # number of additions each month
>
> yearsOfMaintainanceActivity <- as.numeric(lastSubmissionDate - initialSubmissionDate, units="days")/365.25
> hist(yearsOfMaintainanceActivity, xlab="Years", main="Maintainance Duration")
>
> newPkgsByYear
> table(floor(yearsOfMaintainanceActivity))
>
> nUpdatesSinceSep2011 <- vapply(ae, function(e){
> Lm <- e[["Last modified"]]
> sum(Lm >= as.Date("2011-09-01") & Lm != min(Lm))}, 0L)
> table(nUpdatesSinceSep2011) # number of recent updates (not including original submission)
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -----Original Message-----
>> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf
>> Of Henrik Bengtsson
>> Sent: Friday, November 08, 2013 1:59 PM
>> To: R Development Mailing List
>> Subject: [Rd] Milestone: 5000 packages on CRAN
>>
>> Here we go again...
>>
>> Today (2011-11-08) on The Comprehensive R Archive Network (CRAN) [1]:
>>
>> "Currently, the CRAN package repository features 5001 available packages."
>>
>> Going from 4000 to 5000 packages took 14.5 months - that's one new package
>> every 10.5 hours. Behind every package there are real people. These
>> user-contributed packages are maintained by ~2900 people [2] - that's 350
>> new maintainers and many more contributors. More people to thank than ever
>> before - don't forget about them, e.g. cite properly when publishing.
>>
>> Milestones:
>>
>> 2013-11-08: 5000 packages [this post]
>> 2012-08-23: 4000 packages [7]
>> 2011-05-12: 3000 packages [6]
>> 2009-10-04: 2000 packages [5]
>> 2007-04-12: 1000 packages [4]
>> 2004-10-01: 500 packages [3,4]
>> 2003-04-01: 250 packages [3,4]
>>
>> [1] http://cran.r-project.org/web/packages/
>> [2] http://cran.r-project.org/web/checks/check_summary_by_maintainer.html
>> [3] Private data.
>> [4] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html
>> [5] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html
>> [6] https://stat.ethz.ch/pipermail/r-devel/2011-May/061002.html
>> [7] https://stat.ethz.ch/pipermail/r-devel/2012-August/064675.html
>>
>> /Henrik
>>
>> PS. These data are for CRAN only. There are more packages elsewhere, e.g.
>> R-Forge, Bioconductor, Github etc.
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567
web: www.structuremonitoring.com
More information about the R-devel
mailing list