[Rd] Milestone: 5000 packages on CRAN

William Dunlap wdunlap at tibco.com
Sat Nov 9 01:43:54 CET 2013


> "Currently, the CRAN package repository features 5001 available packages."
> 
> Going from 4000 to 5000 packages took 14.5 months - that's one new package
> every 10.5 hours. Behind every package there are real people. These
> user-contributed packages are maintained by ~2900 people [2] - that's 350
> new maintainers and many more contributors. More people to thank than ever
> before - don't forget about them, e.g. cite properly when publishing.

Congratulations!
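
As a quick sanity check on the quoted pace, here is a minimal sketch that redoes the arithmetic from the two milestone dates given in the original post (2012-08-23 for 4000 packages, 2013-11-08 for 5000). It assumes the net count of available packages grew by exactly 1000 over that span; the variable names are just for illustration.

```r
# Milestone dates taken from the original post
from <- as.Date("2012-08-23")  # 4000 packages
to   <- as.Date("2013-11-08")  # 5000 packages

days <- as.numeric(to - from, units = "days")
monthsElapsed <- days / (365.25 / 12)  # average month length
hoursPerPkg <- days * 24 / 1000        # assumes a net gain of 1000 packages

round(c(days = days, months = monthsElapsed, hoursPerPackage = hoursPerPkg), 1)
```

This gives roughly 14.5 months and a little over 10.5 hours per package, in line with the quoted figures.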

I have often wondered about the natural history of R packages: how often they
are created and shared, how long they are used, how many people use them,
how long they are maintained, etc.  The usage numbers are hard to get, but the
"Last modified" dates in the CRAN archives do give some information on how
often new packages are shared and how long they are maintained.

Here are some summaries derived from those dates.  The code to get the
data and calculate (and plot) the summaries follows.

> newPkgsByYear

1997-01-01 1998-01-01 1999-01-01 2000-01-01 
         2         12         56         41 
2001-01-01 2002-01-01 2003-01-01 2004-01-01 
        65         66        101        144 
2005-01-01 2006-01-01 2007-01-01 2008-01-01 
       209        280        329        374 
2009-01-01 2010-01-01 2011-01-01 2012-01-01 
       502        546        702        809 
2013-01-01 
       439
> table(nUpdatesSinceSep2011) # number of recent updates (not including original submission)
nUpdatesSinceSep2011
   0    1    2    3    4    5    6    7    8    9 
2079  963  528  332  238  166   75   79   50   43 
  10   11   12   13   14   15   16   17   18   19 
  23   22   13   14    8    9   12    5    4    1 
  20   21   22   24   26   27   31   32   34 
   1    3    1    2    1    2    1    1    1
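
One rough way to read the two tables is to total them and look at cumulative growth. The sketch below uses only the counts printed above, typed in by hand (so any transcription slip is an artifact of this sketch, not of the data); the names byYear and updates are made up for the example.

```r
# Counts copied by hand from the newPkgsByYear output above
byYear <- c("1997" = 2, "1998" = 12, "1999" = 56, "2000" = 41,
            "2001" = 65, "2002" = 66, "2003" = 101, "2004" = 144,
            "2005" = 209, "2006" = 280, "2007" = 329, "2008" = 374,
            "2009" = 502, "2010" = 546, "2011" = 702, "2012" = 809,
            "2013" = 439)

# Counts copied by hand from the nUpdatesSinceSep2011 table above
updates <- c("0" = 2079, "1" = 963, "2" = 528, "3" = 332, "4" = 238,
             "5" = 166, "6" = 75, "7" = 79, "8" = 50, "9" = 43,
             "10" = 23, "11" = 22, "12" = 13, "13" = 14, "14" = 8,
             "15" = 9, "16" = 12, "17" = 5, "18" = 4, "19" = 1,
             "20" = 1, "21" = 3, "22" = 1, "24" = 2, "26" = 1,
             "27" = 2, "31" = 1, "32" = 1, "34" = 1)

cumsum(byYear)                    # running total of packages first seen in the archive
totalPkgs <- sum(updates)         # both tables should cover the same packages
fracNoUpdates <- updates[["0"]] / totalPkgs
round(fracNoUpdates, 2)           # share with no update since Sep 2011
```

Both tables sum to 4677 packages, and about 44% of those show no update since September 2011.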

The code I used is:

library(XML)
getArchiveList <- function(site = "http://cran.r-project.org/src/contrib/Archive/") {
    retval <- readHTMLTable(site, stringsAsFactors=FALSE)[[1]]
    retval <- retval[!is.na(retval$Name) & grepl("/$", retval$Name), ]
    retval$Name <- gsub("/$", "", retval$Name)
    retval$"Last modified" <- as.Date(retval$"Last modified", format="%d-%b-%Y")
    retval
}
getArchiveEntry <- function(Name, site = "http://cran.r-project.org/src/contrib/Archive/") {
    retval <- readHTMLTable(paste0(site, Name), stringsAsFactors=FALSE)[[1]]
    retval <- retval[!is.na(retval$Name) & retval$Name != "Parent Directory", ]
    retval$"Last modified" <- as.Date(retval$"Last modified", format="%d-%b-%Y")
    retval
}

al <- getArchiveList()
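# The next step fetches one page per package and may bog down the CRAN archive
# server - do not do it often (uncomment, run it once, and save 'ae' for reuse).
# Everything below needs 'ae'.
# ae <- lapply(structure(al$Name, names=al$Name),
#              function(Name)tryCatch(getArchiveEntry(Name),
#                                     error=function(e)data.frame(Name=character(), "Last modified" = as.Date(character()), check.names=FALSE)))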

initialSubmissionDate <- as.Date(vapply(ae, function(e)min(e[["Last modified"]]), 0), origin=as.Date("1970-01-01"))
lastSubmissionDate <- as.Date(vapply(ae, function(e)max(e[["Last modified"]]), 0), origin=as.Date("1970-01-01"))

mths <- seq(as.Date("1997-10-01"), as.Date("2014-01-01"), by="months")
yrs <- seq(as.Date("1997-01-01"), as.Date("2014-01-01"), by="years")

par(ask=TRUE)

newPkgsByMonth <-  table(cut(initialSubmissionDate, mths))
newPkgsByYear <-  table(cut(initialSubmissionDate, yrs))
plot(mths[-1], newPkgsByMonth, log="y", ylab="# New Pkgs", main="New packages by month") # number of additions each month

yearsOfMaintenanceActivity <- as.numeric(lastSubmissionDate - initialSubmissionDate, units="days")/365.25
hist(yearsOfMaintenanceActivity, xlab="Years", main="Maintenance Duration")

newPkgsByYear
table(floor(yearsOfMaintenanceActivity))

nUpdatesSinceSep2011 <- vapply(ae, function(e){
    Lm <- e[["Last modified"]]
    sum(Lm >= as.Date("2011-09-01") & Lm != min(Lm))}, 0L)
table(nUpdatesSinceSep2011) # number of recent updates (not including original submission)
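
To try the per-package calculations without touching the CRAN server, here is a small sketch on made-up data. The list toy stands in for 'ae': the package names and dates are invented, and only the "Last modified" column layout is taken from the code above.

```r
# Toy stand-in for 'ae': one data frame per package, dates are invented
toy <- list(
    pkgA = data.frame(Name = c("pkgA_0.1.tar.gz", "pkgA_0.2.tar.gz", "pkgA_0.3.tar.gz"),
                      "Last modified" = as.Date(c("2010-03-01", "2011-10-15", "2012-06-30")),
                      check.names = FALSE, stringsAsFactors = FALSE),
    pkgB = data.frame(Name = "pkgB_1.0.tar.gz",
                      "Last modified" = as.Date("2012-01-20"),
                      check.names = FALSE, stringsAsFactors = FALSE)
)

# First and last submission dates per package
first <- as.Date(vapply(toy, function(e) as.numeric(min(e[["Last modified"]])), 0),
                 origin = "1970-01-01")
last <- as.Date(vapply(toy, function(e) as.numeric(max(e[["Last modified"]])), 0),
                origin = "1970-01-01")

# Span between first and last submission, in years
yearsActive <- as.numeric(last - first, units = "days") / 365.25

# Updates on or after 2011-09-01, not counting the original submission
nUpdates <- vapply(toy, function(e) {
    Lm <- e[["Last modified"]]
    sum(Lm >= as.Date("2011-09-01") & Lm != min(Lm))
}, 0L)
nUpdates
```

Here pkgA counts 2 recent updates, while pkgB counts 0 even though it first appeared after September 2011, because its only entry is the original submission.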

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf
> Of Henrik Bengtsson
> Sent: Friday, November 08, 2013 1:59 PM
> To: R Development Mailing List
> Subject: [Rd] Milestone: 5000 packages on CRAN
> 
> Here we go again...
> 
> Today (2013-11-08) on The Comprehensive R Archive Network (CRAN) [1]:
> 
> "Currently, the CRAN package repository features 5001 available packages."
> 
> Going from 4000 to 5000 packages took 14.5 months - that's one new package
> every 10.5 hours. Behind every package there are real people. These
> user-contributed packages are maintained by ~2900 people [2] - that's 350
> new maintainers and many more contributors. More people to thank than ever
> before - don't forget about them, e.g. cite properly when publishing.
> 
> Milestones:
> 
> 2013-11-08: 5000 packages [this post]
> 2012-08-23: 4000 packages [7]
> 2011-05-12: 3000 packages [6]
> 2009-10-04: 2000 packages [5]
> 2007-04-12: 1000 packages [4]
> 2004-10-01: 500 packages [3,4]
> 2003-04-01: 250 packages [3,4]
> 
> [1] http://cran.r-project.org/web/packages/
> [2] http://cran.r-project.org/web/checks/check_summary_by_maintainer.html
> [3] Private data.
> [4] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html
> [5] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html
> [6] https://stat.ethz.ch/pipermail/r-devel/2011-May/061002.html
> [7] https://stat.ethz.ch/pipermail/r-devel/2012-August/064675.html
> 
> /Henrik
> 
> PS. These data are for CRAN only. There are more packages elsewhere, e.g.
> R-Forge, Bioconductor, Github etc.