[Rd] Milestone: 5000 packages on CRAN

Spencer Graves spencer.graves at prodsyse.com
Sat Nov 9 03:13:49 CET 2013


       CRAN has grown almost exponentially since at least 2001.  The
history of R was discussed by John Fox (2009), "Aspects of the Social
Organization and Trajectory of the R Project", R Journal
(http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Fox.pdf).
Below are his data plus 5 additional points I added, followed by the R
script I used to fit some models.


       I won't defend the models fit in the script below.  However, 
unless CRAN management changes dramatically in the next 5 years, it 
seems likely that CRAN will have 10,000 packages some time in 2018.
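
A rough cross-check of this kind of projection, as a minimal sketch only: it fits a straight line to log(package count) using the five milestone dates quoted further down in this thread, then solves for the doubling time and for the date at which that line reaches 10,000. The names milestones, day0, fit, doublingYears and date10k are introduced here for illustration and are not part of the script below. A pure exponential fit assumes a constant growth rate, so its answer need not agree with the estimate above.

```r
# Milestone dates as quoted later in this thread (1000 to 5000 packages)
milestones <- data.frame(
  date     = as.Date(c("2007-04-12", "2009-10-04", "2011-05-12",
                       "2012-08-23", "2013-11-08")),
  packages = c(1000, 2000, 3000, 4000, 5000)
)
day0 <- min(milestones$date)
milestones$days <- as.numeric(milestones$date - day0)

# A straight line on the log scale corresponds to exponential growth
fit <- lm(log(packages) ~ days, data = milestones)
rate      <- unname(coef(fit)["days"])          # growth rate per day
intercept <- unname(coef(fit)["(Intercept)"])

# Doubling time in years, and the date at which the line hits 10,000
doublingYears <- log(2) / rate / 365.25
date10k <- day0 + round((log(10000) - intercept) / rate)

doublingYears
date10k
```

Printing summary(fit) shows how well a constant rate describes these five points; any slowdown in growth would push the crossing date later than this line suggests.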


       By the way, if you don't already use the sos package routinely, I 
encourage you to consider it.  For me, it's by far the fastest 
literature search for anything statistical.  In a very few minutes, I 
get an Excel file with a summary by package of the matches to almost any 
combination of search terms.  (Shameless plug by the lead author of the 
package ;-)


       Best Wishes,
       Spencer Graves


date    packages
2001-06-21    110
2001-12-17    129
2002-06-12    162
2003-05-27    219
2003-11-16    273
2004-06-05    357
2004-10-12    406
2005-06-18    548
2005-12-16    647
2006-05-31    739
2006-12-12    911
2007-04-12    1000
2007-11-16    1300
2008-03-18    1427
2008-10-18    1614
2009-09-17    1952
2012-06-12    3786
2012-11-01    4082
2012-12-14    4210
2013-10-28    4960
2013-11-08    5000
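
If the Excel file is not at hand, the table above can be read directly from text. The following is a minimal sketch using base R only, showing just five of the rows to keep it short; CRANtxt is a name introduced here for illustration. It produces a CRANhist data frame with a Date column, standing in for the read.xls() and as.Date() steps of the script.

```r
# Illustrative only: five of the rows from the table above;
# paste the full table into the string to reproduce the fits.
CRANtxt <- "date packages
2001-06-21 110
2007-04-12 1000
2009-09-17 1952
2012-12-14 4210
2013-11-08 5000"

CRANhist <- read.table(text = CRANtxt, header = TRUE,
                       stringsAsFactors = FALSE)
CRANhist$date <- as.Date(CRANhist$date)   # character -> Date
str(CRANhist)
```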

library(gdata)

(CRANfile <- dir(pattern='s\\.xls$'))
#readLines(CRANfile)
str(CRANhist. <- read.xls(CRANfile, stringsAsFactors=FALSE,
                            header=TRUE))
tail(CRANhist., 11)
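# Keep the first 20 rows and the date and packages columns.  NOTE: the
# table above lists 21 rows; if the file matches it, use 1:21 to include
# the 2013-11-08 point.
CRANhist <- CRANhist.[1:20, 1:2]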

(dt. <- as.Date(CRANhist$date))
CRANhist$date <- dt.

(day1 <- min(CRANhist$date)) # 2001-06-21
str(ddate <- CRANhist$date-day1)
# difftime in days

CRANhist$CRANdays <- as.numeric(ddate)
(growth <- lm(log(packages)~CRANdays, CRANhist))

CRANhist$pred <- exp(predict(growth))
plot(packages~date, CRANhist, log='y')
lines(pred~date, CRANhist, pch='.')

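# As written, this nls() call stops with "singular gradient"; try() lets
# the rest of the script run when it is source()d.
fitLogLogis <- try(nls(log(packages) ~ a+b*CRANdays + log(1+exp(d+b*CRANdays)),
                    CRANhist, start=c(a=4.9, b=0.0009, d=0)))
# Error ... singular gradient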

library(drc)
CRANlogLogis <- drm(packages~CRANdays, data=CRANhist, fct=LL.3())
plot(CRANlogLogis, log='y') # very poor through 2005

CRANlogLogis. <- drm(log(packages)~CRANdays, data=CRANhist, fct=LL.3())
plot(CRANlogLogis., log='y') # terrible:  far worse than CRANlogLogis

CRANlogLogis4 <- drm(packages~CRANdays, data=CRANhist, fct=LL.4())
plot(CRANlogLogis4, log='y') # poor for 2001 but great otherwise

CRANlogLogis4. <- drm(log(packages)~CRANdays, data=CRANhist, fct=LL.4())
plot(CRANlogLogis4., log='y') # best I've found so far.
abline(h=c(4200, 8400))

sapply(CRANhist, range)
pred.dTimes <- seq(0, 6000, 100)
CRANpred <- predict(CRANlogLogis4., data.frame(CRANdays=pred.dTimes))
data.frame(Date=as.Date(day1+pred.dTimes), nPkgs=exp(CRANpred))

plot(day1+pred.dTimes, exp(CRANpred), type='l', log='y')
points(packages~date, CRANhist)

pred.dTimes <- seq(0, 10000, 100)
CRANpred <- predict(CRANlogLogis4., data.frame(CRANdays=pred.dTimes))

plot(day1+pred.dTimes, exp(CRANpred), type='l', log='y')
points(packages~date, CRANhist)
abline(h=c(4200, 8400))
abline(v=as.Date('2012-12-14'))
abline(v=as.Date('2017-09-30'))

#########################

abline(h=20000)
abline(h=70000)

pred.dTimes <- seq(0, 1000000, 10000)
CRANpred <- predict(CRANlogLogis4., data.frame(CRANdays=pred.dTimes))
plot(day1+pred.dTimes, exp(CRANpred), type='l', log='y')
points(packages~date, CRANhist)


On 11/8/2013 4:43 PM, William Dunlap wrote:
>> "Currently, the CRAN package repository features 5001 available packages."
>>
>> Going from 4000 to 5000 packages took 14.5 months - that's one new package
>> every 10.5 hours. Behind every package there are real people. These
>> user-contributed packages are maintained by ~2900 people [2] - that's 350
>> new maintainers and many more contributors. More people to thank than ever
>> before - don't forget about them, e.g. cite properly when publishing.
> Congratulations!
>
> I have often wondered about the natural history of R packages: how often they
> are created and shared, how long they are used, how many people use them,
> how long they are maintained, etc.  The usage numbers are hard to get, but the
> "Last modified" dates in the CRAN archives do give some information on how
> often new packages are shared and how long they are maintained.
>
> Here are some summaries derived from those dates.  The code to get the
> data and calculate (and plot) the summaries follows.
>
>> newPkgsByYear
> 1997-01-01 1998-01-01 1999-01-01 2000-01-01
>           2         12         56         41
> 2001-01-01 2002-01-01 2003-01-01 2004-01-01
>          65         66        101        144
> 2005-01-01 2006-01-01 2007-01-01 2008-01-01
>         209        280        329        374
> 2009-01-01 2010-01-01 2011-01-01 2012-01-01
>         502        546        702        809
> 2013-01-01
>         439
>> table(nUpdatesSinceSep2011) # number of recent updates (not including original submission)
> nUpdatesSinceSep2011
>     0    1    2    3    4    5    6    7    8    9
>  2079  963  528  332  238  166   75   79   50   43
>    10   11   12   13   14   15   16   17   18   19
>    23   22   13   14    8    9   12    5    4    1
>    20   21   22   24   26   27   31   32   34
>     1    3    1    2    1    2    1    1    1
>
> The code I used is:
>
> library(XML)
> getArchiveList <- function(site = "http://cran.r-project.org/src/contrib/Archive/") {
>      retval <- readHTMLTable(site, stringsAsFactors=FALSE)[[1]]
>      retval <- retval[!is.na(retval$Name) & grepl("/$", retval$Name), ]
>      retval$Name <- gsub("/$", "", retval$Name)
>      retval$"Last modified" <- as.Date(retval$"Last modified", format="%d-%b-%Y")
>      retval
> }
> getArchiveEntry <- function(Name, site = "http://cran.r-project.org/src/contrib/Archive/") {
>      retval <- readHTMLTable(paste0(site, Name), stringsAsFactors=FALSE)[[1]]
>      retval <- retval[!is.na(retval$Name) & retval$Name != "Parent Directory", ]
>      retval$"Last modified" <- as.Date(retval$"Last modified", format="%d-%b-%Y")
>      retval
> }
>
> al <- getArchiveList()
> # The next may bog down the CRAN archive server - do not do it often
> # ae <- lapply(structure(al$Name, names=al$Name),
> #              function(Name)tryCatch(getArchiveEntry(Name),
> #                                     error=function(e)data.frame(Name=character(), "Last modified" = as.Date(character()), check.names=FALSE)))
>
> initialSubmissionDate <- as.Date(vapply(ae, function(e)min(e[["Last modified"]]), 0), origin=as.Date("1970-01-01"))
> lastSubmissionDate <- as.Date(vapply(ae, function(e)max(e[["Last modified"]]), 0), origin=as.Date("1970-01-01"))
>
> mths <- seq(as.Date("1997-10-01"), as.Date("2014-01-01"), by="months")
> yrs <- seq(as.Date("1997-01-01"), as.Date("2014-01-01"), by="years")
>
> par(ask=TRUE)
>
> newPkgsByMonth <-  table(cut(initialSubmissionDate, mths))
> newPkgsByYear <-  table(cut(initialSubmissionDate, yrs))
> plot(mths[-1], newPkgsByMonth, log="y", ylab="# New Pkgs", main="New packages by month") # number of additions each month
>
> yearsOfMaintainanceActivity <- as.numeric(lastSubmissionDate - initialSubmissionDate, units="days")/365.25
> hist(yearsOfMaintainanceActivity, xlab="Years", main="Maintenance Duration")
>
> newPkgsByYear
> table(floor(yearsOfMaintainanceActivity))
>
> nUpdatesSinceSep2011 <- vapply(ae, function(e){
>      Lm <- e[["Last modified"]]
>      sum(Lm >= as.Date("2011-09-01") & Lm != min(Lm))}, 0L)
> table(nUpdatesSinceSep2011) # number of recent updates (not including original submission)
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -----Original Message-----
>> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf
>> Of Henrik Bengtsson
>> Sent: Friday, November 08, 2013 1:59 PM
>> To: R Development Mailing List
>> Subject: [Rd] Milestone: 5000 packages on CRAN
>>
>> Here we go again...
>>
>> Today (2013-11-08) on The Comprehensive R Archive Network (CRAN) [1]:
>>
>> "Currently, the CRAN package repository features 5001 available packages."
>>
>> Going from 4000 to 5000 packages took 14.5 months - that's one new package
>> every 10.5 hours. Behind every package there are real people. These
>> user-contributed packages are maintained by ~2900 people [2] - that's 350
>> new maintainers and many more contributors. More people to thank than ever
>> before - don't forget about them, e.g. cite properly when publishing.
>>
>> Milestones:
>>
>> 2013-11-08: 5000 packages [this post]
>> 2012-08-23: 4000 packages [7]
>> 2011-05-12: 3000 packages [6]
>> 2009-10-04: 2000 packages [5]
>> 2007-04-12: 1000 packages [4]
>> 2004-10-01: 500 packages [3,4]
>> 2003-04-01: 250 packages [3,4]
>>
>> [1] http://cran.r-project.org/web/packages/
>> [2] http://cran.r-project.org/web/checks/check_summary_by_maintainer.html
>> [3] Private data.
>> [4] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html
>> [5] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html
>> [6] https://stat.ethz.ch/pipermail/r-devel/2011-May/061002.html
>> [7] https://stat.ethz.ch/pipermail/r-devel/2012-August/064675.html
>>
>> /Henrik
>>
>> PS. These data are for CRAN only. There are more packages elsewhere, e.g.
>> R-Forge, Bioconductor, Github etc.
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel


-- 
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com


