Which CRAN mirror is the fastest?

Martin Maechler maechler at stat.math.ethz.ch
Thu Jul 30 11:49:54 CEST 2009

>>>>> Barry Rowlingson <b.rowlingson at lancaster.ac.uk>
>>>>>     on Thu, 30 Jul 2009 09:59:47 +0100 writes:

    > 2009/7/30 Uwe Ligges <ligges at statistik.tu-dortmund.de>:
    >> Hard to tell; you have to try them out, I fear.
    >> The speed you see depends heavily on the connection from your country to
    >> the others, but of course there are also some mirrors that are themselves
    >> not the fastest.

    > I figured you could write a function that got the CRAN mirror list and
    > tested their response. Here's my 'cranometer':

    > cranometer <- function(ms = getCRANmirrors(all = FALSE, local.only = FALSE)) {
    >   dest <- tempfile()
    >   nms <- nrow(ms)
    >   ms$t <- rep(NA_real_, nms)   # elapsed seconds per mirror
    >   for (i in seq_len(nms)) {
    >     url <- paste(ms$URL[i], "/src/base/NEWS", sep = "")
    >     # try() keeps one failing mirror from stopping the whole loop
    >     tm <- try(system.time(download.file(url, dest), gcFirst = TRUE))
    >     ok <- !inherits(tm, "try-error") && file.exists(dest)
    >     ms$t[i] <- if (ok) tm[["elapsed"]] else NA
    >     if (file.exists(dest)) file.remove(dest)   # start clean for the next mirror
    >   }
    >   ms
    > }

    > It works by downloading the latest NEWS file (376Kbytes at the
    > moment, so not huge) from each of the mirror sites in the CRAN mirrors
    > list. If you want to test it on a subset then call getCRANmirrors
    > yourself and subset it somehow.

    > I'm running it now on the full CRAN list and I've yet to find a
    > timeout or error, so I'm not sure what will happen if download.file
    > fails. It returns a data frame like the one you get from getCRANmirrors,
    > but with an extra 't' column giving the elapsed time to get the NEWS file.
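To pick a mirror from that data frame, one could drop the failed downloads (t is NA) and sort by t. A minimal sketch; fastest_mirrors() is a hypothetical helper, not part of R, and the result table below is made up for illustration:

```r
# Rank the data frame returned by cranometer() by elapsed time.
# Mirrors whose download failed (t is NA) are dropped.
# 'fastest_mirrors' is a hypothetical helper, not part of R.
fastest_mirrors <- function(res, n = 5) {
  ok <- res[!is.na(res$t), , drop = FALSE]   # keep successful downloads only
  ok <- ok[order(ok$t), , drop = FALSE]      # fastest first
  head(ok, n)
}

# Made-up result table; the times are invented for illustration.
res <- data.frame(Name = c("mirror A", "mirror B", "mirror C"),
                  URL  = c("http://a.example/", "http://b.example/", "http://c.example/"),
                  t    = c(2.4, NA, 0.9),
                  stringsAsFactors = FALSE)
top <- fastest_mirrors(res, n = 2)
top$Name   # "mirror C" "mirror A"
```

On real data this would be something like fastest_mirrors(cranometer(getCRANmirrors()[1:5, ])), and the winner could then be made the default with options(repos = c(CRAN = top$URL[1])).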

    > CAVEATS: if your network has any local caching then these results
    > will be wrong, since your computer will probably be getting the
    > locally cached NEWS file and not the one on the server. Especially if
    > you run it twice. Oh, I should have put cacheOK=FALSE in the
    > download.file() call, but even that might get overruled somewhere. Also,
    > sites may have good days and bad days, good minutes and bad minutes,
    > your network may be congested on a short-term basis, etc., etc.
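Since a single download can land on a bad minute, a more robust variant could time each mirror several times and keep the median. A minimal sketch; median_download_time() is a hypothetical helper, and the Sys.sleep() call only stands in for a real download so the example runs offline:

```r
# Time a zero-argument function 'fetch' several times and return the
# median elapsed time, so one congested minute does not decide the ranking.
# Failed attempts count as NA and are ignored by the median.
# 'median_download_time' is a hypothetical helper, not part of R.
median_download_time <- function(fetch, reps = 3) {
  times <- vapply(seq_len(reps), function(i) {
    tm <- try(system.time(fetch(), gcFirst = TRUE), silent = TRUE)
    if (inherits(tm, "try-error")) NA_real_ else unname(tm[["elapsed"]])
  }, numeric(1))
  median(times, na.rm = TRUE)
}

# Offline stand-in for a download:
x <- median_download_time(function() Sys.sleep(0.01), reps = 3)
```

Against a real mirror one would pass something like function() download.file(url, tempfile(), cacheOK = FALSE, quiet = TRUE); as the caveat says, a proxy may still ignore cacheOK.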

    > Other ideas: how about combining the CRAN list with my geonames
    > package to work out distances from where you are to the CRAN site? I
    > might write that later if I get a minute...

Yes! And visualize the corresponding "nearest neighbourhood"
for each CRAN mirror on a world map,
make it refresh dynamically every few minutes,
and put it on a webserver so people can watch the "CRAN world"
in real time!

More seriously, it would be really cool if a "robust" version of
cranometer() could be used automagically in the (typical /
default) case of install.packages() {and its call from the
Windows (or also Mac?) 'Packages' menu} when the user / site
has no CRAN repository specified:
it would choose the CRAN mirror that is closest,
or, even better (and more appropriate for statistical software),
would choose one at random, with probability inversely
proportional to (a power of?) the "distance".
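The distance-weighted random choice described above could be sketched as follows. This is an illustration under assumptions: getCRANmirrors() is not assumed to supply coordinates, so the mirrors table needs lat and lon columns added by hand (for instance from a geocoding lookup such as the geonames package mentioned above); the mirror names and coordinates below are hypothetical, the Earth is treated as a sphere of radius 6371 km, and all function names are made up for this example:

```r
# Great-circle (haversine) distance in km between points given in degrees,
# treating the Earth as a sphere of radius 6371 km.
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Probabilities proportional to distance^(-power); distances below 1 km
# are clamped to 1 km so a mirror "next door" cannot divide by zero.
mirror_probs <- function(d, power = 1) {
  w <- 1 / pmax(d, 1)^power
  w / sum(w)
}

# Draw one mirror at random, favouring nearby ones.
pick_mirror <- function(mirrors, lat, lon, power = 2) {
  d <- haversine_km(lat, lon, mirrors$lat, mirrors$lon)
  mirrors[sample.int(nrow(mirrors), 1, prob = mirror_probs(d, power)), ]
}

# Hypothetical mirror table with hand-added coordinates.
mirrors <- data.frame(Name = c("mirror A", "mirror B", "mirror C"),
                      lat  = c(47.4, 51.5, 40.7),
                      lon  = c(8.5, 7.5, -74.0),
                      stringsAsFactors = FALSE)
set.seed(1)
chosen <- pick_mirror(mirrors, lat = 54.0, lon = -2.8)
chosen$Name
```

Larger values of power concentrate the probability on the nearest mirror; in the limit this becomes the "closest mirror" rule, which could be written directly as mirrors[which.min(d), ].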

... yes, we should move this from R-help to R-devel ...


