[R] download.file

Barry Rowlingson B.Rowlingson at lancaster.ac.uk
Tue Feb 4 17:26:09 CET 2003


>>to download a file from the net, the function download.file(..)
>>does the job.  However, before embarking on the download, I would
>>like to find out how large the file is.  Is there a way to know it?
>>

  You can send web servers a 'HEAD' request, which can give you some 
basic information about the download. I can't see a way to get this from 
the current R functions, so here's a little routine to leverage the 
'lynx' web browser:


"head.download" <-
   function (url)
{
   if (system("lynx -help > /dev/null") == 0) {
     method <- "lynx"
   }
   else {
     stop("No lynx found")
   }
   if (method == "lynx") {
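     # -head -dump prints the response headers, one per line;
     # shQuote() protects URLs containing shell metacharacters
     heads <- system(paste("lynx -head -dump", shQuote(url)), intern=TRUE)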
   }

# turn name: value lines into a named list. probably vectorisable

   ret <- list(status=heads[1])
   for(l in seq_along(heads)[-1]){  # safe even if only the status line came back
     col <- regexpr(":",heads[l])
     if(col>-1){
       name <- substr(heads[l],1,(col-1))
       value <- substr(heads[l],(col+1),nchar(heads[l]))
       ret[[name]] <- value
     }else{
       ret <- c(ret,heads[l])
     }
   }
   ret
}

  This borrows bits from download.file(), but it does depend on you 
having lynx installed. The return value is a list with names 
corresponding to the header titles and values being the values. It looks 
for a : as the title: value separator, and anything that doesn't have a : 
is just added verbatim unnamed.
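
  For what it's worth, the name: value step could probably be done 
without the loop. The following is only a sketch: parse.headers is a 
made-up name (not an existing R function), and unlike the loop above it 
drops blank lines, trims leading whitespace from values, and keeps 
repeated header names rather than overwriting them.

```r
# Hypothetical helper (not part of R): vectorised "name: value" parsing.
# Input is a character vector of raw header lines, status line first.
parse.headers <- function(heads) {
  heads <- heads[nzchar(heads)]                 # drop blank lines
  status <- heads[1]
  rest <- heads[-1]

  # position of the first ':' in each line, or -1 if there is none
  col <- as.integer(regexpr(":", rest, fixed = TRUE))
  has.colon <- col > 0

  name  <- substr(rest[has.colon], 1, col[has.colon] - 1)
  value <- substr(rest[has.colon], col[has.colon] + 1, nchar(rest[has.colon]))
  value <- sub("^[ \t]+", "", value)            # trim leading whitespace

  named <- as.list(value)
  names(named) <- name

  # lines without a ':' are appended unnamed, as in the loop version
  c(list(status = status), named, as.list(rest[!has.colon]))
}
```

  It takes the raw lines rather than a URL, so it can be tried on a 
hand-written vector such as c("HTTP/1.1 200 OK", "Content-Length: 8793") 
without lynx or a network connection.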

  For example, how big is the R logo on the home page?

 > head.download("http://www.r-project.org/Rlogo.jpg")$"Content-Length"
[1] " 8793"

  That's bytes. Yes, I know it's character! I don't think web servers are 
under any obligation to provide accurate Content-Length values. Many 
dynamic web servers have pages that change length every time. This will 
also not work for ftp:// URLs or local file:// URLs (or gopher:// URLs?).
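
  Since the value comes back as a character string and may be missing 
altogether, a small wrapper can turn it into a number. This is a sketch 
only: content.length is a hypothetical name, and it assumes the header 
list has the shape returned by head.download() above. Header names are 
matched case-insensitively because servers differ in capitalisation.

```r
# Hypothetical helper (not part of R): numeric Content-Length, or NA
# when the header is absent or is not a plain non-negative integer.
content.length <- function(headers) {
  idx <- match("content-length", tolower(names(headers)))
  if (is.na(idx)) return(NA_real_)              # header not sent

  value <- gsub("^[ \t]+|[ \t]+$", "", headers[[idx]])  # strip whitespace
  if (!grepl("^[0-9]+$", value)) return(NA_real_)       # not a number

  as.numeric(value)
}

content.length(list(status = "HTTP/1.1 200 OK", "Content-Length" = " 8793"))
# [1] 8793
```

  An NA result just means the server did not report a usable size, so 
the caller has to decide whether to go ahead with the download anyway.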

  Perhaps HEAD-getting functionality can be put in the next release of 
R? It would probably have a better "name: value -> named list" routine 
than the one I just hacked up in two minutes above. Oops. Shame.

Baz



