[R] download.file
Barry Rowlingson
B.Rowlingson at lancaster.ac.uk
Tue Feb 4 17:26:09 CET 2003
>>to download a file from the net, the function download.file(..)
>>does the job. However, before embarking on the download, I would
>>like to find out how large the file is. Is there a way to know it?
>>
You can send web servers a 'HEAD' request, which can give you some
basic information about the download. I can't see a way to get this from
the current R functions, so here's a little routine to leverage the
'lynx' web browser:
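"head.download" <-
function (url)
{
    # lynx issues the HEAD request; stop early if it is not installed
    if (system("lynx -help > /dev/null") != 0) {
        stop("No lynx found")
    }
    # -head -dump prints the response headers, one per line
    heads <- system(paste("lynx -head -dump '", url, "'", sep = ""),
                    intern = TRUE)
    # turn name: value lines into named list. prob vectorisable
    ret <- list(status = heads[1])
    # heads[-1] is empty when only a status line came back, so the loop
    # is skipped instead of running over 2:1
    for (h in heads[-1]) {
        col <- regexpr(":", h)
        if (col > -1) {
            # split at the first colon only, so values may contain colons
            name <- substr(h, 1, col - 1)
            value <- substr(h, col + 1, nchar(h))
            ret[[name]] <- value
        } else {
            # no colon: keep the line verbatim, unnamed
            ret <- c(ret, h)
        }
    }
    ret
}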
This borrows bits from download.file(), but it does depend on you
having lynx installed. The return value is a list with names
corresponding to the header titles and values being the values. It looks
for the first : as the title: value separator, and anything that doesn't
have a : is just added verbatim unnamed.
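The name: value splitting can probably be done without the loop. Here is a
rough sketch of a vectorised version, not tested against a real server. The
name parse.headers and the sample header lines are made up for illustration.
Note two differences from the loop: unnamed lines end up after the named ones
rather than in their original position, and repeated header names are all
kept rather than overwritten.

```r
# Hypothetical helper (not part of R): split header lines at the first ":"
parse.headers <- function(heads) {
    ret <- list(status = heads[1])
    rest <- heads[-1]
    # position of the first colon in each line, -1 where there is none
    col <- regexpr(":", rest, fixed = TRUE)
    has.colon <- col > -1
    lines <- rest[has.colon]
    pos <- col[has.colon]
    # everything after the colon is the value, everything before is the name
    named <- as.list(substr(lines, pos + 1, nchar(lines)))
    names(named) <- substr(lines, 1, pos - 1)
    # lines without a colon are appended verbatim, unnamed
    c(ret, named, as.list(rest[!has.colon]))
}

# made-up header lines, only to show the shape of the result
heads <- c("HTTP/1.1 200 OK",
           "Content-Type: image/jpeg",
           "Content-Length: 8793",
           "")
h <- parse.headers(heads)
h$"Content-Length"
```

The value keeps its leading space, just as in the loop version.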
For example, how big is the R logo on the home page?
> head.download("http://www.r-project.org/Rlogo.jpg")$"Content-Length"
[1] " 8793"
That's bytes. Yes, I know it's a character string! I don't think web servers
are under any obligation to provide accurate Content-Length values. Many
dynamic web servers have pages that change length every time. This will
also not work for ftp:// URLs or local file:// URLs (or gopher:// URLs?).
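To get a number out of that, something like the following might do. It is
only a sketch: content.length is a made-up name, and it assumes the header
list has the shape returned by head.download() above. Header names are
matched case-insensitively, since servers differ in how they capitalise
them, and a missing header gives NA rather than an error.

```r
# Hypothetical helper: numeric Content-Length from a parsed header list
content.length <- function(headers) {
    # HTTP header names are case-insensitive, so compare in lower case
    idx <- match("content-length", tolower(names(headers)))
    if (is.na(idx)) {
        # the server did not send the header at all
        return(NA_real_)
    }
    # as.numeric() copes with the leading space in " 8793"
    as.numeric(headers[[idx]])
}

# made-up header list, using the value from the example above
content.length(list(status = "HTTP/1.1 200 OK", "Content-Length" = " 8793"))
```

Even then, treat the result as a hint rather than a guarantee of the download size.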
Perhaps HEAD-getting functionality can be put in the next release of
R? It would probably have a better "name: value -> named list" routine
than the one I just hacked up in two minutes above. Oops. Shame.
Baz