[R] help with restart

Wincent ronggui.huang at gmail.com
Wed May 5 10:34:30 CEST 2010


Dear all, I want to download pages from a large number of URLs.
For example,

########
link <- c("http://gzbbs.soufun.com/board/2811006802/",
"http://gzbbs.soufun.com/board/2811328226/",
"http://gzbbs.soufun.com/board/2811720258/",
"http://gzbbs.soufun.com/board/2811495702/",
"http://gzbbs.soufun.com/board/2811176022/",
"http://gzbbs.soufun.com/board/2811866676/"
)
#  the actual vector will be much longer.

ans <- vector("list",length(link))

for (i in seq_along(link)){
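  # readLines() accepts a URL string directly and opens/closes the
  # connection itself, so no url() connection object is left behind.
  ans[[i]] <- readLines(link[i])
  Sys.sleep(8)  # pause between requests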
}
#######

The problem is that the server stops responding if the requests come
too often, and I don't know what the optimal interval between two
requests is.
When the server does not respond, readLines throws an error and the
loop stops. What I want to do is: when an error occurs, put R to sleep
for, say, 60 seconds, and then retry readLines on the same link.
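
A minimal sketch of that retry behaviour, assuming a plain tryCatch
loop is acceptable. The helper name retry_fetch and its arguments
(fetch, max_tries, wait) are hypothetical and only for illustration;
the cap on attempts is an added assumption so that a dead link cannot
loop forever.

```r
# Hypothetical helper (not part of base R): call fetch() until it
# succeeds or max_tries attempts have been made, sleeping in between.
retry_fetch <- function(fetch, max_tries = 5, wait = 60) {
  for (attempt in seq_len(max_tries)) {
    # Turn an error into a value so the loop can inspect it.
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)  # success: hand back whatever fetch() produced
    }
    message("attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) {
      Sys.sleep(wait)  # back off before retrying the same request
    }
  }
  stop("giving up after ", max_tries, " attempts")
}
```

Inside the loop above, the assignment would then read
ans[[i]] <- retry_fetch(function() readLines(link[i])).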

I did some searching and guess that withCallingHandlers and
withRestarts will do the trick, but I could not find many examples of
their usage. Can you give me some suggestions? Thanks.
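
For reference, one way withCallingHandlers and withRestarts might fit
together for this task is sketched here. This is an illustration under
assumptions, not a definitive recipe: the helper name
fetch_with_restart, the restart name "retry_later", and the
"fetch_failed" marker class are all made up for the example.

```r
# Hypothetical helper: the calling handler reacts to an error by invoking
# a restart, which makes withRestarts() return a marker value.
fetch_with_restart <- function(fetch, max_tries = 5, wait = 60) {
  failed <- structure(list(), class = "fetch_failed")  # marker for a failed try
  for (attempt in seq_len(max_tries)) {
    result <- withRestarts(
      withCallingHandlers(
        fetch(),
        error = function(e) {
          message("attempt ", attempt, " failed: ", conditionMessage(e))
          invokeRestart("retry_later")  # jump out to withRestarts()
        }
      ),
      retry_later = function() failed  # value returned when the restart fires
    )
    if (!inherits(result, "fetch_failed")) {
      return(result)  # fetch() completed normally
    }
    if (attempt < max_tries) {
      Sys.sleep(wait)  # wait before trying the same request again
    }
  }
  stop("giving up after ", max_tries, " attempts")
}
```

It would be called as
fetch_with_restart(function() readLines(link[i])). Restarts mainly pay
off when the code that decides how to recover lives far from the code
that fails; for a single loop like this, the extra machinery is
optional.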

-- 
Wincent Rong-gui HUANG
Doctoral Candidate
Dept of Public and Social Administration
City University of Hong Kong
http://asrr.r-forge.r-project.org/rghuang.html


