[R] read the middle of a file
R Heberto Ghezzo, Dr
heberto.ghezzo at mcgill.ca
Mon Aug 2 21:33:55 CEST 2010
Hello,
The other day Justin Peter presented a mini program to plot a topographic map with an overlay of the worldHires. I seemed interesting so I checked the ETOPO5 site and find that there is a new file ETOPO1 with a 1 minute grid. I downloaded it and tried a similar procedure. Now the ETOPO1.gz is 1 Gb and the uncompressed file is 5 Gb. They do not fit into my laptop. I tried the following program that works, but
it takes too long to read the file,
Now the question: is there a faster way to read a piece of a file in the middle of it?
My program:
#
lon.min <- 140.0
lon.max <- 150.0
lat.min <- -45.0
lat.max <- -40.0
#
library(maps)
library(mapdata)
map('worldHires',xlim=c(lon.min,lon.max),ylim=c(lat.min,lat.max),lwd=2) # to check is OK
#
con <- file("c:/temp/Etopo/ETOPO1_Bed_c_int.xyz.gz","r")
la <- (90 - lat.max) * 60 # top
nla <- -(la - (90 - lat.min) * 60) # length in latitud
lo <- (180 + lon.min) * 60 # first longitud
nlo <- (180 + lon.max)*60 - lo # length in longitud
#
cat("from Long ",lo/60-180," to ",(lo+nlo)/60-180," and Lat ",lat.min," to ",lat.max," degrees \n")
cat(" top discard ",la*360*60+lo," records, Are ",la*60," segments of ",nlo," usefull,discard ",360*60-nlo,"\n")
ti0 <- proc.time()
for(i in 1:360) {
b <- readLines(con,n=la*60) # discard top by degrees
# without this loop the vector b does not fit in RAM
rm(b)
}
b <- readLines(con,n=lo-1) # discard top-to long
rm(b)
ti1 <- proc.time()
b <- NULL
for(i in 1:nla) {
b <- c(b,readLines(con,n=nlo+2)) # read line and accumulate
readLines(con,n=360 * 60 - nlo-2) # discard rest of the line
}
#
ti2 <- proc.time()
bbst <- strsplit(b,",") # b is list of records (strings), change to reals
bbstu <- unlist(bbst)
bbstun <- as.numeric(bbstu)
bbstum <- matrix(bbstun,ncol=3,byrow=TRUE)
dim(bbstum)
z <- matrix(bbstum[,3],ncol=nlo+2 ,byrow=TRUE)
x <- rev(unique(bbstum[,2]))
y <- unique(bbstum[,1])
#
sz <- apply(z,2,rev)
image(y,x,t(sz),col=topo.colors(20))
contour(y,x,t(sz),nlevels=20,add=TRUE)
map('worldHires',xlim=range(y),ylim=range(x),add=TRUE,lwd=2)
#
ti3 <- proc.time()
ti1 - ti0
ti2 - ti1
ti3 - ti2
and the timings are:
> tasmania
ti1 - ti0
user system elapsed
563.06 1.40 578.02
> ti2 - ti1
user system elapsed
26.74 0.08 26.97
> ti3 - ti2
user system elapsed
2.76 1.42 4.21
> #
So it takes me 10 minutes to read and discard the top of the file and half a minute to read the usefull latitudes and discard the unusefull longitudes
Thanks for any help to speed-up/ improve this program
Heberto Ghezzo
Montreal
More information about the R-help
mailing list