[R-sig-Geo] as.data.frame excluding NA in raster package

Bastien.Ferland-Raymond at mrn.gouv.qc.ca Bastien.Ferland-Raymond at mrn.gouv.qc.ca
Thu Oct 10 14:29:34 CEST 2013


Dear R Geoer,

This is more a request than a question, as I have already succeeded in doing what I wanted to do. I'm open to suggestions for doing it better, but I really think it would be a nice feature to add to the as.data.frame() function in the raster package.

I've been using the as.data.frame() function recently to extract data from a RasterBrick to be fed into different models (knn, lm, kmeans, etc.).  My raster is usually full of NA (the border and exclusion zones inside), which I have to remove before fitting my models.  The problem is that my raster is so big that I often cannot just do a simple:

tab <- as.data.frame(r)
tab <- tab[!is.na(tab[,1]), , drop=FALSE]

as it breaks at the first line for lack of memory.  Therefore, I think it would be useful to have a new argument to as.data.frame(), like "exclude.na=T", so that it ignores the NA while creating the data.frame.  In some situations, it may allow us to work with slightly bigger rasters.

As I said, I did manage to do that (see code below).  However, my code is probably not as efficient as what a pro coder would have written, and I still think it would be a great addition to the as.data.frame() function:

###
library(raster)

# preparing the raster
r <- raster(nrow=1000, ncol=1000, xmn=0, xmx=1000, ymn=0, ymx=1000, crs=NA)
dat <- rep(NA, 1e+06)
dat[sample(1:1e+06, 2000)] <- runif(2000,0,1)
values(r) <- dat
plot(r)

# the traditional function
tab1 <- as.data.frame(r)
object.size(tab1)

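# my modified as.data.frame function
# I have to tile the territory; otherwise I run into the same memory problem as with as.data.frame()
as.df.no.na <- function(ras, nb.tuile){
  # empty data.frame with one column per layer, to be filled tile by tile
  tabfin <- as.data.frame(matrix(NA, 0, nlayers(ras), dimnames=list(NULL, names(ras))))
  # last row of each tile (0 = before the first row); seq_len() also handles nb.tuile=1
  coupe <- c(0, round(nrow(ras)/nb.tuile)*seq_len(nb.tuile-1), nrow(ras))
  for(i in seq_len(nb.tuile)){
    # read the rows of tile i; a single layer comes back as a plain vector, hence as.matrix()
    mat <- as.matrix(getValues(ras, coupe[i]+1, coupe[i+1]-coupe[i]))
    colnames(mat) <- names(ras)
    # row names are the cell numbers in the full raster
    tab <- data.frame(mat, row.names=(coupe[i]*ncol(ras)+1):(coupe[i+1]*ncol(ras)))
    # keep only the cells where the first layer is not NA
    tabfin <- rbind(tabfin, tab[!is.na(tab[,1]), , drop=FALSE])
  }
  tabfin
}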

tab2 <- as.df.no.na(ras=r, nb.tuile=10)
object.size(tab2)
###
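
A possible variant of the tiling function above, given only as a sketch: instead of passing the number of tiles by hand, it lets blockSize() from the raster package suggest the row blocks, and it collects the pieces in a list so that rbind() is called only once. The name as_df_drop_na and the "cell" column are assumptions made for this illustration (they are not part of the raster package). Cell numbers are stored in a column rather than in the row names, and, as in the code above, only the first layer is tested for NA.

```r
library(raster)

# Hypothetical helper, not part of the raster package: read the raster
# block by block and keep only the cells whose first layer is not NA.
as_df_drop_na <- function(ras) {
  bs <- blockSize(ras)                  # list with $row, $nrows and $n
  pieces <- vector("list", bs$n)
  for (i in seq_len(bs$n)) {
    vals <- getValues(ras, row = bs$row[i], nrows = bs$nrows[i])
    vals <- as.matrix(vals)             # a single layer comes back as a vector
    colnames(vals) <- names(ras)
    # cell number of the first cell in this block
    first_cell <- cellFromRowCol(ras, bs$row[i], 1)
    keep <- which(!is.na(vals[, 1]))
    pieces[[i]] <- data.frame(cell = first_cell + keep - 1,
                              vals[keep, , drop = FALSE])
  }
  do.call(rbind, pieces)                # a single rbind at the end
}

# usage on a small, mostly empty raster
r_small <- raster(nrow = 50, ncol = 40)
v <- rep(NA_real_, ncell(r_small))
v[c(3, 250, 1999)] <- c(0.1, 0.5, 0.9)
values(r_small) <- v
tab3 <- as_df_drop_na(r_small)
```

Memory use still depends on the block size that blockSize() picks, so this is not guaranteed to fit where as.data.frame() does not; it only avoids holding the full table of NA at once.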

There!  Does anybody else think it would be a great addition, or know a better way to do it?  Could it be implemented in the raster package? Robert?  Thanks!

Bastien Ferland-Raymond, M.Sc. Stat., M.Sc. Biol.
Division des orientations et projets spéciaux
Direction des inventaires forestiers
Ministère des Ressources naturelles


