[R-sig-Geo] Problem with operations on large files ("cannotallocate vector of size xx")

Tina Cormier tcorms at gmail.com
Fri Feb 3 14:47:17 CET 2017


Hi Kamil,

If you don't need the PCA raster as an output, just the stats, I think
Maurizio's first idea is the way to go. If you do need the raster, but can
sacrifice some resolution, his second option (using aggregate first, then
calculating the PCA) is the way to go.

A third option, if you need a PCA raster output at it's original
resolution, would be to calculate the PCA in blocks using the blockSize
function and then loop over each block to calculate the PCA for those
cells, and putting it all back together at the end. That should alleviate
your memory issues.

Take a look at this vignette for an example, and I have pasted a slightly
adapted version of it below as well:
https://cran.r-project.org/web/packages/raster/vignettes/functions.pdf

# Note that you do not have to write the file to disk here - you can write
to the raster object created before the loop (the variable 'out') and hold
it in memory using setValues instead of writeStart, writeValues, and
writeStop.

# In the example in the pdf, this is written as a function. You can do it
either way.

# x is your stack
out <- raster(x) # assuming you are writing just the 1st pca to file -
otherwise this would be stack instead of raster.
bs <- blockSize(x)
out <- writeStart(out, filename, overwrite=TRUE)

# Loop over each block and do your calculation
for (i in 1:bs$n) {
    v <- getValues(x, row=bs$row[i], nrows=bs$nrows[i] )

   # Replace the next line with your pca analysis and would need code to
get it back to a raster
   # format (as long as you know the positions of the NAs, that's easy. Let
me know if you need help.
      v <- v + 5

      out <- writeValues(out, v, bs$row[i])
   }
out <- writeStop(out)
return(out)
##################

Hope that helps! Good luck!
Tina


On Fri, Feb 3, 2017 at 6:48 AM, Maurizio Marchi <mauriziomarchi85 at gmail.com>
wrote:

> Dear Kamil,
> I have two suggestions:
>
> 1)
> to run PCA on a subset of pixels, randomly selected using the sampleRandom
> function of the raster package and then calculate the pca components as
> rasters. Something like this where "r" is the RasterStack:
> Rsampl<-sampleRandom(WCmaps,n)
> #here n is the number of points randomly created; try with 5000 but you can
> increase and you
> #will see that after a specific value the results will not change
> significantly so you will stop
> pca<-prcomp(Rsampl,scale=T,center=T,retx=T)
> #where "scale=T,center=T" is the same of "cor=T" in the princomp function
> pca
> summary(pca)
> rastersPCA<-stack(predict(r,pca,index=1:n,progress='text'))
> #here "n" is the number of components you want to create, generally up to
> 99% of the explained cumulative variance or those who explained at least 1
> time the variance
>
> 2)
> to run PCA on coarser rasters. Again "r" is the original RasterStack
> AggregatedRasterStack<-aggregate(r,fact=n,fun=mean)
> #n here is the number of pixels to be considered to calculate the mean. For
> instance if you have
> #a pixel of 1km edge, fact=5 will create a rasterStack with 5 km of spatial
> resolution
> pca<-prcomp(na.omit(values(AggregatedRasterStack)),scale=
> T,center=T,retx=T)
> #where again "scale=T,center=T" is the same of "cor=T" in the princomp
> function
> pca
> summary(pca)
> rastersPCA<-stack(predict(r,pca,index=1:n,progress='text'))
> #here "n" is the number of components you want to create, generally up to
> 99% of the explained cumulative variance or those who explained at least 1
> time the variance
>
> Under a statistical point of view, the first is a random sampling while the
> second is a sort of regular sampling
>
> Regards, maurizio
>
> --
> Maurizio Marchi,
> PhD - Forest Science, Ecological Modelling
> Council for Agricultural Research and Economics (CREA, Italy)
> SkypeID: maurizioxyzhttps://scholar.google.it/citations?user=_
> 2X6fu8AAAAJ&hl=en
> #-----#http://ec.europa.eu/environment/life/project/
> Projects/index.cfm?fuseaction=search.dspPage&n_proj_id=
> 5035www.selpibio.euwww.eufgis.org
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

	[[alternative HTML version deleted]]



More information about the R-sig-Geo mailing list