[R-sig-Geo] Memory limit problems in R / import of maps

Roger Bivand Roger.Bivand at nhh.no
Tue Apr 22 19:59:53 CEST 2008


On Tue, 22 Apr 2008, Tomislav Hengl wrote:

>
> Dear Miltinho,
>
> Please clarify what you mean by "the most important thing is the 
> dimension (number of cols and rows)". In the case of rgdal, R reads 
> everything and stores it as one variable, e.g.:

No, please look at what the readGDAL() function actually does. It opens a 
handle to a file, and *only* reads *from* the offset *for* region.dim 
cells and chosen bands (or decimates with output.dim). You can work on 
rasters of arbitrary size, provided you do it in parts. The components are 
there; you just need to use them with insight. You haven't shown why all 
the data must be in memory (it being easier that way is an acceptable 
answer if you have 64GB). Less well-resourced researchers have to use 
their wits.
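
A minimal sketch of that strip-wise approach (the file name, chunk size, 
and single-band assumption here are illustrative, not from the thread):

library(rgdal)

fname <- "grid.tif"                   # placeholder file name
info  <- GDALinfo(fname)              # reads only the header, not the cells
nrows <- info[["rows"]]
ncols <- info[["columns"]]

chunk <- 100                          # rows per strip; tune to available RAM
for (start in seq(0, nrows - 1, by = chunk)) {
    ndo <- min(chunk, nrows - start)
    # offset and region.dim are both given as c(rows, columns)
    strip <- readGDAL(fname, offset = c(start, 0),
                      region.dim = c(ndo, ncols), silent = TRUE)
    # work on strip@data$band1 here, e.g. accumulate summary statistics;
    # the strip can then be garbage-collected before the next iteration
}

Summaries accumulated strip by strip can be combined afterwards, so the 
full raster never has to sit in memory at once.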

Wrt. bottlenecks below, I beg to differ - another bottleneck is whether 
the nth 10MB of data adds to our understanding of the data-generating 
processes, if the 1st 10MB of data didn't help.

Roger


>
>> str(maskHG)
> Formal class 'SpatialGridDataFrame' [package "sp"] with 6 slots
>  ..@ data       :'data.frame': 1800000 obs. of  1 variable:
>  .. ..$ band1: int [1:1800000] 0 0 0 0 0 0 1 1 1 1 ...
>  ..@ grid       :Formal class 'GridTopology' [package "sp"] with 3 slots
>  .. .. ..@ cellcentre.offset: Named num [1:2] 3946500 3247500
>  .. .. .. ..- attr(*, "names")= chr [1:2] "x" "y"
>  .. .. ..@ cellsize         : num [1:2] 1000 1000
>  .. .. ..@ cells.dim        : int [1:2] 1500 1200
>  ..@ grid.index : int(0)
>  ..@ coords     : num [1:2, 1:2] 3946500 4095500 3247500 3366500
>  .. ..- attr(*, "dimnames")=List of 2
>  .. .. ..$ : NULL
>  .. .. ..$ : chr [1:2] "x" "y"
>  ..@ bbox       : num [1:2, 1:2] 3946000 3247000 4096000 3367000
>  .. ..- attr(*, "dimnames")=List of 2
>  .. .. ..$ : chr [1:2] "x" "y"
>  .. .. ..$ : chr [1:2] "min" "max"
>  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
>  .. .. ..@ projargs: chr " +proj=laea +lat_0=52 +lon_0=10 +k=1.00000 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +datum=WGS84 +units=m +towgs84=0,0,0"
>
>
> so I do not think that the number of columns/rows makes a difference.
>
> We typically work with maps of 1200 rows x 1300 columns, but would like 
> to use even bigger maps very soon.
>
> I really see this problem of not being able to load larger maps into R 
> as the biggest bottleneck (worth investing time to work on the R code).
>
> Tom
>
>
> -----Original Message-----
> From: milton ruser [mailto:milton.ruser at gmail.com]
> Sent: Tuesday, 22 April 2008 18:21
> To: Tomislav Hengl
> Cc: Dylan Beaudette; r-sig-geo at stat.math.ethz.ch; Michalis Vardakis
> Subject: Re: [R-sig-Geo] Memory limit problems in R / import of maps
>
> Hi all,
>
> In fact, sometimes I feel a little frustrated with some of the map handling in R.
> I have also had problems reading maps that (at least to me) are not so large. The
> curious thing is that the simple ArcView 3.2 reads the files without problem (in
> GRD or ASC formats).
>
> In fact, I think that the most important thing is the dimension (number of cols
> and rows) and not the spatial resolution (10 meters, 250 meters, etc.). So, Tom,
> how large (cols and rows) are your maps?
>
> Kind regards,
>
> miltinho
>
>
>
> On 4/22/08, Tomislav Hengl <hengl at science.uva.nl> wrote:
>
>
> 	Dylan,
>
> 	Thanks for your note.
>
> 	A student of mine would like to run habitat suitability analysis using 
> 	the adehabitat package 
> 	(http://dx.doi.org/10.1890%2F0012-9658%282002%29083%5B2027%3AENFAHT%5D2.0.CO%3B2). 
> 	I encouraged him to use R, for many reasons.
>
> 	At the moment, he is thinking of doing the whole thing in Matlab (or 
> 	using the original Biomapper software), because we would not like to 
> 	give up on the original resolution (250 m).
>
> 	As a GIS person, I definitely do not see ~20 million pixels as a huge data set.
>
> 	cheers,
>
> 	Tom Hengl
>
>
>
> 	-----Original Message-----
> 	From: Dylan Beaudette [mailto:dylan.beaudette at gmail.com]
> 	Sent: Tuesday, 22 April 2008 17:22
> 	To: Tomislav Hengl
> 	Cc: r-sig-geo at stat.math.ethz.ch; Michalis Vardakis
> 	Subject: Re: [R-sig-Geo] Memory limit problems in R / import of maps
>
> 	On Tue, Apr 22, 2008 at 6:49 AM, Tomislav Hengl <hengl at science.uva.nl> wrote:
> 	>
> 	>  Dear list,
> 	>
> 	>  I know that much has already been said about the memory limit 
> 	>  problems. If there has been any progress on this problem, we would 
> 	>  be interested to hear about it.
> 	>
> 	>  In our project, we are importing 24 maps/bands, each consisting of 
> 	>  1,450,000 pixels. We would further like to glue all the maps into a 
> 	>  single data frame (e.g. the 'kasc' class in the adehabitat package, 
> 	>  or 'SpatialGridDataFrame' in the sp package), but this seems to be 
> 	>  impossible.
> 	>
> 	>  We tried to run this under Windows (after following 
> 	>  http://cran.r-project.org/bin/windows/base/rw-FAQ.html#There-seems-to-be-a-limit-on-the-memory-it-uses_0021 
> 	>  and setting --max-mem-size) and under Linux Ubuntu, but still get 
> 	>  the same error message (it seems that there is no difference in 
> 	>  memory use under the two OSes):
> 	>
> 	>  "Error: Cannot allocate vector of size 11.1 Mb"
> 	>
> 	>  The R workspace with the 24 loaded grids is also really small 
> 	>  (18 MB), but any further gluing and calculation is blocked due to 
> 	>  the vector-size error message.
> 	>
> 	>  For comparison, in a GIS such as ArcGIS or SAGA/ILWIS (open source) 
> 	>  we have no problems loading and processing 3-4 times as many grids.
> 	>
> 	>  Should we simply give up on running spatial analyses using large 
> 	>  grids (>10 million pixels) in R?
>
> 	Hi,
>
> 	What exactly were you hoping to do with such a massive data frame once 
> 	you overcame the initial memory problems associated with loading the 
> 	data? Any multivariate analysis, classification, or inference test 
> 	would certainly require at least as much memory again to operate on 
> 	the stack of grids.
>
> 	Not knowing what the purpose of this operation is (although I would 
> 	guess something related to soil-property or landscape modeling of some 
> 	sort), it is hard to suggest a better approach. For grids of that size 
> 	I would use an algorithm that operates on strips or tiles. There are 
> 	several great starting points in the GRASS source code. Doing all of 
> 	the pre-processing, and possibly some aggregation to a larger support 
> 	size, in GRASS would let you test any R-centric operations on a 
> 	coarser version of the original dataset.
>
> 	Cheers,
>
> 	Dylan
>
> 	_______________________________________________
> 	R-sig-Geo mailing list
> 	R-sig-Geo at stat.math.ethz.ch
> 	https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



