[R-sig-Geo] Current options for creating/querying vector data WITHOUT loading them into memory?

Roger Bivand Roger.Bivand at nhh.no
Fri Jan 17 20:22:50 CET 2014


On Fri, 17 Jan 2014, Jonathan Greenberg wrote:

> Across all vector formats, which do you think would be a good
> intermediate between in-memory Spatial* and PostGIS?  I'd put a few
> stipulations:
> 1) The format should be open source and supported by existing APIs 
> (OGR/rgeos)
> 2) It should be portable (file-based)
> 3) It should be "scalable" (able to support arbitrarily large vector 
> databases)

Could I ask for a range of use cases? The sp classes are designed for 
statistical analysis, so in general some hundreds of thousands of 
observations/features should suffice amply. The use cases should 
demonstrate which kinds of objects and functionalities are thought 
necessary. The fact that there is lots of data doesn't mean that it is all 
needed for analysis or inference, or even visualization, I think?

Have you considered interfacing the OGR utilities from the system() call 
to subset features/fields?
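
A minimal sketch of that idea, assuming ogr2ogr is installed and on the PATH. The file, layer, and field names (parcels.shp, parcels, ID, AREA) are hypothetical, and the helper ogr2ogr_subset_args is not part of any package; it only assembles the argument list and does not run the utility.

```python
import shlex


def ogr2ogr_subset_args(dst, src, layer, where=None, fields=None,
                        bbox=None, fmt="ESRI Shapefile"):
    """Build an ogr2ogr argument list that copies a subset of one layer."""
    args = ["ogr2ogr", "-f", fmt]
    if where:
        args += ["-where", where]               # attribute filter on features
    if fields:
        args += ["-select", ",".join(fields)]   # keep only these fields
    if bbox:
        args += ["-spat"] + [str(v) for v in bbox]  # xmin ymin xmax ymax
    # ogr2ogr expects the destination before the source.
    return args + [dst, src, layer]


# Hypothetical file, layer, and field names, for illustration only.
cmd = ogr2ogr_subset_args(
    "subset.shp", "parcels.shp", "parcels",
    where="AREA > 1000", fields=["ID", "AREA"], bbox=(0, 0, 10, 10),
)
print(shlex.join(cmd))  # shell-quoted form, e.g. for logging
```

The same arguments could be handed to subprocess.run(cmd, check=True) in Python, or everything after the program name to system2("ogr2ogr", ...) in R, so that only the much smaller subset file is then read into memory.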

I think that 2) - file-based - is moot: if there is that much data, it 
needs to be in a database system, possibly one with an OGR driver, which 
the OGR utilities could access.

Have you considered TerraLib (now at version 4; the development version 5 
will be closer to GDAL/OGR)? My intuition is that this is a viable solution.

We really also need to accommodate space-time objects in any significant 
revision, I think - or at least prepare object structures that are 
forward-looking with regard to temporal data.

I have asked several times for volunteers to rewrite rgdal::readOGR 
(without anyone stepping forward), because it is fairly inefficient, and 
should support SQL queries introduced in GDAL/OGR from 1.8. Supporting the 
OGR SQL dialect means that all drivers support queries on FID and field 
values.
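
As a rough illustration of what querying on FID makes possible, the sketch below generates one OGR SQL statement per block of features, so that a large layer could be read piece by piece. It assumes the feature count is already known (for example from ogrinfo) and that FIDs run contiguously from 0, which is typical for shapefiles but not guaranteed for every driver. The layer and field names are hypothetical, and fid_chunk_queries is an illustrative helper, not an existing function.

```python
def fid_chunk_queries(layer, n_features, chunk_size, fields=("*",)):
    """Yield OGR SQL statements that together cover FIDs 0..n_features-1."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    cols = ", ".join(fields)
    for start in range(0, n_features, chunk_size):
        stop = min(start + chunk_size, n_features)  # last block may be short
        yield (f"SELECT {cols} FROM {layer} "
               f"WHERE FID >= {start} AND FID < {stop}")


# Hypothetical layer "parcels" with 2500 features, read 1000 at a time.
queries = list(fid_chunk_queries("parcels", 2500, 1000, fields=("ID", "AREA")))
for q in queries:
    print(q)
```

Each statement could be passed to the -sql option of ogrinfo or ogr2ogr, so that no single step needs the whole layer in memory.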

Within the next four years, I will be giving up maintenance of rgdal and 
rgeos (possibly other packages too). I can help, but users do not deserve 
key packages potentially compromised by the health and poor 
responsiveness of an emeritus. Forward planning is needed for others to 
take on these responsibilities before it becomes a matter of urgency. The 
pool of active developers must be enlarged this year.

Roger

>
> Cheers!
>
> --j
>
> On Thu, Jan 16, 2014 at 2:49 PM, Tim Keitt <tkeitt at utexas.edu> wrote:
>>
>>
>>
>> On Thu, Jan 16, 2014 at 1:40 PM, Jonathan Greenberg <jgrn at illinois.edu>
>> wrote:
>>>
>>> I've wondered if it would be possible to do something like what Robert
>>> did with the raster package, where the analysis (read/write) is
>>> done on demand on the data rather than entirely in memory.
>>> Elegant solutions are, of course, much harder to come up with for
>>> vector data than for raster data, but I think some basic
>>> functionality would be great.  Perhaps spatialite as a backbone, since
>>> you can easily install the sqlite executable via the RSQLite package, and
>>> there is a now-abandoned but available code base in
>>> http://cran.r-project.org/web/packages/SQLiteMap/ (I spoke to the
>>> developer, who said he won't be updating it) that might allow for a
>>> relatively easy cross-platform install of the spatialite add-on.
>>> Something that would fill in the gap between the Spatial* classes
>>> (which won't scale to large datasets) and PostGIS (which has much
>>> more complex installation requirements)?
>>>
>>> How does spatialite perform in terms of large queries?  I imagine not
>>> as well as PostGIS, but does it at least scale memory-wise on most
>>> standard queries?
>>
>>
>> I've not used it. Generally sqlite is faster than postgresql but not as
>> reliable. I just don't want to learn another syntax variation. Utilizing
>> spatial indices in spatialite, for example, requires explicit modification
>> of your SQL queries. The planner does not use the index automatically, as
>> it does in postgresql. But it's a very useful tool, as you can do
>> everything out of a single file on disk.
>>
>> THK
>>
>>>
>>>
>>> --j
>>>
>>> On Thu, Jan 16, 2014 at 1:14 PM, Tim Keitt <tkeitt at utexas.edu> wrote:
>>>>
>>>>
>>>>
>>>> On Thu, Jan 16, 2014 at 1:09 PM, Barry Rowlingson
>>>> <b.rowlingson at lancaster.ac.uk> wrote:
>>>>>
>>>>> Well, back when I wrote 'rmap' I abstracted out the storage of the
>>>>> data from the data object... So your object in R could represent a
>>>>> subset of a shapefile, and the code only grabbed that chunk of the
>>>>> shapefile when it was needed, for example to plot (the R object was
>>>>> basically the name of the shapefile plus a selection vector).
>>>>>
>>>>> Then we threw that code out and sp classes were born!
>>>>>
>>>>>  I've often thought about restoring some of this kind of
>>>>> functionality, but R's object-oriented classes just frustrate me. It's
>>>>> not so simple to build a superclass of sp class objects. Or maybe it
>>>>> is now? For some value of 'simple'...
>>>>>
>>>>>  Suppose you had a gigantic spatialite db - if you want to work with
>>>>> it spatially (mapping, rgeos) you are going to have to get the bits
>>>>> you need into main memory, so the simplest is just to load selections
>>>>> into sp-class objects. Is that already possible with the OGR
>>>>> spatialite driver? Can you also load subsets of shapefiles using some
>>>>> SQL passed to the OGR shapefile driver?
>>>>>
>>>>>  What would you want to do on whole-dataset objects of this class?
>>>>> Would you want to do the processing in the database if possible (if
>>>>> it's PostGIS or Spatialite)? Or have an automatic chunking procedure
>>>>> for operations that don't need the whole database at once, such as
>>>>> finding centroids of polygons?
>>>>>
>>>>> Hmmm thoughts thoughts thoughts and no action :( Sorry!
>>>>
>>>>
>>>> Barry,
>>>>
>>>> I'll have more to say on this in a couple of weeks.
>>>>
>>>> THK
>>>>
>>>>>
>>>>>
>>>>> Barry
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jan 16, 2014 at 6:52 PM, Jonathan Greenberg <jgrn at illinois.edu>
>>>>> wrote:
>>>>>> r-sig-geo'ers:
>>>>>>
>>>>>> As vector datasets are getting a lot larger, there is a limitation
>>>>>> with the Spatial* formats in that they must be loaded into main
>>>>>> memory.  I was curious what folks who have been dealing with massive
>>>>>> vector files have come up with working within the R environment?  Has
>>>>>> anyone played around with file geodatabases or spatialite formats
>>>>>> (for instance)?  How are you creating/querying the data?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> --j
>>>>>>
>>>>>> --
>>>>>> Jonathan A. Greenberg, PhD
>>>>>> Assistant Professor
>>>>>> Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
>>>>>> Department of Geography and Geographic Information Science
>>>>>> University of Illinois at Urbana-Champaign
>>>>>> 259 Computing Applications Building, MC-150
>>>>>> 605 East Springfield Avenue
>>>>>> Champaign, IL  61820-6371
>>>>>> Phone: 217-300-1924
>>>>>> http://www.geog.illinois.edu/~jgrn/
>>>>>> AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype:
>>>>>> jgrn3007
>>>>>>
>>>>>> _______________________________________________
>>>>>> R-sig-Geo mailing list
>>>>>> R-sig-Geo at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> http://www.keittlab.org/
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


