[R-sig-Geo] Current options for creating/querying vector data WITHOUT loading them into memory?

Roger Bivand Roger.Bivand at nhh.no
Sat Jan 18 08:31:17 CET 2014


On Fri, 17 Jan 2014, Tim Keitt wrote:

> On Fri, Jan 17, 2014 at 1:22 PM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>
>> On Fri, 17 Jan 2014, Jonathan Greenberg wrote:
>>
>>> Across all vector formats, which do you think would be a good
>>> intermediate between in-memory Spatial* and PostGIS?  I'd put a few
>>> stipulations:
>>> 1) The format should be open source and supported by existing APIs
>>> (OGR/rgeos)
>>> 2) It should be portable (file-based)
>>> 3) It should be "scalable" (able to support arbitrarily large vector
>>> databases)
>>>
>>
>> Could I ask for a range of use cases? The sp classes are designed for
>> statistical analysis, so in general some hundreds of thousands of
>> observations/features should suffice amply. The use cases should
>> demonstrate which kinds of objects and functionalities are thought
>> necessary. The fact that there is lots of data doesn't mean that it is all
>> needed for analysis or inference, or even visualization, I think?
>>
>> Have you considered interfacing the OGR utilities from the system() call
>> to subset features/fields?
>>
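
A minimal sketch of the "OGR utilities via a system call" idea, written in
Python rather than R only so that it can be run without GDAL installed. It
builds an ogr2ogr command line that subsets features (-where) and fields
(-select) before anything is read into memory. The file names, the field
names ID and AREA, and the helper names are hypothetical; -f, -where and
-select are standard ogr2ogr options.

```python
import shutil
import subprocess


def ogr2ogr_subset_cmd(src, dst, where=None, fields=None,
                       driver="ESRI Shapefile"):
    """Build an ogr2ogr argument list that writes a subset of src to dst."""
    # ogr2ogr takes the destination data source first, then the source.
    cmd = ["ogr2ogr", "-f", driver, dst, src]
    if where:
        cmd += ["-where", where]               # attribute filter on features
    if fields:
        cmd += ["-select", ",".join(fields)]   # keep only these fields
    return cmd


def run_if_available(cmd):
    """Run the command only when the ogr2ogr binary is on the PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=True)


# Hypothetical file and field names, for illustration only.
cmd = ogr2ogr_subset_cmd("parcels.shp", "parcels_subset.shp",
                         where="AREA > 1000", fields=["ID", "AREA"])
```

The much smaller output file can then be read with the usual in-memory
tools; the same pattern would apply to system() in R.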
>> I think that 2) - file-based - is moot, if there is that much data, it
>> needs to be in a database system, possibly with an OGR driver, which OGR
>> utilities could access.
>>
>> Have you considered Terralib (now 4, the development version 5 will be
>> closer to GDAL/OGR)? My intuition is that this is a viable solution.
>>
>> We really also need to accommodate space-time objects in any significant
>> revision, I think - or at least prepare object structures that are
>> forward-looking with regard to temporal data.
>>
>> I have asked several times for volunteers to rewrite rgdal::readOGR
>> (without anyone stepping forward), because it is fairly inefficient, and
>> should support SQL queries introduced in GDAL/OGR from 1.8. Supporting the
>> OGR SQL dialect means that all drivers support queries on FID and field
>> values.
>>
>> Within the next four years, I will be giving up maintenance of rgdal and
>> rgeos (possibly other packages too). I can help, but users do not deserve
>> key packages potentially compromised by the health and poor responsiveness
>> of an emeritus. Forward planning is needed for others to take on these
>> responsibilities before it becomes a matter of urgency. The pool of active
>> developers must be enlarged this year.
>>
>
> Roger,
>
> Thank you for your maintenance efforts!
>
> I've drifted towards postgis/C++ over time in my own work, but am now 
> developing some courses around R. I anticipate being fairly active with 
> R development going forward. I have a full rewrite of the OGR io bits 
> that I will make available soon. It works really well when your data are 
> in postgis or any other OGR format.

Tim,

This is very positive, thanks! When you are ready (or even before!), I'd 
strongly encourage others to get to know more of the rgdal/rgeos 
internals, and developments in the underlying software and standards. I'll 
be speaking at OGRS in Helsinki in June (http://2014.ogrs-community.org); 
could we use that as a tentative time frame (especially if interacting 
with others in the open source geospatial communities may be helpful)? 
Should we try to put an RFC together (and put it on R-forge, for example)?

I have considered using the OGC/GEOS representation under a thin "new" sp, 
but couldn't see how to avoid having at least one representation of 
geometries in memory. The sp <-> GEOS bridge in rgeos is there and sort-of 
works (the classes don't map exactly), but involves a lot of conversion. 
As OGR can link to GEOS, it might make sense to consider merging the 
packages. I couldn't see how to approach the elegance of your external 
pointer code for low-level GDAL interaction - pointing to an open GDAL 
object, but then regular grids have sparse geometries.
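
As a rough illustration of an object that points at a data source rather
than holding geometries (the file name plus selection idea from rmap,
mentioned further down the thread), here is a minimal sketch in Python with
the standard-library sqlite3 module standing in for an OGR data source. The
class LazyLayer, the points table and its columns are hypothetical; the
selection is refined without reading anything, and one in-memory copy is
made only when load() is called.

```python
import os
import sqlite3
import tempfile


class LazyLayer:
    """Hypothetical handle: a data source path plus a pending selection."""

    def __init__(self, path, table, where="1", params=()):
        self.path, self.table = path, table
        self.where, self.params = where, tuple(params)

    def subset(self, condition, *params):
        # Narrow the selection; nothing is read from disk here.
        return LazyLayer(self.path, self.table,
                         f"({self.where}) AND ({condition})",
                         self.params + tuple(params))

    def count(self):
        return self._query("COUNT(*)")[0][0]

    def load(self):
        # Only now are the selected features brought into memory.
        return self._query("fid, name, x, y", suffix=" ORDER BY fid")

    def _query(self, columns, suffix=""):
        conn = sqlite3.connect(self.path)
        try:
            sql = (f"SELECT {columns} FROM {self.table} "
                   f"WHERE {self.where}{suffix}")
            return conn.execute(sql, self.params).fetchall()
        finally:
            conn.close()


# Build a small file-based demo source (illustrative data only).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
conn = sqlite3.connect(demo_path)
conn.execute("CREATE TABLE points "
             "(fid INTEGER PRIMARY KEY, name TEXT, x REAL, y REAL)")
conn.executemany("INSERT INTO points VALUES (?, ?, ?, ?)",
                 [(i, f"p{i}", float(i), float(i % 3)) for i in range(10)])
conn.commit()
conn.close()

layer = LazyLayer(demo_path, "points")
east = layer.subset("x >= ?", 5).subset("y = ?", 0)
selected = east.load()
```

This does not avoid the in-memory representation, it only postpones it and
keeps it to the selected features.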

Roger

>
> THK
>
>
>>
>> Roger
>>
>>
>>
>>> Cheers!
>>>
>>> --j
>>>
>>> On Thu, Jan 16, 2014 at 2:49 PM, Tim Keitt <tkeitt at utexas.edu> wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jan 16, 2014 at 1:40 PM, Jonathan Greenberg <jgrn at illinois.edu>
>>>> wrote:
>>>>
>>>>>
>>>>> I've wondered if it would be possible to do something like what Robert
>>>>> did with the raster() package, where the analysis (read/write) was
>>>>> being done on-demand on the data rather than entirely in-memory.
>>>>> It is, of course, much harder to come up with elegant solutions for
>>>>> vector data than for raster data, but I think some basic
>>>>> functionality would be great.  Perhaps spatialite as a backbone, since
>>>>> you can easily install the sqlite executable via the RSQLite package,
>>>>> and there is a now-abandoned but available code base in
>>>>> http://cran.r-project.org/web/packages/SQLiteMap/ (I spoke to the
>>>>> developer, who said he won't be updating it) that might allow for a
>>>>> relatively easy cross-platform install of the spatialite addon.
>>>>> Something that would fill in the gap between the Spatial* classes
>>>>> (which won't scale to large datasets) and PostGIS (which requires much
>>>>> more complex installation requirements)?
>>>>>
>>>>> How does spatialite perform in terms of large queries?  I imagine not
>>>>> as well as PostGIS, but does it at least scale memory-wise on most
>>>>> standard queries?
>>>>>
>>>>
>>>>
>>>> I've not used it. Generally sqlite is faster than postgresql but not as
>>>> reliable. I just don't want to learn another syntax variation. Utilizing
>>>> spatial indices in spatialite, for example, requires explicit
>>>> modification of your SQL queries. The planner does not use the index
>>>> automatically, as it does in postgresql. But it's a very useful tool, as
>>>> you can do everything out of a single file on disk.
>>>>
>>>> THK
>>>>
>>>>
>>>>>
>>>>> --j
>>>>>
>>>>> On Thu, Jan 16, 2014 at 1:14 PM, Tim Keitt <tkeitt at utexas.edu> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 16, 2014 at 1:09 PM, Barry Rowlingson
>>>>>> <b.rowlingson at lancaster.ac.uk> wrote:
>>>>>>
>>>>>>>
>>>>>>> Well, back when I wrote 'rmap' I abstracted out the storage of the
>>>>>>> data from the data object... So your object in R could represent a
>>>>>>> subset of a shapefile, and the code only grabbed that chunk of the
>>>>>>> shapefile when it was needed, for example to plot (the R object was
>>>>>>> basically the name of the shapefile plus a selection vector).
>>>>>>>
>>>>>>> Then we threw that code out and sp classes were born!
>>>>>>>
>>>>>>>  I've often thought about restoring some of this kind of
>>>>>>> functionality, but R's object-oriented classes just frustrate me. It's
>>>>>>> not so simple to build a superclass of sp class objects. Or maybe it
>>>>>>> is now? For some value of 'simple'...
>>>>>>>
>>>>>>>  Suppose you had a gigantic spatialite db - if you want to work with
>>>>>>> it spatially (mapping, rgeos) you are going to have to get the bits
>>>>>>> you need into main memory, so the simplest is just to load selections
>>>>>>> into sp-class objects. Is that already possible with the OGR
>>>>>>> spatialite driver? Can you also load subsets of shapefiles using some
>>>>>>> SQL passed to the OGR shapefile driver?
>>>>>>>
>>>>>>>  What would you want to do on whole-dataset objects of this class?
>>>>>>> Would you want to do the processing on the database if possible (if
>>>>>>> it's PostGIS or Spatialite)? Or have an automatic chunking procedure
>>>>>>> for operations that don't need the whole database at once, such as
>>>>>>> finding centroids of polygons?
>>>>>>>
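
A minimal sketch of the chunking idea for an operation that never needs the
whole dataset at once, here polygon centroids. It is in Python with the
standard-library sqlite3 module standing in for a spatialite or PostGIS
source. The table name polygons, the JSON-encoded ring column and the
function names are assumptions made for illustration, not an existing API;
rings are assumed to be simple and to have non-zero area.

```python
import json
import sqlite3


def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon ring [[x, y], ...]."""
    area2 = cx = cy = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross                 # accumulates twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return (cx / (3.0 * area2), cy / (3.0 * area2))


def centroids_in_chunks(conn, chunk_size=1000):
    """Yield (fid, centroid), fetching at most chunk_size rows at a time."""
    cur = conn.execute("SELECT fid, ring FROM polygons ORDER BY fid")
    while True:
        rows = cur.fetchmany(chunk_size)
        if not rows:
            break
        for fid, ring_json in rows:
            yield fid, polygon_centroid(json.loads(ring_json))


def make_demo_db():
    """Small in-memory table of 2 x 2 squares, for demonstration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE polygons (fid INTEGER PRIMARY KEY, ring TEXT)")
    squares = [(i, json.dumps([[i, 0], [i + 2, 0], [i + 2, 2], [i, 2]]))
               for i in range(5)]
    conn.executemany("INSERT INTO polygons VALUES (?, ?)", squares)
    return conn


demo_centroids = dict(centroids_in_chunks(make_demo_db(), chunk_size=2))
```

Only one chunk of geometries is held at a time; the results could equally be
written back to the database instead of being collected in a dictionary.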
>>>>>>> Hmmm thoughts thoughts thoughts and no action :( Sorry!
>>>>>>>
>>>>>>
>>>>>>
>>>>>> Barry,
>>>>>>
>>>>>> I'll have more to say on this in a couple of weeks.
>>>>>>
>>>>>> THK
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Barry
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 16, 2014 at 6:52 PM, Jonathan Greenberg <jgrn at illinois.edu> wrote:
>>>>>>>
>>>>>>>> r-sig-geo'ers:
>>>>>>>>
>>>>>>>> As vector datasets are getting a lot larger, there is a limitation
>>>>>>>> with the Spatial* formats in that they must be loaded into main
>>>>>>>> memory.  I was curious what folks who have been dealing with massive
>>>>>>>> vector files have come up with working within the R environment?  Has
>>>>>>>> anyone played around with file geodatabases or spatialite formats
>>>>>>>> (for instance)?  How are you creating/querying the data?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> --j
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jonathan A. Greenberg, PhD
>>>>>>>> Assistant Professor
>>>>>>>> Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
>>>>>>>> Department of Geography and Geographic Information Science
>>>>>>>> University of Illinois at Urbana-Champaign
>>>>>>>> 259 Computing Applications Building, MC-150
>>>>>>>> 605 East Springfield Avenue
>>>>>>>> Champaign, IL  61820-6371
>>>>>>>> Phone: 217-300-1924
>>>>>>>> http://www.geog.illinois.edu/~jgrn/
>>>>>>>> AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype:
>>>>>>>> jgrn3007
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> R-sig-Geo mailing list
>>>>>>>> R-sig-Geo at r-project.org
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> http://www.keittlab.org/
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> http://www.keittlab.org/
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


