[R-sig-Geo] [R-sig-DB] [R] Connecting to PostgreSQL/PostGIS from R (rgdal?)

Tim Keitt tkeitt at gmail.com
Sat Jun 2 20:30:21 CEST 2007


On 6/2/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> On Sat, 2 Jun 2007, Tim Keitt wrote:
>
> > My Rdbi (+Rdbi.PgSQL) package was written to be much simpler and
> > easier to use than either DBI or RODBC. Unfortunately, I do not have
> > time to maintain it, so I'm not sure what state it is in. Either DBI
> > or RODBC should work fine for pushing tables back and forth. RODBC is
> > probably the most up-to-date and might be the best long term solution.
> > (DBI is the most complete I think in terms of a full blown db API.)
>
> There are quite a few 'db API' features in RODBC that are not in DBI.

It does look good. I've started experimenting with it and I may switch
entirely to RODBC. (I'm once again needing to do some work with very
large datasets.)

>
> > For pushing geometries into PostGIS, I invariably use the shp2pgsql
> > utility as it seems to give the best results. You could easily dump
> > our Spatial*DataFrame to a file and shell out to run shp2pgsql. Not
> > elegant but would work. (Wish I had time to finish low-level OGR
> > coverage in rgdal...)
> >
> > Edzer's approach is one I've suggested on many occasions -- that we
> > subclass the sp classes to act as a proxy to a PostGIS table. I first
> > developed that approach in rpgsql -- you could create proxy data
> > frames that simply forwarded access requests to PostgreSQL (I think
> > the idea has now been applied to SQLite tables). The problem of course
> > is that R does not support polymorphism "below the surface" in the C
> > level implementation, so once a proxy object is passed to R's
> > internals, it is nothing more than a pointer with no call interface.
>
> I really don't think that is true, and indeed polymorphism is almost
> entirely implemented at C level.  A visible R object and a SEXP are one
> and the same thing.

I guess what I mean is that -- unless I'm mistaken -- I cannot
override the data access routines. Lets say I have a C function that
takes an SEXP as an argument and the function expects that SEXP to
point to a data frame. I cannot call the same C function on an SEXP
that points to a PostgreSQL proxy object because the C code will try
to dereference memory offset from the base of the data portion of the
SEXP. In the case of the db handle, there is no data in memory! Or is
there now a full C-level function call interface for manipulating data
frames (that can handle type dependent dispatch)? Cool if so -- its
been awhile since I looked.

(Just as an example think about what happens when the data in a matrix
is passed to BLAS routines and what would happen if you tried to pass
the db handle to BLAS.)

THK

>
> >
> > THK
> >
> > On 6/2/07, Edzer J. Pebesma <e.pebesma at geo.uu.nl> wrote:
> >> Mike, my experience with linking R with PostGIS are documented here:
> >>
> >> http://wiki.intamap.org/index.php/PostGIS
> >>
> >> which I may have mentioned before on r-sig-geo.
> >>
> >> Indeed, as Tim confirms, besides the (r)gdal link, (R)ODBC is another
> >> working link for transferring table information. I doubt whether it will
> >> be easy to transport e.g. line or polygon topology through this link.
> >> --
> >> Edzer
> >>
> >>
> >>>
> >>>
> >>>
> >>>> I apologize if I come accross as somewhat naive in this - I certainly am
> >>>> still a novice when it come to the use of R.  Also, I apologize for the
> >>>> longish email that follows, but if you're interested in what I'm trying
> >>>> to do, this is the clearest explanation I have off the top of my head:
> >>>>
> >>>> My needs are that I'm working on a .Net gui which displays map data
> >>>> using the C# MapScript library for MapServer, and manipulates/analyzes
> >>>> spatial data (generally just points for now, but this might change in
> >>>> the future) stored in a PostgreSQL/PostGIS database.  Roger, I think
> >>>> you'll recall asking me about this tool, for which I had a poster up at
> >>>> the OSGeo conference in Lausanne last year.  Lately, I've been trying to
> >>>> port it to Gtk# so that it will work on both Linux and Windows, and so
> >>>> far it seems to be working.  However, if I add any real functionality to
> >>>> it in the future, I'd like to find a better way to link PostgreSQL with
> >>>> the functionality of R.
> >>>>
> >>>> In this tool, I had started to use R as I feel it's better suited to do
> >>>> some of the more complicated processing - though so far this is limited
> >>>> to creating voronoi polygons, for which the deldir library in R worked
> >>>> very well.  To make this happen, I used system process objects in
> >>>> Mono/.Net to control the R commandline.  The issue I faced here is that
> >>>> the only way I could get the data into R from PostgreSQL (without
> >>>> requiring 'a priori' setup on the client machine of something like an
> >>>> ODBC driver with the RODBC librar) was to dump the data to a temporary
> >>>> text file, import it into R, process it in R, dump it back into text
> >>>> formatted for SQL, then execute the SQL to input the data into PostgreSQL.
> >>>>
> >>>> This overall process would be much more efficient if I could find a more
> >>>> integrated way of transferring data between R/PostgreSQL - right now my
> >>>> .Net application primarily processes data within PostgreSQL itself, but
> >>>> integration with R would make a great deal of sense for me, as its
> >>>> primary purpose is to calculate spatial statistics.
> >>>>
> >>>> However, my ideal solution would be to have a package of binaries that
> >>>> are self sufficient, so I can give someone the compiled binaries on a CD
> >>>> for example, and that person could use it on a Windows machine with no
> >>>> initial setup, or he/she could use it on a Linux machine where
> >>>> installation of the necessary prequisites is either already done, or
> >>>> fairly easy (at least from my experience with Fedora so far, using the
> >>>> standard repositories).  At the moment, the only missing piece here is
> >>>> the link between R and PostgreSQL.
> >>>>
> >>>> Maybe there is a way I can accomplish this with RODBC that I'm unaware
> >>>> of...but as as I understand it, this requires a Windows user to install
> >>>> the PostgreSQL ODBC driver first, then create a DSN to each database
> >>>> that is used in PostgreSQL.  If I'm wrong about any of this, I
> >>>> apologize, and I'd be happy to be corrected.  Every once in a while I
> >>>> take a look around online to see if anything new has been released
> >>>> (e.g., a win32 port of pl/r, or some easy-to-use solution for connecting
> >>>> to PgSQL from within R).  It seems like rgdal might be a sort-of
> >>>> solution, at least for spatail data - except that I'm somewhat inept at
> >>>> compiling it cleanly (I did get it working, and it seems okay, but it's
> >>>> not packaged properly - I'm sure of that).  Even so, it doesn't seem to
> >>>> have an option for linking to non-spatial data (at least, from what I've
> >>>> been able to figure out from experimentation and documentation), which
> >>>> would be pretty useful as well.
> >>>>
> >>>> If either of you have any additional suggestions, I'd be glad to hear them.
> >>>>
> >>>> Sorry again for such a long email.
> >>>>
> >>>> Regards,
> >>>> Mike
> >>>>
> >>>> Roger Bivand wrote:
> >>>>
> >>>>
> >>>>> On Thu, 31 May 2007, Prof Brian Ripley wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Roger,
> >>>>>>
> >>>>>> I see that on Linux I have an ODBC driver.  That should not introduce
> >>>>>> further dependencies on Windows, and might be useful.
> >>>>>>
> >>>>>> OTOH, I do think RODBC is the answer to the underlying question.  It works
> >>>>>> well with PostgreSQL on Windows, and abstracts out the pesky differences
> >>>>>> between PostgreSQL clients in different versions (as the ODBC driver is
> >>>>>> part of PostgreSQL).
> >>>>>>
> >>>>>>
> >>>>> Brian,
> >>>>>
> >>>>> Thanks, of course there are better routes than rgdal/PostGIS when the data
> >>>>> are not geometries. But because PostGIS stores its representations in
> >>>>> specific ways, both the PostgreSQL and GDAL/OGR sides need access to the
> >>>>> PostGIS library to unpick the parts of the lines and polygons involved.
> >>>>> Edzer has had some success with this on Linux, as far as I understand to
> >>>>> insert results from R into a PostGIS/PostgreSQL database, I think being
> >>>>> read by MapServer for HTTP delivery.
> >>>>>
> >>>>> I'm not sure what Mike's specific needs are - if aspatial or simple point
> >>>>> coordinates, it ought to be possible to avoid the intermediate data layer,
> >>>>> unless something "on the other side" needs it. With line or polygon
> >>>>> geometries, some intermediate layer is needed, and that's where the
> >>>>> dependencies come from.
> >>>>>
> >>>>> The FWTools Windows binaries for rgdal built against HDF4 seem to work,
> >>>>> against its NetCDF which crashes R (I think this is one of Frank
> >>>>> Warmerdam's DLLs doing stdcall, and another not doing it). The user
> >>>>> workaround in that case was to use your ncdf binary, which was feasible
> >>>>> for raster data.
> >>>>>
> >>>>> FWTools is advertised as aiming: "to track the latested development
> >>>>> versions of the packages included as opposed to official releases. While
> >>>>> this may mean the packages are less stable, it is intended to give folks a
> >>>>> chance to use the latest and greatest". Having talked to the OSGeo
> >>>>> developers, it seems that their "user" is a system integrator of some
> >>>>> kind, setting up a production system for a larger organisation in-house or
> >>>>> more broadly. This doesn't match too well with the R "user" profile,
> >>>>> particularly in a one-off research setting.
> >>>>>
> >>>>> Roger
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Brian
> >>>>>>
> >>>>>>
> >>>>>> On Wed, 30 May 2007, Roger Bivand wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On Wed, 30 May 2007, Mike Leahy wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> I've been trying every now and then to find a cross operating system
> >>>>>>>> solution that would let me access PostgreSQL (and PostGIS) from R, or to
> >>>>>>>> access R from PostgreSQL.  I know of pl/r, which accomplishes the
> >>>>>>>> latter, but has yet to be successfully ported to Windows.  Similarly,
> >>>>>>>> I've tried to use Rdbi and DBI, but I haven't had luck with those on
> >>>>>>>> Windows either for connecting to PostgreSQL from R.  Can anyone suggest
> >>>>>>>> a solution for this?
> >>>>>>>>
> >>>>>>>> It would seem that rgdal could also help me in this case. Unfortunately,
> >>>>>>>> the version of the GDAL library that is included in the rdgal binary
> >>>>>>>> available on CRAN (for windows) doesn't include the PostgreSQL driver
> >>>>>>>> for OGR (i.e., it's not listed by the ogrDrivers() function).
> >>>>>>>>
> >>>>>>>> I compiled rgdal on Windows myself using the GDAL library from
> >>>>>>>> FWTools-1.3.1, but I was unsuccessful at creating a proper binary
> >>>>>>>> package for R.  I was only able to get it to work by substituting the
> >>>>>>>> rgdal.dll that was installed by CRAN with the one that I compiled that
> >>>>>>>> links against the GDAL library from FWTools.  Even though it works (at
> >>>>>>>> first glance with ogrInfo(), and readOGR()), I still get a warning
> >>>>>>>> message when I load the libary: "DLL attempted to change FPU control
> >>>>>>>> word from 8001f to 9001f".
> >>>>>>>>
> >>>>>>>> So my question with respect to rgdal is a) is it likely that an rgdal
> >>>>>>>> package is going to be released in the future with the PostgreSQL driver
> >>>>>>>> included in GDAL/OGR, or b) are there any suggestions/instructions that
> >>>>>>>> might get me through the compilation and packaging process for rgdal
> >>>>>>>> with better success?
> >>>>>>>>
> >>>>>>>>
> >>>>>>> The warning is harmless - R is just reporting that it has stopped the
> >>>>>>> dynamically linked libraries resetting a flag that they should not change
> >>>>>>> while R is running. If you followed the notes in README.windows in rgdal,
> >>>>>>> you ought to be OK. There are no plans to provide more Windows binary
> >>>>>>> drivers than those present now, because the others involve further
> >>>>>>> external dependencies, which most users would not welcome.
> >>>>>>>
> >>>>>>> So please try to do an ogrinfo at the command line in Windows using
> >>>>>>> FWTools to your PostGIS data, and then the equivalent within R with your
> >>>>>>> locally built rgdal, and see how it goes. Even on Linux, getting all the
> >>>>>>> components lined up isn't easy, according to people who have tried, but
> >>>>>>> can be done if you need to do it.
> >>>>>>>
> >>>>>>> Hope this helps,
> >>>>>>>
> >>>>>>> Roger
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> Thanks in advance for any help,
> >>>>>>>> Mike
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> R-sig-Geo mailing list
> >>>>>>>> R-sig-Geo at stat.math.ethz.ch
> >>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>
> >> _______________________________________________
> >> R-sig-Geo mailing list
> >> R-sig-Geo at stat.math.ethz.ch
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >>
> >
> >
> >
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>


-- 
Timothy H. Keitt, University of Texas at Austin
Contact info and schedule at http://www.keittlab.org/tkeitt/
Reprints at http://www.keittlab.org/tkeitt/papers/
ODF attachment? See http://www.openoffice.org/




More information about the R-sig-Geo mailing list