[R-sig-Geo] merging tables by columns AND row names (coordinates)

Roger Bivand Roger.Bivand at nhh.no
Sat Sep 9 15:34:30 CEST 2006


On Fri, 8 Sep 2006, Mikkel Grum wrote:

> merge(Table1, Table2,
>    by = c("XCOORD", "YCOORD"), all = TRUE)
> 
> It might not handle the amount of data you have, but,
> if your tables are normal dataframes, it would do the
> job with a smaller dataset. It doesn't work with
> Spatial*DataFrames (yet?).

I would be wary of this when the coordinates are floating point, because
values that differ only by rounding or print precision will not match; they
ought to be snapped together first. I believe that the original data were
from a regular grid with missing cells. If that is the case, and the
coordinates can be mapped to integer row and column IDs, then your route
will certainly work. You are right that there is as yet no
cbind/rbind/merge facility for Spatial*DataFrames.
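
As a minimal sketch of that route, assuming the grid origin and cell size
are known: the values of x0, y0 and cs below are guesses from the posted
sample rather than anything confirmed, and snap_index() and add_index() are
hypothetical helper names. The coordinates are converted to integer column
and row indices, and the merge is done on those indices:

```r
# Assumed grid definition (guessed from the sample rows, not confirmed)
x0 <- 27.47500   # x coordinate of the first column
y0 <- 42.52641   # y coordinate of the first row
cs <- 0.012875   # assumed cell size in both directions

# Hypothetical helper: snap coordinates to 1-based integer grid indices
snap_index <- function(v, origin, cellsize) {
  as.integer(round((v - origin) / cellsize)) + 1L
}

# Two toy tables with different cells missing; one x value in t2 carries a
# tiny floating point discrepancy
t1 <- data.frame(XCOORD = c(27.47500, 27.48788, 27.50075),
                 YCOORD = 42.52641,
                 OBSERVATION = c(177, 177, 179))
t2 <- data.frame(XCOORD = c(27.47500, 27.48788 + 1e-9, 27.51362),
                 YCOORD = 42.52641,
                 OBSERVATION = c(233, 345, 123))

# Hypothetical helper: replace coordinates by indices and give the
# observation column a unique name
add_index <- function(d, suffix) {
  out <- data.frame(COL = snap_index(d$XCOORD, x0, cs),
                    ROW = snap_index(d$YCOORD, y0, cs))
  out[[paste0("OBSERVATION", suffix)]] <- d$OBSERVATION
  out
}

tabs <- list(add_index(t1, 1), add_index(t2, 2))

# Merge successively on the integer keys, keeping cells missing from
# some of the tables (all = TRUE)
merged <- Reduce(function(a, b) merge(a, b, by = c("COL", "ROW"), all = TRUE),
                 tabs)

# Rebuild snapped coordinates from the indices
merged$XCOORD <- x0 + (merged$COL - 1L) * cs
merged$YCOORD <- y0 + (merged$ROW - 1L) * cs

# For comparison: merging directly on the floating point coordinates
naive <- merge(t1, t2, by = c("XCOORD", "YCOORD"), all = TRUE)
```

Here the perturbed x value still lands in the same column as its partner in
the indexed merge, while in the direct merge it ends up on a row of its own.
With 100 tables the same Reduce() pattern would apply, memory permitting.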

Roger

> 
> Mikkel
> 
> --- Roger Bivand <Roger.Bivand at nhh.no> wrote:
> 
> > On Fri, 8 Sep 2006, Michael Sumner wrote:
> > 
> > > Hello, I can think of a couple of simple-minded approaches that would
> > > take some time - either relying on direct string-matching for the
> > > unique coordinates, or by some contrived overlay.
> > > 
> > > However, there are probably far better approaches - a couple of
> > > questions:
> > > 
> > > Can you predefine the set of all unique coordinates without reading all
> > > the tables from file?
> > >  - if so you might simplify the identification of each individual
> > >    coordinate, for matching the records
> > > 
> > > Are the coordinates (intended to be) on a regular grid?  (This seems
> > > unlikely, although it is nearly true given your X coordinates).
> > 
> > The key question is what the data are. To me they look like a global
> > regular grid with some slippage in the print() - the underlying diff() of
> > the unique x's and y's is almost certainly regular. I'm not sure why they
> > are in text files either (model output?). But some bits of the grid may be
> > missing, the question being whether this is regular. If, as an earlier
> > response indicated, different data sets have different grid cells
> > missing, then we need the overall grid to start with, then grab the row
> > and column indices (and/or grid index), and attach these to the data rows.
> > 
> > If the solution needs to be robust, and have a longer term utility, I
> > would go for using MySQL, Terralib, and aRT. The data representation is
> > that of the Terralib Cell object, so the question would be how to upload
> > to the database from the text files.
> > 
> > aRT is at:
> > 
> > http://www.est.ufpr.br/aRT/
> > 
> > By the way, 1M by 100 by 8 bytes is pushing 32-bit R - but handing off a
> > lot of the data storage to a database relieves this greatly.
> > 
> > Roger
> > 
> > > 
> > > Cheers, Mike.
> > > 
> > > 
> > > isidora k wrote:
> > > > Hi everyone!
> > > > I have 100 tables of the form:
> > > > XCOORD,YCOORD,OBSERVATION
> > > > 27.47500,42.52641,177
> > > > 27.48788,42.52641,177
> > > > 27.50075,42.52641,179
> > > > 27.51362,42.52641,178
> > > > 27.52650,42.52641,180
> > > > 27.53937,42.52641,178
> > > > 27.55225,42.52641,181
> > > > 27.56512,42.52641,177
> > > > 27.57800,42.52641,181
> > > > 27.59087,42.52641,181
> > > > 27.60375,42.52641,180
> > > > 27.61662,42.52641,181
> > > > ..., ..., ...
> > > > with approximately 1000000 observations for each. All
> > > > these tables have the same xcoord and ycoord and I
> > > > would like to get a table of the form
> > > > XCOORD,YCOORD,OBSERVATION1,OBSERVATION2,... 
> > > > 27.47500,42.52641,177,233,...
> > > > 27.48788,42.52641,177,345,...
> > > > 27.50075,42.52641,179,233,...
> > > > 27.51362,42.52641,178,123,...
> > > > 27.52650,42.52641,180,178,...
> > > > 27.53937,42.52641,178,...,...
> > > > 27.55225,42.52641,181,...
> > > > 27.56512,42.52641,177,...
> > > > 27.57800,42.52641,181,...
> > > > 27.59087,42.52641,181,...
> > > > 27.60375,42.52641,180,...
> > > > 27.61662,42.52641,181,...
> > > > In other words I would like to merge all the tables
> > > > taking into account the common row names of their
> > > > xcoords AND ycoords.
> > > > Not all tables have the same number of observations
> > > > which means that not all pairs of x and y coords
> > > > match.
> > > > Is there a way to do this in R?
> > > > I would be grateful for any advice.
> > > > Many thanks
> > > > Isidora
> > > >
> > > > _______________________________________________
> > > > R-sig-Geo mailing list
> > > > R-sig-Geo at stat.math.ethz.ch
> > > > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> > > >
> > > >
> > > >
> > > 
> > > 
> > 
> > -- 
> > Roger Bivand
> > Economic Geography Section, Department of Economics,
> > Norwegian School of
> > Economics and Business Administration, Helleveien
> > 30, N-5045 Bergen,
> > Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
> > e-mail: Roger.Bivand at nhh.no
> > 
> > 
> 
> 
> 

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



