[R-sig-Geo] Better print method for Spatial*DataFrames?

Sun May 30 01:34:40 CEST 2010

On 05/29/2010 11:47 AM, Roger Bivand wrote:

> On Sat, 29 May 2010, Barry Rowlingson wrote:
> 
>> On Sat, May 29, 2010 at 10:06 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>>
>>
>>> In other software systems (octave, Stata, ...), one can turn on and off a
>>> more/less screen-by-screen displayer (not scrolling upwards, just chunking),
>>> but I'm not aware of an equivalent in R/S. I'm not sure how head() and
>>> tail() work in R,
>>
>> They don't seem to work very well at all for Spatial*DataFrames. If I
>> add coordinates to meuse to get a SpatialPointsDataFrame and
>> head(that) I get all the 'rows' but with only the cadmium
>> measurements. It's slicing it the wrong way. Odd.
>>
>>> and personally use str() by default. If I need to access
>>> the coordinates of a particular line or polygon, I print() just that list
>>> element (Line or Polygon object).
>>>
>>> I can see what you mean, but feel that users will benefit much more by using
>>> str(), which is a real gem!
>>
>> str is great if you need to know the str-ucture of an R object. But
>> it doesn't even align the values so you can see across rows of your
>> data, which is what I'd like print to do (by analogy with
>> print.data.frame).
>>
>> Currently if I print a SpatialPolygonsDataFrame I get the structure.
>> Print methods should do better than that - you're almost suggesting
>> not having, for example, a print method for data frames and that we'd
>> be better off having what print.default(anyDataFrame) gives us.
>>
>> So my proposal is that print of a SpatialPolygonsDataFrame class
>> should print like a data frame but with some indicator of the geometry
>> at the start of the row, such as POLYGON(...) - literally with dots,
>> there's no need to spell it out. Similarly for Lines.
>>
>> Another suggestion is for head() and tail methods on
>> Spatial*DataFrame objects - I think just subscripting [1:n,] from the
>> object and returning would do it. I think currently head and tail
>> treat these objects as lists and the results are not pretty.
> 
> Right, because they see S4 objects as lists with no components, only
> with attributes. str() does have support for S4 objects. They would need
> to be wrapped around an S4 show/print method, with the output captured,
> as in capture.output(). Would it make sense to have the default
> print/show for Spatial* be str() with max.level= set, and for
> Spatial*DataFrame be the print method for the data slot prepended with
> some text (perhaps POINT, MULTILINESTRING, MULTIPOLYGON, PIXEL, CELL, or
> better an abbreviation)?

In the following example:

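require(maptools)
# read the North Carolina SIDS polygons that ship with maptools
nc = readShapePoly(system.file("shapes/sids.shp", package = "maptools")[1],
    IDvar = "FIPSNO", proj4string = CRS("+proj=longlat +ellps=clrk66"))
str(as(nc, "SpatialPolygons"))  # structure dump via str
as(nc, "SpatialPolygons")       # current default show method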

I personally find the output of the (current) print method much easier
to read than that of str. Partly because I've grown accustomed to it,
but also partly because I have never liked the output of str. I tend to
use the current default show method for SpatialLines* and
SpatialPolygons* (the generic show for S4 objects) to figure out what
the structure of the data is, not how to use it. So I guess that for
those who want to use the data without bothering about the deeper
structure, these print methods (both: current show.S4 and str) are not
so useful. If you disagree with this: please respond!

As for Barry's proposal, I find it a bit repetitive (and space
consuming) to have a POINT(1 1) instead of the current (1,1) (which,
credit where credit is due, comes from a package Barry wrote that
preceded sp). I can very well understand that many people will not know
how to read WKT [1], as it is again something that programmers tend to
find useful, not users. To be correct we would need the awfully long
words MULTILINESTRING and MULTIPOLYGON to represent the sp classes, and
even then we can't write out the whole string but need to abbreviate. I
agree with Barry that a representation as close as possible to that of
a data.frame is most useful.

I suggest the following: for points:

  geometry attr1 attr2 attr3
PT(234 45)   333   xxy  22.5
PT(455 68)   221   xxx  13.2
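
A minimal sketch of how such a geometry column could be put together, using only base R. The helper name pt_label and the toy coordinates and attributes are assumptions made for illustration; nothing here is an existing sp function.

```r
# Hypothetical helper: build a "PT(x y)" label for each row of a
# two-column coordinate matrix.
pt_label <- function(coords) {
  paste0("PT(", apply(coords, 1, paste, collapse = " "), ")")
}

# Toy data mirroring the table above.
coords <- cbind(x = c(234, 455), y = c(45, 68))
attrs  <- data.frame(attr1 = c(333, 221),
                     attr2 = c("xxy", "xxx"),
                     attr3 = c(22.5, 13.2))

# Put the geometry label in front of the attribute table and print the
# result like an ordinary data frame, without row names.
out <- data.frame(geometry = pt_label(coords), attrs)
print(out, row.names = FALSE)
```

For a real SpatialPointsDataFrame, the coordinate matrix would presumably come from coordinates(obj) and the attribute table from the data slot.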

for polygons: use PN(3;2335) to express that this MULTIPOLYGON consists
of 3 POLYGONS, and has 2335 coordinates (in total)

  geometry attr1 attr2 attr3
PN(3;2335)   333   xxy  22.5
PN(45;345)   221   xxx  13.2

for lines:

  geometry attr1 attr2 attr3
LI(3;2335)   333   xxy  22.5
 LI(5;345)   221   xxx  13.2
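
The PN(...) and LI(...) labels only need two counts per feature: the number of parts and the total number of coordinates. A rough sketch in base R, assuming each feature is available as a plain list of two-column coordinate matrices (geom_label is a hypothetical name, not an sp function):

```r
# Hypothetical helper: 'parts' is a list of two-column coordinate
# matrices, one per polygon ring or per line; 'prefix' is "PN" or "LI".
geom_label <- function(parts, prefix) {
  n_parts  <- length(parts)
  n_coords <- sum(vapply(parts, nrow, integer(1)))
  paste0(prefix, "(", n_parts, ";", n_coords, ")")
}

# Toy geometries: a closed square (5 coordinates) and a closed
# triangle (4 coordinates).
square   <- cbind(c(0, 1, 1, 0, 0), c(0, 0, 1, 1, 0))
triangle <- cbind(c(2, 3, 2, 2), c(0, 0, 1, 0))

geom_label(list(square, triangle), "PN")  # "PN(2;9)"
geom_label(list(square[1:3, ]), "LI")     # "LI(1;3)"
```

With sp objects, the list of matrices would have to be collected from the coords slots of the Polygon or Line elements of each feature first.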

for pixels: use points, replace PT with PX

for grids: don't print all the values, but a very short summary.

To really educate users that we "glue" data.frame attribute tables to
geometries, they need to see this, and therefore I want to print a
SpatialPoints object as:

  geometry
PT(234 45)
PT(455 68)

and do the same for SpatialLines and SpatialPolygons:

  geometry
PN(3;2335)
PN(45;345)

  geometry
LI(3;2335)
 LI(5;345)

What head and tail should do is then obvious: return the first or last
rows, with geometry and attributes kept together.
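
As a sketch of that idea, here is a toy S4 class (ToyPointsDF, a made-up stand-in and not an sp class) whose head and tail methods simply use row subscripting, along the lines of the [1:n,] suggestion quoted above. Negative n is not handled.

```r
library(methods)

# Toy stand-in for a Spatial*DataFrame: one coordinate row per data row.
setClass("ToyPointsDF",
         representation(coords = "matrix", data = "data.frame"))

# Row subscripting keeps geometry and attributes together
# (column selection via j is ignored in this sketch).
setMethod("[", "ToyPointsDF", function(x, i, j, ..., drop = TRUE) {
  new("ToyPointsDF",
      coords = x@coords[i, , drop = FALSE],
      data   = x@data[i, , drop = FALSE])
})

# head and tail then reduce to selecting the first or last n rows.
head.ToyPointsDF <- function(x, n = 6L, ...) {
  x[seq_len(min(n, nrow(x@data))), ]
}
tail.ToyPointsDF <- function(x, n = 6L, ...) {
  nr <- nrow(x@data)
  x[seq.int(to = nr, length.out = min(n, nr)), ]
}

obj <- new("ToyPointsDF",
           coords = cbind(x = 1:5, y = 11:15),
           data   = data.frame(id = 1:5))

first_two <- head(obj, 2)
last_two  <- tail(obj, 2)
first_two@data$id  # 1 2
last_two@data$id   # 4 5
```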

Next thing is that developers/programmers need to find out how to print
all the gory details -- they will need to use str(nc) or show(unclass(nc)).

For those from Europe: thank you for all the points in the song contest.
We also like Lena a lot, here at home.

[1] http://en.wikipedia.org/wiki/Well-known_text
-- 
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster
Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251
8333081, Fax: +49 251 8339763  http://ifgi.uni-muenster.de
http://www.52north.org/geostatistics      e.pebesma at wwu.de