[R-sig-Geo] API documentation?

David Hugh-Jones dhughj at essex.ac.uk
Tue Jul 1 11:30:31 CEST 2008


I thought about this some more. One solution would be some wiki-like
documentation which people could easily edit as they learnt. The
obvious place is the R wiki ( http://wiki.r-project.org/rwiki/ ). At
the moment there's no section on spatial data and the spatial
statistics section just points you at the spatial view on CRAN.

Things I would like to know
- a list of external data types and how to get them into and out of R
- a list of internal R data representations (RArcInfo, spatstat etc.)
and how to convert between them
- a list of things to do to data (subset, thin, measure distances,
graphing etc.) and what packages do them
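
As a sketch of the kind of entry I have in mind for the second item,
here is one way to pull plain x/y coordinates out of an sp
SpatialPolygons object by walking its slots. The toy square and the
names ring, sq, rings and xy are made up for illustration; I am only
assuming the polygons, Polygons and coords slots of the sp classes, and
there may well be a tidier accessor that I have not found.

```r
library(sp)

# Toy data: one closed, clockwise square ring (invented for this example)
ring <- cbind(x = c(0, 0, 1, 1, 0), y = c(0, 1, 1, 0, 0))
sq <- SpatialPolygons(list(Polygons(list(Polygon(ring, hole = FALSE)), ID = "a")))

# Each Polygons object can hold several rings, so build a list of lists:
# one coordinate matrix per ring, grouped by Polygons object
rings <- lapply(slot(sq, "polygons"), function(p) {
  lapply(slot(p, "Polygons"), function(r) slot(r, "coords"))
})

# Coordinates of the first ring of the first Polygons object
xy <- rings[[1]][[1]]
```

Here xy is an ordinary two-column matrix, so xy[, 1] and xy[, 2] give
the x and y values; looping over rings covers objects with more than
one polygon or ring.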

I am sure other people have their own thoughts - e.g. I haven't even
mentioned data analysis or statistics yet - so I am just going to
start hacking at
http://wiki.r-project.org/rwiki/doku.php?id=tips:spatial-data .
Helpers would be welcome - people who know a lot can provide answers,
and if (like me) you know barely anything, then you'll know what
questions need answering.

David Hugh-Jones
PhD Candidate
Essex University Department of Government
http://davidhughjones.googlepages.com


2008/7/1 Roger Bivand <Roger.Bivand at nhh.no>:
> On Mon, 30 Jun 2008, tom sgouros wrote:
>
>>
>> Roger Bivand <Roger.Bivand at nhh.no> wrote:
>>
>>
>>>> I don't mean to rant, but believe me, I've spent plenty of time with
>>>> the documentation and it's really not helping.
>>>>
>>>> Partly this is a problem of R's doc format which treats package
>>>> documentation as an alphabetical list of functions - which gives me no
>>>> idea where to start.
>>
>> I would tend to second this.  I've been lurking on the list for a few
>> months, hoping to learn a bit, but so far without much success, since
>> the conversation and the documentation are so far above where I am and
>> what I need.  I am almost familiar with R, using it for time-series
>> statistics, and learned early on that the Dalgaard book was a better
>> intro than any of the real R documents.  It doesn't cover nearly
>> everything, but it seems to cover what I needed, so I use what is
>> probably a baby-level set of R functions, but it's adequate.  Without a
>> professor lurking over my shoulder to explain stuff, I am perpetually
>> slightly lost, because all the documentation assumes I know stuff that I
>> don't.
>>
>> I still use R because I know that with enough poking around it will
>> eventually provide a solution.  But if the alternatives were not very
>> expensive, I would have given up a while ago.
>>
>> I joined this group when I wanted to expand into making maps of
>> geographical economic data, and after a month of working on the problem,
>> I essentially had to give up for the time being.  I wish there were an
>> introduction that showed me how to use R with a GIS program, but to my
>> knowledge, there is not.  I did run across a GRASS book that claimed it
>> would help, but as I recall, it cost upward of US$100.
>
> I would be happy to link to such a guide from the Spatial Task View, for
> example on the R-Geo site. There are other nice resources, for instance
> Dylan Beaudette's site - one page is:
>
> http://casoilresource.lawr.ucdavis.edu/drupal/node/100
>
> Seen from the developer side, it is hard to know what users see as the most
> useful advice. The courses that have been provided - say like:
>
> http://www.bias-project.org.uk/ASDARcourse/
>
> are rather "developer-view", as indeed the forthcoming "useR" series book
> will be. By "developer-view", I mean attempting to provide information both
> for beginning users and trying to advance along the useR-developeR continuum
> where experience has shown that this may be advisable, even though neither
> desired nor immediately applauded.
>
> A typical immediate response to the courses has been that "all that class
> and coordinate reference system stuff is unnecessary". This seems to hold
> until the participants actually get to do work with their own data, at which
> point having a reference to what is going on is handy. The specific
> difficulty, as teachers often find, is that the initial expectation from the
> user is often not the most fruitful question for helping the user to become
> more self-reliant going forward.
>
> One clear reason for this difficulty is that many different disciplines use
> spatial data, and all of them seem to feel that they know enough for their
> internal purposes, so get frustrated when they encounter barriers which are
> inherent in their perception(s) of spatial data. So listening to other
> disciplines and learning from them can be helpful.
>
> As far as sp classes are concerned, is the R-News note of 2005 too outdated
> to be helpful? Should it be placed more prominently on the Task View page?
> I would acknowledge that ease of use is not what it could be, we are still
> where time series (and time representation) were in R a couple of years ago.
> However, the sp classes ought to work for many who do not need to manipulate
> the actual coordinates. For working statisticians, simple mapping of model
> residuals is no more than:
>
> library(sp)
> library(rgdal)
> mydata <- readOGR(dsn="directory", layer="shpfile")
> # or:
> # library(maptools)
> # mydata <- readShapeSpatial("shpfile.shp")
> # from release 0.7-14 already submitted to CRAN, now you have to know
> # whether your shapefile is point, line or polygons
> myobj <- lm(response ~ x1 + x2, data=mydata)
> mydata$residuals <- residuals(myobj)
> spplot(mydata, "residuals")
>
> where the mydata Spatial*DataFrame behaves as a data.frame. The two-faced
> nature of the Spatial*DataFrame classes is intentional, looking like GIS
> data models for GIS people, and data frames for statisticians. But
> manipulating coordinates is just a good deal more complicated - unless you
> just need subsetting with the "[" methods.
>
> To summarise, contributions of user tips and examples, and links to those
> examples, would be very welcome.
>
> Roger
>
>>
>> To make this more useful than just a rant, I would second David's point.
>> What is missing is only what David misses: an introduction that says
>> where to start to deal with simple geographic data, maybe providing a
>> few examples of common techniques and frequent problems, and pushing
>> data back and forth to some GIS.  I was not able to find that, and
>> without it, found the R documentation pretty much useless.  I'd be happy
>> to know of some source I hadn't found before, so if you have one to
>> recommend, please do.
>>
>> -tom
>>
>>
>>
>>>
>>> This is an inherent (and perhaps ugly) characteristic of the S4
>>> object/class structure as you suggest below. New style classes are not
>>> as well integrated into the documentation as straight functions
>>> are. Here, coerce is as(), but the issue of how to improve
>>> documentation is not resolved.
>>>
>>>>
>>>> This then interacts badly with the OO structure. For example, look at
>>>> the 20+ pages on "coerce". Hmm, what does "coerce" actually do? In
>>>> fact that's in a whole different library. But I didn't know that, so I
>>>> click on a page at random, say
>>>>
>>>> coerce,SpatialGrid,data.frame-method
>>>>
>>>> and this takes me to SpatialGrid class - which doesn't mention coerce
>>>> at all. (Nor does it tell me what SpatialGrid is, or what it is used
>>>> for.)
>>>>
>>>> On the other hand, maybe I might guess that to get a list of
>>>> coordinates, I'd use "coordinates". So I click on that method, and it
>>>> tells me yes, this "retrieves spatial coordinates". But unfortunately
>>>> it retrieves them hidden inside another object ("an object of class
>>>> SpatialPointsDataFrame"). OK, but how do I get the _actual_data_?
>>>> Maybe the SpatialPointsDataFrame class page will tell me. Nope. Et
>>>> cetera.
>>>>
>>>> Rick: yes, I agree that using the internal data structures is how to
>>>> do things, but this is broken isn't it? The whole point of having OO
>>>> is to be able to use it _without_ understanding the internal data
>>>> structures. The ideal, in other words, would be to have a "thin.lines"
>>>> method that I could just run on any polygon or set of polygons.
>>>> Failing that, then I should be able to get at the internal data
>>>> without hours of head scratching.
>>>>
>>>
>>> No, because the underlying understanding of dp and other methods for
>>> thinning is that the objects implement an arc-node topological model,
>>> so that each arc can be thinned without different thinning happening
>>> on otherwise identical boundaries of neighbouring polygons. But we do
>>> not have an arc-node representation, so there cannot be line thinning
>>> for polygon boundaries in a spaghetti world.
>>>
>>>> Right now, it's like, everything is hidden behind a layer of classes
>>>> and slots and methods, but I still need to go behind that layer to get
>>>> at the actual raw data, and this is so complicated and confusing that
>>>> it would be easier just to work with the raw data.
>>>>
>>>
>>> You need to build topology first, so if need be take the data out to a
>>> GIS that does topology properly, do the arc line thinning there, and
>>> bring it back in. Building topology from a stream of straight line
>>> segments is a serious challenge, especially if you want to retain the
>>> association with attribute data.
>>>
>>> Roger
>>>
>>>> OK, I'll stop venting. If there's anything I could do to improve this
>>>> situation, I would gladly try.
>>>>
>>>> David Hugh-Jones
>>>> PhD Candidate
>>>> Essex University Department of Government
>>>> http://davidhughjones.googlepages.com
>>>>
>>>>
>>>> 2008/6/30 Virgilio Gomez-Rubio <v.gomezrubio at imperial.ac.uk>:
>>>>>
>>>>> Dear David,
>>>>>
>>>>> Probably the best way to start is by checking the HTML documentation.
>>>>> It should be installed locally but it is also accessible, for example,
>>>>> here:
>>>>>
>>>>> http://finzi.psych.upenn.edu/R/library/sp/html/00Index.html
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Virgilio
>>>>>
>>>>> On Mon, 2008-06-30 at 18:48 +0200, David Hugh-Jones wrote:
>>>>>>
>>>>>> Thanks David for his comment about dp.
>>>>>>
>>>>>> Quick question: is there any reasonably comprehensible API
>>>>>> documentation for the "sp" package? I have just spent about an hour
>>>>>> trying to get a list of points from a SpatialPolygons object. I
>>>>>> eventually just printed everything out and found the data by hand, so
>>>>>> now I am doing:
>>>>>>
>>>>>> coords <- myobject@polygons[[1]]@Polygons[[1]]@coords
>>>>>>
>>>>>> but I don't assume that is right. Surely there must be some simple way
>>>>>> to get a list of x and y coords out of any object?
>>>>>>
>>>>>> in frustration,
>>>>>> David Hugh-Jones
>>>>>> PhD Candidate
>>>>>> Essex University Department of Government
>>>>>> http://davidhughjones.googlepages.com
>>>>>>
>>>>>>
>>>>>> 2008/6/30 David PINAUD <pinaud at cebc.cnrs.fr>:
>>>>>>>
>>>>>>> maybe you can try the function dp() in the package "shapefiles",
>>>>>>> which is an implementation of the Douglas-Peucker polyLine
>>>>>>> simplification algorithm.
>>>>>>> Hope it helps
>>>>>>> David
>>>>>>>
>>>>>>> David Hugh-Jones a écrit :
>>>>>>>>
>>>>>>>> Hi all
>>>>>>>>
>>>>>>>> I have a big dataset of points and am doing stuff on them that takes
>>>>>>>> a lot of time. To speed it up, I would like to use "thinlines" from
>>>>>>>> RArcInfo, which basically makes the maps "rougher" by throwing away
>>>>>>>> points. Is there an equivalent function for SpatialPolygon type
>>>>>>>> objects? (I assume that there's no way to convert _to_ Arcinfo,
>>>>>>>> though I know it's possible to read from it).
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> David Hugh-Jones
>>>>>>>> PhD Candidate
>>>>>>>> Essex University Department of Government
>>>>>>>> http://davidhughjones.googlepages.com
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> R-sig-Geo mailing list
>>>>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ***************************************************
>>>>>>> David PINAUD
>>>>>>> Ingénieur de Recherche "Analyses spatiales"
>>>>>>>
>>>>>>> Centre d'Etudes Biologiques de Chizé - CNRS UPR1934
>>>>>>> 79360 Villiers-en-Bois, France poste 485
>>>>>>> Tel: +33 (0)5.49.09.35.58
>>>>>>> Fax: +33 (0)5.49.09.65.26
>>>>>>> http://www.cebc.cnrs.fr/
>>>>>>>
>>>>>>> ***************************************************
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> --
>>> Roger Bivand
>>> Economic Geography Section, Department of Economics, Norwegian School of
>>> Economics and Business Administration, Helleveien 30, N-5045 Bergen,
>>> Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
>>> e-mail: Roger.Bivand at nhh.no
>>
>>
>>
>
> --
> Roger Bivand
> Economic Geography Section, Department of Economics, Norwegian School of
> Economics and Business Administration, Helleveien 30, N-5045 Bergen,
> Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
> e-mail: Roger.Bivand at nhh.no
>
>
>



