[R-sig-Geo] Combining polygons and calculating their area (i.e. number of cells)

Edzer Pebesma edzer.pebesma at uni-muenster.de
Fri Dec 20 20:06:09 CET 2013



On 12/20/2013 06:20 PM, Josh O'Brien wrote:
> On Fri, Dec 20, 2013 at 5:38 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
> 
>> On Thu, 19 Dec 2013, Josh O'Brien wrote:
>>
>>> On Thu, Dec 19, 2013 at 4:23 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>>>
>>>> On Wed, 18 Dec 2013, Josh O'Brien wrote:
>>>>
>>>>
>>> <...snip...>
>>>
>>>> By the way, always avoid accessing S4 objects directly using @; use
>>>> slot(obj, "slotname") instead. The sapply should read:
>>>>
>>>> area = sapply(slot(SPclus, "polygons"), slot, "area")
>>>>
>>>> for the SO version with possibly incorrect areas, and
>>>>
>>>> area = gArea(SPclus, byid=TRUE)
>>>>
>>>> for correct ones.
>>>>
>>>
>>>
>>> Would you mind explaining why the functional form, slot(obj, "slotname"),
>>> should always be used instead of obj@slotname? I've seen this admonition
>>> repeatedly -- I think just from you -- and don't know whether it's a
>>> purely stylistic preference on your part, or whether there is some other
>>> rationale for preferring that form.
>>>
>>
>> When sp was written (2003-5), we chose to use S4 (new style) classes. We
>> used Chambers (1998), referring to Ch. 7, and on slots pp. 290-292. There
>> the distinction between S3 (old style) "$" and "$<-" access and replacement
>> methods, and S4 "@" and particularly "@<-" was made more forcefully than in
>> Chambers (2008). Contemporary uses described in Venables and Ripley (2000)
>> also distinguish between the two.
>>
>> All of these point to the formal use of S4 class definitions, not least to
>> ensure that storage mode checking when using .C() and .Call() ceases to be
>> so time-consuming. This is an issue with S3 classes, because there is
>> nothing to stop the user modifying the storage mode of list components,
>> with potentially bad consequences in compiled code. Defensive changes in
>> the underlying R engine to detect mode mismatch were introduced very much
>> later, I believe, so mode mismatch could crash the engine until then.
>>
>> For both S3 and S4 classes, the user is encouraged to use access functions
>> where provided. If the classes and methods are sufficiently well written,
>> there should only be a few occasions in which the user might want to access
>> components (S3) or slots (S4) that are not exposed via methods. If scripts
>> consistently contain @, and no access or replacement methods are provided,
>> consider asking the package maintainer to add the missing functionality.
>> slot() is a little less ugly, but the user shouldn't really need it either,
>> unless something inside an object has to be shown or manipulated.
>>
>> In this case, the "area" slot is documented, but precisely because it is
>> not intended to be used as a measure of area, there is no access method.
>>
>> The danger is that "@<-" and "$<-" are used to insert values into
>> components/slots without sufficient care being taken; access is perhaps
>> less of a problem.
>>
>> I particularly react to usages such as:
>>
>> sdf@data$var
>>
>> for sdf a Spatial*DataFrame object, as "$" and "$<-" methods *are*
>> provided to let these objects appear to be data.frame objects. This usage
>> is redundant, and displays ignorance about the class/method systems in S
>> and R. Of course, all are free to write what they like, so my preferences
>> may be just a matter of taste, but at least they are based on the books
>> written to establish the structure of the language.
>>
>> Hope this clarifies,
>>
>> Roger
>>
> 
> None of that seems like just a matter of taste _except_ perhaps for the
> preference for slot(obj, "slotname") over obj@slotname (which, by the way,
> is used extensively in the sp package's code base).

I agree.

As Roger says, access methods allow developers to improve the
implementation of a class while keeping scripts working. If a package
user works directly on the slots, [s]he takes the risk that a later
change in the class breaks the script. Package developers own the class
implementation; they need to use the slots to make methods work.
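
A minimal sketch of that point, using only the methods package. The
classes TrackV1 and TrackV2 and the accessor trackLength() are
hypothetical names made up for illustration; they are not part of sp.
The accessor gives the same answer for both versions of the class,
while reading the "len" slot directly fails once the developer has
dropped it.

```r
library(methods)

# Hypothetical version 1: the track length is stored in a slot.
setClass("TrackV1",
         representation(x = "numeric", y = "numeric", len = "numeric"))

# Hypothetical version 2: the slot is gone, the length is computed on demand.
setClass("TrackV2", representation(x = "numeric", y = "numeric"))

# One accessor, with a method per implementation.
setGeneric("trackLength", function(obj) standardGeneric("trackLength"))
setMethod("trackLength", "TrackV1", function(obj) slot(obj, "len"))
setMethod("trackLength", "TrackV2", function(obj) {
  sum(sqrt(diff(slot(obj, "x"))^2 + diff(slot(obj, "y"))^2))
})

t1 <- new("TrackV1", x = c(0, 3, 3), y = c(0, 4, 10), len = 11)
t2 <- new("TrackV2", x = c(0, 3, 3), y = c(0, 4, 10))

# Scripts written against the accessor survive the change ...
len1 <- trackLength(t1)
len2 <- trackLength(t2)

# ... scripts that read the slot directly do not: this call raises an error.
direct_ok <- !inherits(try(slot(t2, "len"), silent = TRUE), "try-error")
```

Here len1 and len2 agree, and direct_ok is FALSE because TrackV2 has no
"len" slot.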

Access functions may also do more than you think:

> library(sp)
> data(meuse)
> coordinates(meuse) = ~x+y
# this removes x and y from @data slot, to avoid redundancy:
> object.size(meuse)
39880 bytes
> meuse@data$x[1:10] # of course, they're no longer here:
NULL
# but:
> meuse$x[1:10] # retrieves x from @coords slot
     1      2      3      4      5      6      7      8      9     10
181072 181025 181165 181298 181307 181390 181165 181027 181060 181232
# alternatively:
> data(meuse)
> coordinates(meuse) = meuse[c("x", "y")]
# x and y are now both in @data and @coords, so:
> object.size(meuse)
42528 bytes
# and we can do a
> meuse@data$x[1:10]
 [1] 181072 181025 181165 181298 181307 181390 181165 181027 181060 181232
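
The idea behind such a "$" method can be sketched with a toy class.
This is an illustration under stated assumptions, not sp's actual
implementation: the class Pts, its slot layout, and all values are
made up. The method looks in the attribute table first and falls back
to the coordinates, so p$x finds x even though the data slot does not
hold it.

```r
library(methods)

# Hypothetical class: coordinates in one slot, attributes in another.
setClass("Pts", representation(coords = "matrix", data = "data.frame"))

# "$" looks in the attribute table first, then in the coordinate matrix.
setMethod("$", "Pts", function(x, name) {
  d <- slot(x, "data")
  if (name %in% names(d)) return(d[[name]])
  cc <- slot(x, "coords")
  if (name %in% colnames(cc)) return(unname(cc[, name]))
  NULL
})

# Made-up values, for illustration only.
p <- new("Pts",
         coords = cbind(x = c(1, 2), y = c(10, 20)),
         data = data.frame(zinc = c(100, 200)))

via_method <- p$x                 # found in the coords slot
via_slot   <- slot(p, "data")$x   # NULL: x is not stored in the data slot
```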

But all that was not your initial question.

> 
> Thanks for your thoughtful and enlightening reply,
> 
> - Josh
> 
> 

-- 
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster
Heisenbergstraße 2, 48149 Münster, Germany. Phone: +49 251
83 33081 http://ifgi.uni-muenster.de GPG key ID 0xAC227795
