[Rd] sapply improvements

Duncan Murdoch murdoch at stats.uwo.ca
Thu Nov 5 16:06:55 CET 2009


On 11/5/2009 4:05 AM, Martin Maechler wrote:
>>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>>>>>>     on Thu, 05 Nov 2009 00:28:51 +0100 writes:
> 
>     PD> William Dunlap wrote: ...
>     >>> 
>     >>> if (x <= 0) NA else log(x)
>     >>> 
>     >>> variety otherwise.
>     >> 
>     >> Would you only want it to coerce upwards to FUN.VALUES's
>     >> type?  E.g., allow sapply(z, length,
>     >> FUN.VALUE=numeric(1)) to return a numeric vector but die
>     >> on sapply(z, function(zi)as.complex(zi[1]),
>     >> FUN.VALUE=numeric(1)) If the latter doesn't die should it
>     >> return a complex or a numeric vector?  (I'd say it needs
>     >> to be numeric, but I'd prefer that it died.)
> 
>     PD> I'd say that it should probably die on downwards
>     PD> coercion. Getting a double when an integer is expected,
>     PD> or complex instead of double as you indicate, is a
>     PD> likely user error. If not, then the user can always
>     PD> coerce explicitly inside FUN.
> 
> I agree with Peter: Do allow coercion downwards
> 
>     PD> Another issue is whether one would want to go beyond the
>     PD> base classes of S (logical, integer, double, complex,
>     PD> character). For other classes, there may be no notion of
>     PD> "up" and "down" in coercion. Then again, sapply was
>     PD> always limited to what unlist() will handle, so e.g.
> 
>     >> sapply(1:10,FUN=function(i)Sys.Date())
>     PD>   [1] 14553 14553 14553 14553 14553 14553 14553 14553
>     PD> 14553 14553
> 
>     PD> as opposed to
> 
>     >> structure(rep(14553,10), class="Date")
>     PD>   [1] "2009-11-05" "2009-11-05" "2009-11-05"
>     PD> "2009-11-05" "2009-11-05" [6] "2009-11-05" "2009-11-05"
>     PD> "2009-11-05" "2009-11-05" "2009-11-05"
> 
> Well, using    
>       as(<prelim_result>,  class(<prototype>) )
> 
> would be really nice here.... 
> but alas, we are still not allowed to use  as(.,.) in base
> code which I'd tend to call  a "design bug" nowadays..

Part of the difficulty here is that we have too many concepts of "class" 
and "type" in R.  For example, as() is not consistent with as.vector() 
in the following sense:

If neither input is an S4 object, we should have

as(<prelim_result>,  class(<prototype>) )

be the same as

as.vector(<prelim_result>, typeof(<prototype>))

and

as.vector(<prelim_result>, class(<prototype>))

and currently as() gives a different result.  For example,

 > str(as(1:10, class(double(1))))
  int [1:10] 1 2 3 4 5 6 7 8 9 10
 > str(as.vector(1:10, typeof(double(1))))
  num [1:10] 1 2 3 4 5 6 7 8 9 10
 > str(as.vector(1:10, class(double(1))))
  num [1:10] 1 2 3 4 5 6 7 8 9 10

So if the coercion were to support as(), we'd need to decide when to 
follow its rules, and when to follow the existing as.vector() rules 
(which I think we're more or less following in the current sapply()).
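
As a rough illustration of the as.vector() route combined with the "die on downward coercion" rule discussed above, here is a minimal sketch. The helper name coerce_upward() and the type_order vector are hypothetical, not part of base R, and the ordering simply follows the base types listed earlier in the thread.

```r
# Hypothetical helper, not part of base R: coerce a preliminary result to
# the type of a prototype, but only "upwards" in this assumed ordering.
type_order <- c("logical", "integer", "double", "complex", "character")

coerce_upward <- function(result, prototype) {
  from <- match(typeof(result), type_order)
  to   <- match(typeof(prototype), type_order)
  if (is.na(from) || is.na(to))
    stop("unsupported type for this sketch")
  if (from > to)
    stop("downward coercion from ", typeof(result),
         " to ", typeof(prototype), " is not allowed")
  # Follow the existing as.vector() rules rather than as()
  as.vector(result, mode = typeof(prototype))
}

typeof(coerce_upward(1:10, double(1)))   # "double"
# coerce_upward(1i, double(1)) would stop with an error
```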

We'd also need to handle the cases involving S4 objects:

I'd say if the prototype is not S4 but the result is, we should die with 
an error.

If the prototype is S4, then we should use as().  We have fast C code to 
detect S4 objects; do we have C code to do the coercion?  I'd rather not 
write it, but I wouldn't object if someone else did or already has.
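
At the R level (not the C level asked about above), the two S4 rules could be sketched roughly as follows. The function coerce_with_s4_rules() and the "Celsius" class with its setAs() method are hypothetical, introduced only to make the example self-contained.

```r
library(methods)

# Hypothetical helper, not part of base R
coerce_with_s4_rules <- function(result, prototype) {
  if (isS4(prototype)) {
    # S4 prototype: defer to as() and whatever coerce methods exist
    return(as(result, class(prototype)))
  }
  if (isS4(result)) {
    # Non-S4 prototype but S4 result: die with an error
    stop("result is an S4 object but the prototype is not")
  }
  # Neither is S4: keep the existing as.vector() rules
  as.vector(result, mode = typeof(prototype))
}

# A toy S4 class and coerce method, for illustration only
setClass("Celsius", representation(value = "numeric"))
setAs("numeric", "Celsius", function(from) new("Celsius", value = from))

proto <- new("Celsius", value = 0)
r <- coerce_with_s4_rules(20, proto)
isS4(r)   # TRUE
```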

Duncan Murdoch
