[Rd] RFC: tapply(*, ..., init.value = NA)
Martin Maechler
maechler at stat.math.ethz.ch
Sat Feb 4 16:48:08 CET 2017
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>> on Wed, 1 Feb 2017 16:17:06 +0000 writes:
> On 'aggregate data.frame', the URL should be
> https://stat.ethz.ch/pipermail/r-help/2016-May/438631.html .
Thank you. Yes, using 'drop' makes sense there, where the result
is always "linear(ized)" or "one-dimensional".
For tapply() that's only the case for a 1D index.
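
To illustrate the 1D versus multi-dimensional point, here is a minimal
sketch with made-up data (the names g1, g2, one_d and two_d are only for
illustration). With a single index the result is a 1D array, so empty
groups could simply be dropped; with two indices the empty cells sit
inside a rectangular grid and cannot be dropped one by one.

```r
# Made-up grouping factors; level "c" of g1 never occurs in the data.
g1 <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))
g2 <- factor(c("x", "y", "y"))

# One index: a 1D array with one cell per level, including the empty "c".
one_d <- tapply(1:3, g1, sum)

# Two indices: a 3 x 2 matrix; combinations absent from the data
# (for example "b" with "x") still need a cell and are filled with NA.
two_d <- tapply(1:3, list(g1, g2), sum)

dim(one_d)
dim(two_d)
is.na(two_d["b", "x"])
```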
> vector(typeof(ans)) (or vector(storage.mode(ans))) has
> length zero and can be used to initialize array.
Yes, except in the case where ans is NULL.
You have convinced me; that is nicer.
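
A small sketch of the zero-length initialization, not the actual
tapply() code: array() fills from zero-length atomic data with NA of the
matching type (0 for raw). The variable names are illustrative only, and
the ans[0L][1L] idiom quoted further down is shown for comparison.

```r
# Stand-ins for the unlisted results of FUN (illustrative values only).
ans_int <- c(3L, 3L)
ans_raw <- as.raw(1:3)

# Zero-length vector of the right type; array() pads it with typed NA.
m_int <- array(vector(typeof(ans_int)), dim = c(2L, 2L))
m_raw <- array(vector(typeof(ans_raw)), dim = 3L)

typeof(m_int)   # integer, every cell NA
typeof(m_raw)   # raw, every cell 00 because raw has no NA

# The earlier idiom: index past the end of a zero-length vector.
na_int <- ans_int[0L][1L]
na_raw <- ans_raw[0L][1L]
```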
> Instead of if(missing(default)) , if(identical(default,
> NA)) could be used. The documentation could then say, for
> example: "If default = NA (the default), NA of appropriate
> storage mode (0 for raw) is automatically used."
After some thought (and experiments), I have reverted that and no
longer use if(missing(default)). You are right that it is not needed
(and even potentially confusing) here.
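
The difference between the two tests can be sketched with two toy
helpers. Both fill_missing() and fill_identical() are hypothetical names
made up for this illustration; neither is the tapply() source.

```r
# Variant 1: the formal default is only documentation; missing() decides.
fill_missing <- function(ans, default = NA) {
  if (missing(default)) vector(typeof(ans)) else default
}

# Variant 2: the value of 'default' decides, so passing NA explicitly
# behaves exactly like omitting the argument.
fill_identical <- function(ans, default = NA) {
  if (identical(default, NA)) vector(typeof(ans)) else default
}

a <- as.raw(1:3)

t_missing_omitted  <- typeof(array(fill_missing(a), 2L))                # "raw"
t_missing_explicit <- typeof(array(fill_missing(a, default = NA), 2L))  # "logical"
t_ident_omitted    <- typeof(array(fill_identical(a), 2L))              # "raw"
t_ident_explicit   <- typeof(array(fill_identical(a, default = NA), 2L))# "raw"
```

With the missing() variant, omitting the argument and passing its
documented default give different results, which is the inconsistency
raised for tapply(1:3, 1:3, as.raw) in the quoted message.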
Changes are in svn c72106.
Martin Maechler
> --------------------------------------------
> On Wed, 1/2/17, Martin Maechler
> <maechler at stat.math.ethz.ch> wrote:
> Subject: Re: [Rd] RFC: tapply(*, ..., init.value = NA)
> Cc: R-devel at r-project.org
> Date: Wednesday, 1 February, 2017, 12:14 AM
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>> on Tue, 31 Jan 2017 15:43:53 +0000 writes:
>> Function 'aggregate.data.frame' in R has taken a
>> different route. With drop=FALSE, the function is also
>> applied to subset corresponding to combination of
>> grouping variables that doesn't appear in the data
>> (example 2 in
>> https://stat.ethz.ch/pipermail/r-devel/2017-January/073678.html).
> Interesting point (I couldn't easily find 'the example 2'
> though). However, aggregate.data.frame() is a
> considerably more sophisticated function and one goal was
> to change tapply() as little as possible for compatibility
> (and maintenance!) reasons.
> [snip]
>> With the code using if(missing(default)) , I consider the
>> stated default value of 'default', default = NA ,
>> misleading because the code doesn't use it.
> I know and I also had thought about it and decided to keep
> it in the spirit of "self documentation" because "in
> spirit", the default still *is* NA.
>> Also, tapply(1:3, 1:3, as.raw) is not the same as
>> tapply(1:3, 1:3, as.raw, default = NA) . The accurate
>> statement is the code in if(missing(default)) , but it
>> involves the local variable 'ans'.
> exactly. But putting that whole expression in there would
> look confusing to those using str(tapply), args(tapply) or
> similar inspection to quickly get a glimpse of the
> function user "interface". That's why we typically don't
> do that and rather slightly cheat with the formal default,
> for the above "didactical" purposes.
> If you are puristic about this, then missing() should
> almost never be used when the function argument has a
> formal default.
> I don't have a too strong opinion here, and we do have
> quite a few other cases, where the formal default argument
> is not always used because of if(missing(.)) clauses.
> I think I could be convinced to drop the '= NA' from the
> formal argument list..
>> As far as I know, the result of function 'array' in R is
>> not a classed object and the default method of `[<-` will
>> be used in the 'tapply' code portion.
>> As far as I know, the result of 'lapply' is a list
>> without class. So, 'unlist' applied to it uses the
>> default method and the 'unlist' result is a vector or a
>> factor.
> You may be right here ((or not: If a package author makes
> array() into an S3 generic and defines S3method(array, *)
> and she or another makes tapply() into a generic with
> methods, are we really sure that this code would not be
> used ??))
> Still, the as.raw example did not easily work without a
> warning when using as.vector() or similar.
>> With the change, the result of
>> tapply(1:3, 1:3, factor, levels=3:1)
>> is of mode "character". The value is from the internal
>> code, not from the factor levels. It is worse than before
>> the change, where it is really the internal code,
>> integer.
> I agree that this change is not desirable. One could
> argue that it was quite a "lucky coincidence" that the
> previous code returned the internal integer codes though..
> [snip]
>> To initialize array, a zero-length vector can also be
>> used.
> yes, of course; but my ans[0L][1L] had the purpose to get
> the correct mode specific version of NA .. which works for
> raw (by getting '00' because "raw" has *no* NA!).
> So it seems I need an additional !is.factor(ans) there ...
> a bit ugly.
> ---------
> [snip]