[Rd] RFC: tapply(*, ..., init.value = NA)

Sat Feb 4 16:48:08 CET 2017

>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>>     on Wed, 1 Feb 2017 16:17:06 +0000 writes:

    > On 'aggregate data.frame', the URL should be
    > https://stat.ethz.ch/pipermail/r-help/2016-May/438631.html .

Thank you. Yes, using 'drop' makes sense there, where the result
is always "linear(ized)" or "one-dimensional".
For tapply(), that is only the case for a one-dimensional index.

    > vector(typeof(ans)) (or vector(storage.mode(ans))) has
    > length zero and can be used to initialize array.  

Yes, except in the case where ans is NULL.
You have convinced me that this is nicer.
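
A minimal sketch of the initialization idea, using only base R. The
names 'ans' and 'ext' are illustrative and not taken from the tapply()
source; the behaviour relied on is the one documented in ?array for
zero-length 'data'.

```r
# Illustrative values; 'ans' stands for the unlisted per-group results.
ans <- c(1.5, 2.5, 3.5)
ext <- c(2L, 2L)                 # hypothetical array extent

# Zero-length vector with the same storage type as 'ans'
init <- vector(typeof(ans))

# With zero-length 'data', array() fills with a type-appropriate NA
m_dbl <- array(init, dim = ext)

# "raw" has no NA, so a raw array is filled with 00 instead
m_raw <- array(vector(typeof(as.raw(1))), dim = ext)

# The NULL case mentioned above needs separate handling,
# because its type is not a vector mode:
typeof(NULL)
```

The point of the design is that the fill value never has to be spelled
out: the storage type of the results alone determines it.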

    > Instead of if(missing(default)) , if(identical(default,
    > NA)) could be used. The documentation could then say, for
    > example: "If default = NA (the default), NA of appropriate
    > storage mode (0 for raw) is automatically used."

After some thought (and experiments), I have reverted and no
longer use if(missing). You are right that it is not needed
(and even potentially confusing) here.
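
To illustrate the difference between the two tests, here is a small
sketch with two hypothetical functions (f_missing and f_identical are
made-up names, not the tapply() code). It only shows why the
missing() variant makes the formal default misleading.

```r
# Hypothetical functions for illustration only.
f_missing <- function(x, default = NA) {
    # The formal default NA is never used when 'default' is omitted
    if (missing(default)) default <- vector(typeof(x))[1L]
    default
}

f_identical <- function(x, default = NA) {
    # Omitting 'default' and passing NA explicitly take the same branch
    if (identical(default, NA)) default <- vector(typeof(x))[1L]
    default
}

r <- as.raw(1:3)
m1 <- f_missing(r)                  # raw 00
m2 <- f_missing(r, default = NA)    # logical NA: differs from m1
i1 <- f_identical(r)                # raw 00
i2 <- f_identical(r, default = NA)  # raw 00: same as i1
```

With the identical() test, the documented default and the behaviour of
the code agree, which is the point made in the quoted suggestion.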

Changes are in svn c72106.
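
For readers who want to try the argument under discussion, a short
usage sketch follows. It assumes an R version in which tapply() has
the 'default' argument (R 3.4.0 or later); the data are made up.

```r
x  <- c(1, 2, 3)
g1 <- factor(c("a", "a", "b"))
g2 <- factor(c("u", "v", "u"))   # the combination (b, v) has no data

# Empty cell filled with NA (the default)
t_na <- tapply(x, list(g1, g2), sum)

# Empty cell filled with a user-chosen value
t_zero <- tapply(x, list(g1, g2), sum, default = 0)
```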

Martin Maechler

    > --------------------------------------------
    > On Wed, 1/2/17, Martin Maechler
    > <maechler at stat.math.ethz.ch> wrote:

    >  Subject: Re: [Rd] RFC: tapply(*, ..., init.value = NA)

    >  Cc: R-devel at r-project.org
    >  Date: Wednesday, 1 February, 2017, 12:14 AM

>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>>     on Tue, 31 Jan 2017 15:43:53 +0000 writes:

    >> Function 'aggregate.data.frame' in R has taken a
    >> different route. With drop=FALSE, the function is also
    >> applied to subset corresponding to combination of
    >> grouping variables that doesn't appear in the data
    >> (example 2 in
    >> https://stat.ethz.ch/pipermail/r-devel/2017-January/073678.html).

    > Interesting point (I couldn't easily find 'the example 2'
    > though).  However, aggregate.data.frame() is a
    > considerably more sophisticated function and one goal was
    > to change tapply() as little as possible for compatibility
    > (and maintenance!) reasons.

    > [snip]

    >> With the code using if(missing(default)) , I consider the
    >> stated default value of 'default', default = NA ,
    >> misleading because the code doesn't use it.

    > I know and I also had thought about it and decided to keep
    > it in the spirit of "self documentation" because "in
    > spirit", the default still *is* NA.

    >> Also, tapply(1:3, 1:3, as.raw) is not the same as
    >> tapply(1:3, 1:3, as.raw, default = NA) .  The accurate
    >> statement is the code in if(missing(default)) , but it
    >> involves the local variable 'ans'.

    > exactly.  But putting that whole expression in there would
    > look confusing to those using str(tapply), args(tapply) or
    > similar inspection to quickly get a glimpse of the
    > function user "interface".  That's why we typically don't
    > do that and rather slightly cheat with the formal default,
    > for the above "didactical" purposes.

    > If you are puristic about this, then missing() should
    > almost never be used when the function argument has a
    > formal default.

    > I don't have too strong an opinion here, and we do have
    > quite a few other cases where the formal default argument
    > is not always used because of if(missing(.)) clauses.

    > I think I could be convinced to drop the '= NA' from the
    > formal argument list..

    >> As far as I know, the result of function 'array' in R is
    >> not a classed object and the default method of `[<-` will
    >> be used in the 'tapply' code portion.

    >> As far as I know, the result of 'lapply' is a list
    >> without class. So, 'unlist' applied to it uses the
    >> default method and the 'unlist' result is a vector or a
    >> factor.

    > You may be right here (or not: if a package author makes
    > array() into an S3 generic and defines S3method(array, *),
    > and she or another author makes tapply() into a generic
    > with methods, are we really sure that this code would not
    > be used?)

    > Still, the as.raw example did not easily work without a
    > warning when using as.vector() or similar.

    >> With the change, the result of

    >> tapply(1:3, 1:3, factor, levels=3:1)

    >> is of mode "character". The value is from the internal
    >> code, not from the factor levels. It is worse than before
    >> the change, where it is really the internal code,
    >> integer.

    > I agree that this change is not desirable.  One could
    > argue that it was quite a "lucky coincidence" that the
    > previous code returned the internal integer codes though..
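
The distinction between internal codes and level labels in the quoted
example can be checked outside tapply(). This is a sketch of the
ingredients only, using base R; it does not reproduce the tapply()
code before or after the change.

```r
# Per-group results of FUN = factor, levels = 3:1, as lapply() would give
ans <- lapply(1:3, factor, levels = 3:1)

# unlist() applied to a list of factors returns a factor
f <- unlist(ans, use.names = FALSE)

codes  <- as.integer(f)     # internal codes: 3 2 1
labels <- as.character(f)   # level labels:  "1" "2" "3"
```

A character result holding "3" "2" "1" would therefore be the codes
converted to strings, not the labels, which is the complaint above.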

    > [snip]

    >> To initialize array, a zero-length vector can also be
    >> used.

    > yes, of course; but my ans[0L][1L] had the purpose to get
    > the correct mode specific version of NA .. which works for
    > raw (by getting '00' because "raw" has *no* NA!).

    > So it seems I need an additional !is.factor(ans) there ...
    > a bit ugly.
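
The ans[0L][1L] idiom can be tried directly. In this sketch the helper
name na_like is made up for illustration; the indexing itself is plain
base R.

```r
# Hypothetical helper: take an empty subset, then index past its end
na_like <- function(ans) ans[0L][1L]

n_int <- na_like(1:3)            # NA_integer_
n_chr <- na_like(c("a", "b"))    # NA_character_
n_raw <- na_like(as.raw(1:3))    # 00, since "raw" has no NA

# For a factor the result is still a factor (an NA with levels attached),
# which is why an extra is.factor() check comes up
n_fac <- na_like(factor(c("x", "y")))
```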

    > ---------

    > [snip]
