[Rd] head.matrix can return 1000s of columns -- limit to n or add new argument?

Fri Nov 1 09:07:45 CET 2019

>>>>> Pages, Herve 
>>>>>     on Thu, 31 Oct 2019 21:02:07 +0000 writes:

    > On 10/30/19 04:29, Martin Maechler wrote:
    >>>>>>> Gabriel Becker
    >>>>>>> on Tue, 29 Oct 2019 12:43:15 -0700 writes:
    >> 
    >> > Hi all,
    >> > So I've started working on this and I ran into something that I didn't
    >> > know, namely that for x a multi-dimensional (2+) array, head(x) and tail(x)
    >> > ignore dimension completely, treat x as an atomic vector, and return an
    >> > (unclassed) atomic vector:
    >> 
    >> Well, that's  (3+), not "2+" .
    >> 
    >> But I did write (on Sep 17 in this thread!)
    >> 
    >> > The current source for head() and tail() and all their methods
    >> > in utils is just 83 lines of code  {file utils/R/head.R minus
    >> > the initial mostly copyright comments}.
    >> 
    >> and if you've ever looked at these few dozen R code lines, you'll
    >> have seen that we just added two simple utilities with a few
    >> reasonably simple methods.  To treat non-matrix (i.e. non-2d)
    >> arrays as vectors, is typically not unreasonable in R, but
    >> indeed with your proposals (in this thread), such non-2d arrays
    >> should be treated differently either via new  head.array() /
    >> tail.array() methods ((or -- only if it can be done more nicely -- by
    >> the default method)).
    >> 
    >> Note however the following  historical quirk :
    >> 
    >> > sapply(setNames(,1:5), function(K) inherits(array(pi, dim=1:K), "array"))
    >>     1     2     3     4     5
    >>  TRUE FALSE  TRUE  TRUE  TRUE
    >> 
    >> (Is this something we should consider changing for R 4.0.0 -- to
    >> have it TRUE also for 2d-arrays aka matrix objects ??)

    > That would be awesome! More generally I wonder how feasible it would be 
    > to fix all these inheritance quirks where inherits(x, "something"), 
    > is(x, "something"), and is.something(x) disagree. They've been such a 
    > nuisance for so many years...

    > Thanks,
    > H.

Thank you Hervé; you are right "in theory", but
no, we don't want to fix _all_ these quirks at the moment
(because we know how much this would break).
Note that ?class does mention S3 and S4, and also you know about
is(.,.)  which is more "rational" than inherits insofar as it
"thinks" the S4 way about inheritance .. but then it has its
surprises, too; e.g., note the result of  is(NULL) .

I really wanted to address the relatively limited case of
{matrix, array} for now.

{{more on this in the subthread Peter opened}}
Martin
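
To make the disagreement Hervé describes concrete, here is a minimal sketch that runs the three kinds of test side by side on a matrix and on a 3-d array. The helper name checks() is hypothetical and only used for illustration. No expected output is shown, because the inherits() result for a matrix is exactly the quirk under discussion and may differ between R versions.

```r
# Hypothetical helper: run the three "is it an array?" tests on one object.
checks <- function(x) {
  c(inherits = inherits(x, "array"),     # S3-style check on class(x)
    is       = methods::is(x, "array"),  # S4-style check
    is.array = is.array(x))              # looks only at the dim attribute
}

m  <- matrix(1:6, nrow = 2)           # a 2-d array
a3 <- array(1:24, dim = c(2, 3, 4))   # a 3-d array

print(checks(m))
print(checks(a3))
```

Only is.array() is guaranteed to treat both objects alike, since it inspects the dim attribute rather than the class.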

    >> The consequence of that is that
    >> currently, "often"   foo.matrix is just a copy of foo.array  in
    >> the case the latter exists:
    >> "base" examples: foo in {unique, duplicated, anyDuplicated}.
    >> 
    >> So I propose you change current  head.matrix and tail.matrix  to
    >> head.array and tail.array
    >> (and then have   head.matrix <- head.array  etc, at least if the
    >> above quirk must remain, or remains (which I currently guess to
    >> be the case)).
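
The aliasing pattern suggested above (write the method once for arrays, then point the matrix method at it) can be sketched with a toy S3 generic. The generic peek() and its methods are hypothetical names, not part of utils; this only illustrates the dispatch idea, not the actual head()/tail() code.

```r
# 'peek' is a hypothetical generic, used only to illustrate the aliasing idea.
peek <- function(x, ...) UseMethod("peek")

peek.default <- function(x, ...) "default method"
peek.array   <- function(x, ...) sprintf("array method, %d dims", length(dim(x)))

# Alias: make sure 2-d arrays (matrices) reach the same code as other arrays,
# whether or not a matrix inherits from "array".
peek.matrix <- peek.array

peek(1:3)
peek(matrix(1:6, 2))
peek(array(1:24, 2:4))
```

With the alias in place the matrix case does not depend on how inherits(<matrix>, "array") behaves.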
    >> 
    >> 
    >> >> x = array(100, c(4, 5, 5))
    >> 
    >> >> dim(x)
    >> 
    >> > [1] 4 5 5
    >> 
    >> >> head(x, 1)
    >> 
    >> > [1] 100
    >> 
    >> >> class(head(x))
    >> 
    >> > [1] "numeric"
    >> 
    >> 
    >> > (For a 1d array, it does return another 1d array).
    >> 
    >> > When extending head/tail to understand multiple dimensions as discussed in
    >> > this thread, then, should the behavior for 2+d arrays be explicitly
    >> > retained, or should head and tail do the analogous thing (with a
    >> > head(<2d array>) behaving the same as head(<matrix>), which honestly is what I
    >> > expected to already be happening)?
    >> 
    >> > Are people using/relying on this behavior in their code, and if so, why/for
    >> > what?
    >> 
    >> > Even more generally, one way forward is to have the default methods check
    >> > for dimensions, and use length if it is null:
    >> 
    >> > tail.default <- tail.data.frame <- function(x, n = 6L, ...)
    >> > {
    >> >     if(any(n == 0))
    >> >         stop("n must be non-zero or unspecified for all dimensions")
    >> >     if(!is.null(dim(x)))
    >> >         dimsx <- dim(x)
    >> >     else
    >> >         dimsx <- length(x)
    >> 
    >> >     ## this returns a list of vectors of indices in each
    >> >     ## dimension, regardless of the length of the n
    >> >     ## argument
    >> >     sel <- lapply(seq_along(dimsx), function(i) {
    >> >         dxi <- dimsx[i]
    >> >         ## select all indices (full dim) if not specified
    >> >         ni <- if(length(n) >= i) n[i] else dxi
    >> >         ## handle negative ns
    >> >         ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
    >> >         seq.int(to = dxi, length.out = ni)
    >> >     })
    >> >     args <- c(list(x), sel, drop = FALSE)
    >> >     do.call("[", args)
    >> > }
    >> 
    >> 
    >> > I think this precludes the need for a separate data.frame method at all,
    >> > actually, though (I would think) tail.data.frame would still be defined and
    >> > exported for backwards compatibility. (the matrix method has some extra
    >> > bits so my current conception of it is still separate, though it might not
    >> > NEED to be).
    >> 
    >> > The question then becomes, should head/tail always return something with
    >> > the same dimensionality (number of dims) it got, or should data.frame and
    >> > matrix be special cased in this regard, as they are now?
    >> 
    >> > What are people's thoughts?
    >> > ~G
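
The tail.default draft quoted above can be tried without touching the real methods by giving it a different name. The sketch below uses the hypothetical name tail_nd() with the same logic as the quoted draft, and only reports the dimensions of the results; it is an illustration of the proposal, not the code that utils ships.

```r
# Hypothetical name; same logic as the tail.default draft quoted above.
tail_nd <- function(x, n = 6L, ...) {
  if (any(n == 0))
    stop("n must be non-zero or unspecified for all dimensions")
  # Objects without a dim attribute are treated as one-dimensional.
  dimsx <- if (!is.null(dim(x))) dim(x) else length(x)
  # One index vector per dimension; dimensions beyond length(n) are kept whole.
  sel <- lapply(seq_along(dimsx), function(i) {
    dxi <- dimsx[i]
    ni <- if (length(n) >= i) n[i] else dxi
    # Negative n means "all but the first |n|" along that dimension.
    ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
    seq.int(to = dxi, length.out = ni)
  })
  do.call("[", c(list(x), sel, drop = FALSE))
}

x <- array(seq_len(100), c(4, 5, 5))
dim(tail_nd(x, c(2, 2)))   # last 2 rows, last 2 columns, all of dimension 3
dim(tail_nd(x, -1))        # everything except the first row
tail_nd(1:10, 3)           # plain vector: falls back to length(x)
```

Because drop = FALSE is passed through, the result keeps the same number of dimensions as the input, which is the behavior the question above is asking about.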
    >> 
    >> ______________________________________________
    >> R-devel using r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel
    >> 

    > -- 
    > Hervé Pagès
