[Rd] head.matrix can return 1000s of columns -- limit to n or add new argument?

Wed Oct 30 12:29:45 CET 2019

>>>>> Gabriel Becker 
>>>>>     on Tue, 29 Oct 2019 12:43:15 -0700 writes:

    > Hi all,
    > So I've started working on this and I ran into something that I didn't
    > know, namely that for x a multi-dimensional (2+) array, head(x) and tail(x)
    > ignore dimension completely, treat x as an atomic vector, and return an
    > (unclassed) atomic vector:

Well, that's  (3+), not "2+" .

But I did write (on Sep 17 in this thread!)

  > The current source for head() and tail() and all their methods
  > in utils is just 83 lines of code  {file utils/R/head.R minus
  > the initial mostly copyright comments}.

and if you've ever looked at these few dozen lines of R code, you'll
have seen that we just added two simple utilities with a few
reasonably simple methods.  To treat non-matrix (i.e. non-2d)
arrays as vectors, is typically not unreasonable in R, but
indeed with your proposals (in this thread), such non-2d arrays
should be treated differently either via new  head.array() /
tail.array() methods ((or -- only if it can be done more nicely -- by
the default method)).

Note however the following  historical quirk :

> sapply(setNames(,1:5), function(K) inherits(array(pi, dim=1:K), "array"))
    1     2     3     4     5 
 TRUE FALSE  TRUE  TRUE  TRUE 

(Is this something we should consider changing for R 4.0.0 -- to
 have it TRUE also for 2d-arrays aka matrix objects ??)

The consequence of that is that
currently, "often"   foo.matrix is just a copy of foo.array  in
the case the latter exists:
"base" examples: foo in {unique, duplicated, anyDuplicated}.
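
To illustrate why such a copy is needed, here is a minimal sketch with a
made-up generic (foo, foo.default, foo.array and foo.matrix are hypothetical
names, not anything in base or utils).  With the alias in place, a matrix
reaches the array method whether or not its class includes "array":

```r
## Hypothetical generic and methods, for illustration only
foo <- function(x, ...) UseMethod("foo")
foo.default <- function(x, ...) "default"
foo.array   <- function(x, ...) "array"

## The alias: without it, a matrix would fall through to foo.default
## whenever "array" is not part of its (implicit) class
foo.matrix <- foo.array

res_3d  <- foo(array(0, c(2, 3, 4)))  # 3d array: dispatches on "array"
res_mat <- foo(matrix(0, 2, 3))       # matrix: dispatches on "matrix"
res_vec <- foo(1:3)                   # plain vector: no dim, default method
```

The alias costs one line and keeps the behaviour independent of how the
quirk above is eventually resolved.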

So I propose you change current  head.matrix and tail.matrix  to
head.array and tail.array
(and then have   head.matrix <- head.array  etc, at least if the
 above quirk must remain, or remains (which I currently guess to
 be the case)).
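
A rough sketch of what such an array method could look like, assuming n
may be a vector with one entry per dimension as discussed in this thread.
head_array and head_matrix are hypothetical names, chosen so as not to mask
the real utils methods; this is not the code in utils/R/head.R:

```r
## Hypothetical helper: keep the first n[i] indices along dimension i,
## and all indices along dimensions for which no n[i] was given.
head_array <- function(x, n = 6L) {
    d <- dim(x)
    stopifnot(!is.null(d), length(n) <= length(d))
    idx <- lapply(seq_along(d), function(i) {
        if (i > length(n)) return(seq_len(d[i]))
        ni <- n[i]
        ## negative n[i] means "all but the last |n[i]|"
        ni <- if (ni < 0L) max(d[i] + ni, 0L) else min(ni, d[i])
        seq_len(ni)
    })
    ## drop = FALSE keeps the number of dimensions unchanged
    do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

## Same pattern as for unique() etc.: the matrix version is just a copy
head_matrix <- head_array

x <- array(seq_len(100), c(4, 5, 5))
m <- matrix(1:20, nrow = 4)
d_first  <- dim(head_array(x, 2))    # first 2 "rows", everything else whole
d_neg    <- dim(head_array(x, -1))   # all but the last "row"
m_corner <- head_matrix(m, c(2, 3))  # top-left 2 x 3 corner
```

Because every index is passed with drop = FALSE, the result always has as
many dimensions as the input, which is the point under discussion.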

    >> x = array(100, c(4, 5, 5))

    >> dim(x)

    > [1] 4 5 5

    >> head(x, 1)

    > [1] 100

    >> class(head(x))

    > [1] "numeric"

    > (For a 1d array, it does return another 1d array).

    > When extending head/tail to understand multiple dimensions as discussed in
    > this thread, then, should the behavior for 2+d arrays be explicitly
    > retained, or should head and tail do the analogous thing (with a head(<2d
    > array>) behaving the same as head(<matrix>), which honestly is what I
    > expected to already be happening)?

    > Are people using/relying on this behavior in their code, and if so, why/for
    > what?

    > Even more generally, one way forward is to have the default methods check
    > for dimensions, and use length if it is null:

    > tail.default <- tail.data.frame <- function(x, n = 6L, ...)
    > {
    >     if(any(n == 0))
    >         stop("n must be non-zero or unspecified for all dimensions")
    >     if(!is.null(dim(x)))
    >         dimsx <- dim(x)
    >     else
    >         dimsx <- length(x)

    >     ## this returns a list of vectors of indices in each
    >     ## dimension, regardless of the length of the n
    >     ## argument
    >     sel <- lapply(seq_along(dimsx), function(i) {
    >         dxi <- dimsx[i]
    >         ## select all indices (full dim) if not specified
    >         ni <- if(length(n) >= i) n[i] else dxi
    >         ## handle negative ns
    >         ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
    >         seq.int(to = dxi, length.out = ni)
    >     })
    >     args <- c(list(x), sel, drop = FALSE)
    >     do.call("[", args)
    > }

    > I think this precludes the need for a separate data.frame method at all,
    > actually, though (I would think) tail.data.frame would still be defined and
    > exported for backwards compatibility. (the matrix method has some extra
    > bits so my current conception of it is still separate, though it might not
    > NEED to be).

    > The question then becomes, should head/tail always return something with
    > the same dimensionality (number of dims) it got, or should data.frame and
    > matrix be special cased in this regard, as they are now?

    > What are people's thoughts?
    > ~G
