[Rd] head.matrix can return 1000s of columns -- limit to n or add new argument?
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Wed Oct 30 12:29:45 CET 2019
>>>>> Gabriel Becker
>>>>> on Tue, 29 Oct 2019 12:43:15 -0700 writes:
> Hi all,
> So I've started working on this and I ran into something that I didn't
> know, namely that for x a multi-dimensional (2+) array, head(x) and tail(x)
> ignore dimension completely, treat x as an atomic vector, and return an
> (unclassed) atomic vector:
Well, that's (3+), not "2+".
But I did write (on Sep 17 in this thread!)
> The current source for head() and tail() and all their methods
> in utils is just 83 lines of code {file utils/R/head.R minus
> the initial mostly copyright comments}.
and if you've ever looked at these few dozen lines of R code, you'll
have seen that we just added two simple utilities with a few
reasonably simple methods. To treat non-matrix (i.e. non-2d)
arrays as vectors, is typically not unreasonable in R, but
indeed with your proposals (in this thread), such non-2d arrays
should be treated differently either via new head.array() /
tail.array() methods ((or -- only if it can be done more nicely -- by
the default method)).
Note however the following historical quirk :
> sapply(setNames(,1:5), function(K) inherits(array(pi, dim=1:K), "array"))
    1     2     3     4     5
 TRUE FALSE  TRUE  TRUE  TRUE
(Is this something we should consider changing for R 4.0.0 -- to
have it TRUE also for 2d-arrays aka matrix objects ??)
The consequence of that is that
currently, "often" foo.matrix is just a copy of foo.array in
the case the latter exists:
"base" examples: foo in {unique, duplicated, anyDuplicated}.
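
A minimal sketch of how that consequence shows up in S3 dispatch. The generic foo() and its methods are hypothetical and exist only for illustration; they are not code from base or utils.

```r
## 'foo' is a hypothetical generic, used only for illustration.
foo <- function(x, ...) UseMethod("foo")
foo.default <- function(x, ...) "default method"
foo.array   <- function(x, ...) "array method"

## A 3d array has implicit class "array", so this dispatches to foo.array:
foo(array(pi, dim = 1:3))

## A matrix reaches foo.array only if inherits(<matrix>, "array") is TRUE;
## under the quirk shown above it falls through to foo.default:
foo(matrix(pi, 2, 2))

## The workaround: make the matrix method a copy of the array method.
foo.matrix <- foo.array
foo(matrix(pi, 2, 2))   # uses the array code regardless of the quirk
```

With the copy in place, matrices and higher-dimensional arrays share one implementation whether or not the quirk is ever changed.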
So I propose you change current head.matrix and tail.matrix to
head.array and tail.array
(and then have head.matrix <- head.array etc, at least if the
above quirk must remain, or remains (which I currently guess to
be the case)).
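
As a rough sketch of what such a shared method could look like (an assumption for illustration, not the code in utils/R/head.R): the names head_array and head_matrix are hypothetical, chosen so that nothing in utils is masked, and the convention that an NA or missing entry of n keeps a dimension whole is likewise only assumed here.

```r
## Hypothetical sketch; NOT the implementation in utils.
## 'n' may have up to one entry per dimension of 'x'; dimensions
## without an entry (or with NA) are kept whole.
head_array <- function(x, n = 6L, ...)
{
    d <- dim(x)
    stopifnot(!is.null(d), length(n) <= length(d))
    ## one index vector per dimension
    idx <- lapply(seq_along(d), function(i) {
        ni <- if (i <= length(n) && !is.na(n[i])) n[i] else d[i]
        ## negative n drops that many from the end, as head() does
        ni <- if (ni < 0L) max(d[i] + ni, 0L) else min(ni, d[i])
        seq_len(ni)
    })
    ## x[idx[[1]], idx[[2]], ..., drop = FALSE]
    do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

## matrix method as a plain copy, in case the inherits() quirk remains
head_matrix <- head_array

x <- array(100, c(4, 5, 5))
dim(head_array(x, 1))            # first slice along dim 1, dims kept
dim(head_array(x, c(2, NA, 3)))  # limit dims 1 and 3, keep dim 2 whole
```

Using drop = FALSE throughout means the result always has the same number of dimensions as the input, which is one possible answer to the question raised further down in the quoted message.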
>> x = array(100, c(4, 5, 5))
>> dim(x)
> [1] 4 5 5
>> head(x, 1)
> [1] 100
>> class(head(x))
> [1] "numeric"
> (For a 1d array, it does return another 1d array).
> When extending head/tail to understand multiple dimensions as discussed in
> this thread, then, should the behavior for 2+d arrays be explicitly
> retained, or should head and tail do the analogous thing (with a
> head(<2d array>) behaving the same as head(<matrix>), which honestly is what I
> expected to already be happening)?
> Are people using/relying on this behavior in their code, and if so, why/for
> what?
> Even more generally, one way forward is to have the default methods check
> for dimensions, and use length if it is null:
> tail.default <- tail.data.frame <- function(x, n = 6L, ...)
> {
>     if(any(n == 0))
>         stop("n must be non-zero or unspecified for all dimensions")
>     if(!is.null(dim(x)))
>         dimsx <- dim(x)
>     else
>         dimsx <- length(x)
>     ## this returns a list of vectors of indices in each
>     ## dimension, regardless of the length of the n
>     ## argument
>     sel <- lapply(seq_along(dimsx), function(i) {
>         dxi <- dimsx[i]
>         ## select all indices (full dim) if not specified
>         ni <- if(length(n) >= i) n[i] else dxi
>         ## handle negative ns
>         ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
>         seq.int(to = dxi, length.out = ni)
>     })
>     args <- c(list(x), sel, drop = FALSE)
>     do.call("[", args)
> }
> I think this precludes the need for a separate data.frame method at all,
> actually, though (I would think) tail.data.frame would still be defined and
> exported for backwards compatibility. (the matrix method has some extra
> bits so my current conception of it is still separate, though it might not
> NEED to be).
> The question then becomes, should head/tail always return something with
> the same dimensionality (number of dims) it got, or should data.frame and
> matrix be special cased in this regard, as they are now?
> What are people's thoughts?
> ~G