[Rd] head.matrix can return 1000s of columns -- limit to n or add new argument?

Wed Oct 30 06:31:57 CET 2019

Gabriel,
My view is rather radical.

- head/tail should return an object with the same number of dimensions
  as the input
- data.frame should be a special case
- matrix should be handled as a 2D array

P.S. The idea of accepting the `n` argument as a vector with one entry
per dimension is a brilliant one.
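
To make the two points above concrete, here is a minimal sketch of a
dimension-preserving head that takes `n` as a vector. The name `head_nd`
is hypothetical (it is not part of base R or of the proposed patch), and
the sketch assumes `x` has a `dim` attribute; it mirrors the index logic
of the quoted tail code rather than any final implementation.

```r
## head_nd is a hypothetical name, used only for illustration.
## Assumes x has a dim attribute (matrix, array or data.frame).
head_nd <- function(x, n = 6L) {
    dimsx <- dim(x)
    stopifnot(!is.null(dimsx), length(n) <= length(dimsx), !anyNA(n))
    sel <- lapply(seq_along(dimsx), function(i) {
        dxi <- dimsx[i]
        ## dimensions without an entry in n are kept whole
        ni <- if (i <= length(n)) n[i] else dxi
        ## negative n drops that many indices from the end
        ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
        seq_len(ni)
    })
    ## drop = FALSE keeps the number of dimensions unchanged
    do.call("[", c(list(x), sel, drop = FALSE))
}

x <- array(seq_len(100), c(4, 5, 5))
dim(head_nd(x, 2))           # 2 5 5
dim(head_nd(x, c(2, 3, 1)))  # 2 3 1
```

With `drop = FALSE` the result always has as many dimensions as the
input, so a matrix behaves exactly like any other 2D array.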

On Wed, Oct 30, 2019 at 1:13 AM Gabriel Becker <gabembecker using gmail.com> wrote:
>
> Hi all,
>
> So I've started working on this and I ran into something that I didn't
> know, namely that for x a multi-dimensional (2+) array, head(x) and tail(x)
> ignore dimensions completely, treat x as an atomic vector, and return an
> (unclassed) atomic vector:
>
> > x = array(100, c(4, 5, 5))
> > dim(x)
> [1] 4 5 5
> > head(x, 1)
> [1] 100
> > class(head(x))
> [1] "numeric"
>
>
> (For a 1d array, it does return another 1d array).
>
> When extending head/tail to understand multiple dimensions as discussed in
> this thread, then, should the behavior for 2+d arrays be explicitly
> retained, or should head and tail do the analogous thing (with a head(<2d
> array>) behaving the same as head(<matrix>), which honestly is what I
> expected to already be happening)?
>
> Are people using/relying on this behavior in their code, and if so, why/for
> what?
>
> Even more generally, one way forward is to have the default methods check
> for dimensions, and use length if it is null:
>
> tail.default <- tail.data.frame <- function(x, n = 6L, ...)
> {
>     if(any(n == 0))
>         stop("n must be non-zero or unspecified for all dimensions")
>     if(!is.null(dim(x)))
>         dimsx <- dim(x)
>     else
>         dimsx <- length(x)
>
>     ## this returns a list of vectors of indices in each
>     ## dimension, regardless of the length of the n
>     ## argument
>     sel <- lapply(seq_along(dimsx), function(i) {
>         dxi <- dimsx[i]
>         ## select all indices (full dim) if not specified
>         ni <- if(length(n) >= i) n[i] else dxi
>         ## handle negative ns
>         ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
>         seq.int(to = dxi, length.out = ni)
>     })
>     args <- c(list(x), sel, drop = FALSE)
>     do.call("[", args)
> }
>
>
> I think this precludes the need for a separate data.frame method at all,
> actually, though (I would think) tail.data.frame would still be defined and
> exported for backwards compatibility. (the matrix method has some extra
> bits so my current conception of it is still separate, though it might not
> NEED to be).
>
> The question then becomes, should head/tail always return something with
> the same dimensionality (number of dims) as it got, or should data.frame
> and matrix be special cased in this regard, as they are now?
>
> What are people's thoughts?
> ~G
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel