[Rd] head.matrix can return 1000s of columns -- limit to n or add new argument?

Thu Oct 31 22:02:07 CET 2019

On 10/30/19 04:29, Martin Maechler wrote:
>>>>>> Gabriel Becker
>>>>>>      on Tue, 29 Oct 2019 12:43:15 -0700 writes:
> 
>      > Hi all,
>      > So I've started working on this and I ran into something that I didn't
>      > know, namely that for x a multi-dimensional (2+) array, head(x) and tail(x)
>      > ignore dimension completely, treat x as an atomic vector, and return an
>      > (unclassed) atomic vector:
> 
> Well, that's  (3+), not "2+" .
> 
> But I did write (on Sep 17 in this thread!)
> 
>    > The current source for head() and tail() and all their methods
>    > in utils is just 83 lines of code  {file utils/R/head.R minus
>    > the initial mostly copyright comments}.
> 
> and if you've ever looked at these few dozen lines of R code, you'll
> have seen that we just added two simple utilities with a few
> reasonably simple methods.  To treat non-matrix (i.e. non-2d)
> arrays as vectors, is typically not unreasonable in R, but
> indeed with your proposals (in this thread), such non-2d arrays
> should be treated differently either via new  head.array() /
> tail.array() methods ((or -- only if it can be done more nicely -- by
> the default method)).
> 
> Note however the following  historical quirk :
> 
>> sapply(setNames(,1:5), function(K) inherits(array(pi, dim=1:K), "array"))
>      1     2     3     4     5
>   TRUE FALSE  TRUE  TRUE  TRUE
> 
> (Is this something we should consider changing for R 4.0.0 -- to
>   have it TRUE also for 2d-arrays aka matrix objects ??)

That would be awesome! More generally I wonder how feasible it would be 
to fix all these inheritance quirks where inherits(x, "something"), 
is(x, "something"), and is.something(x) disagree. They've been such a 
nuisance for so many years...
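
To make the disagreement concrete, here is a small sketch that runs the
three kinds of check side by side on a matrix and on a 3-d array. The
helper name compare_checks is made up for illustration, and what
inherits() returns for the matrix depends on the R version (see the
quirk quoted above), so no output is shown.

```r
library(methods)  # provides is(); attached by default in interactive sessions

# Hypothetical helper: apply the three kinds of class check to one object.
compare_checks <- function(x, cls, is_fun) {
  c(inherits  = inherits(x, cls),   # S3 class-attribute / implicit class check
    is        = is(x, cls),         # S4-style check from the methods package
    predicate = is_fun(x))          # the dedicated is.<something>() function
}

m <- matrix(1:4, nrow = 2)
a <- array(1:8, dim = c(2, 2, 2))

compare_checks(m, "array", is.array)  # the matrix case is where they can disagree
compare_checks(a, "array", is.array)  # a 3-d array passes all three checks
```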

Thanks,
H.

> 
> The consequence of that is that
> currently, "often"   foo.matrix is just a copy of foo.array  in
> the case the latter exists:
> "base" examples: foo in {unique, duplicated, anyDuplicated}.
> 
> So I propose you change current  head.matrix and tail.matrix  to
> head.array and tail.array
> (and then have   head.matrix <- head.array  etc, at least if the
>   above quirk must remain, or remains (which I currently guess to
>   be the case)).
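
A minimal sketch of that arrangement, for illustration only. The names
head_array and head_matrix are hypothetical stand-ins chosen so that
nothing in utils is masked; this is not the actual utils code, and it
assumes n may give one count per leading dimension, as discussed in
this thread.

```r
# Hypothetical stand-in for a head.array() method (not the utils implementation).
head_array <- function(x, n = 6L, ...) {
  dims <- dim(x)
  stopifnot(!is.null(dims), length(n) <= length(dims))
  # Build one index vector per dimension; dimensions without an n are kept whole.
  sel <- lapply(seq_along(dims), function(i) {
    di <- dims[i]
    ni <- if (i <= length(n)) n[i] else di
    # Negative n means "all but the last |n|", as head() does for vectors.
    ni <- if (ni < 0L) max(di + ni, 0L) else min(ni, di)
    seq_len(ni)
  })
  do.call(`[`, c(list(x), sel, list(drop = FALSE)))
}

# As long as inherits(<matrix>, "array") is FALSE, the matrix method is
# simply an alias of the array method.
head_matrix <- head_array

x <- array(100, c(4, 5, 5))
dim(head_array(x, 1))                       # 1 5 5
dim(head_matrix(matrix(1:40, 8), c(3, 2)))  # 3 2
```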
> 
> 
>      >> x = array(100, c(4, 5, 5))
> 
>      >> dim(x)
> 
>      > [1] 4 5 5
> 
>      >> head(x, 1)
> 
>      > [1] 100
> 
>      >> class(head(x))
> 
>      > [1] "numeric"
> 
> 
>      > (For a 1d array, it does return another 1d array).
> 
>      > When extending head/tail to understand multiple dimensions as discussed in
>      > this thread, then, should the behavior for 2+d arrays be explicitly
>      > retained, or should head and tail do the analogous thing (with a
>      > head(<2d array>) behaving the same as head(<matrix>), which honestly is what I
>      > expected to already be happening)?
> 
>      > Are people using/relying on this behavior in their code, and if so, why/for
>      > what?
> 
>      > Even more generally, one way forward is to have the default methods check
>      > for dimensions, and use length if it is null:
> 
>      > tail.default <- tail.data.frame <- function(x, n = 6L, ...)
>      > {
>      >     if(any(n == 0))
>      >         stop("n must be non-zero or unspecified for all dimensions")
>      >     if(!is.null(dim(x)))
>      >         dimsx <- dim(x)
>      >     else
>      >         dimsx <- length(x)
> 
>      >     ## this returns a list of vectors of indices in each
>      >     ## dimension, regardless of the length of the n
>      >     ## argument
>      >     sel <- lapply(seq_along(dimsx), function(i) {
>      >         dxi <- dimsx[i]
>      >         ## select all indices (full dim) if not specified
>      >         ni <- if(length(n) >= i) n[i] else dxi
>      >         ## handle negative ns
>      >         ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
>      >         seq.int(to = dxi, length.out = ni)
>      >     })
>      >     args <- c(list(x), sel, drop = FALSE)
>      >     do.call("[", args)
>      > }
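
For anyone who wants to try the sketch above without masking the utils
methods, the same logic can be put under a made-up name (tail_nd here,
purely for illustration) and applied to a plain vector, a 3-d array and
a data frame:

```r
# tail_nd is a hypothetical name; the body follows the quoted tail.default sketch.
tail_nd <- function(x, n = 6L, ...) {
  if (any(n == 0))
    stop("n must be non-zero or unspecified for all dimensions")
  # Objects without a dim attribute are treated as one-dimensional.
  dimsx <- if (!is.null(dim(x))) dim(x) else length(x)
  sel <- lapply(seq_along(dimsx), function(i) {
    dxi <- dimsx[i]
    ni <- if (length(n) >= i) n[i] else dxi          # unspecified: keep whole dim
    ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
    seq.int(to = dxi, length.out = ni)               # the last ni indices
  })
  do.call("[", c(list(x), sel, drop = FALSE))
}

tail_nd(1:10, 3)                                          # last three elements
dim(tail_nd(array(100, c(4, 5, 5)), 1))                   # still three dimensions
tail_nd(data.frame(a = 1:8, b = letters[1:8]), c(2, 1))   # last 2 rows, last column
```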
> 
> 
>      > I think this precludes the need for a separate data.frame method at all,
>      > actually, though (I would think) tail.data.frame would still be defined and
>      > exported for backwards compatibility. (the matrix method has some extra
>      > bits so my current conception of it is still separate, though it might not
>      > NEED to be).
> 
>      > The question then becomes, should head/tail always return something with
>      > the same dimensionality (number of dims) it got, or should data.frame and
>      > matrix be special cased in this regard, as they are now?
> 
>      > What are people's thoughts?
>      > ~G
> 
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages using fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319