[Rd] head.matrix can return 1000s of columns -- limit to n or add new argument?
Pages, Herve
hpages sending from fredhutch.org
Thu Oct 31 22:02:07 CET 2019
On 10/30/19 04:29, Martin Maechler wrote:
>>>>>> Gabriel Becker
>>>>>> on Tue, 29 Oct 2019 12:43:15 -0700 writes:
>
> > Hi all,
> > So I've started working on this and I ran into something that I didn't
> > know, namely that for x a multi-dimensional (2+) array, head(x) and tail(x)
> > ignore dimension completely, treat x as an atomic vector, and return an
> > (unclassed) atomic vector:
>
> Well, that's (3+), not "2+" .
>
> But I did write (on Sep 17 in this thread!)
>
> > The current source for head() and tail() and all their methods
> > in utils is just 83 lines of code {file utils/R/head.R minus
> > the initial mostly copyright comments}.
>
> and if you've ever looked at these few dozen lines of R code, you'll
> have seen that we just added two simple utilities with a few
> reasonably simple methods. To treat non-matrix (i.e. non-2d)
> arrays as vectors, is typically not unreasonable in R, but
> indeed with your proposals (in this thread), such non-2d arrays
> should be treated differently either via new head.array() /
> tail.array() methods ((or -- only if it can be done more nicely -- by
> the default method)).
>
> Note however the following historical quirk :
>
>> sapply(setNames(,1:5), function(K) inherits(array(pi, dim=1:K), "array"))
>     1     2     3     4     5
>  TRUE FALSE  TRUE  TRUE  TRUE
>
> (Is this something we should consider changing for R 4.0.0 -- to
> have it TRUE also for 2d-arrays aka matrix objects ??)
That would be awesome! More generally I wonder how feasible it would be
to fix all these inheritance quirks where inherits(x, "something"),
is(x, "something"), and is.something(x) disagree. They've been such a
nuisance for so many years...
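
To make the disagreement concrete, here is a minimal sketch (an illustration, not taken from the R sources) that runs the three kinds of checks on a small matrix. The variable name m is arbitrary. No outputs are shown because the result of inherits() depends on the R version: per Martin's output above, it is currently FALSE for a 2d array.

```r
m <- matrix(1:6, nrow = 2)

# Class as reported by class(); whether it includes "array"
# depends on the R version.
class(m)

# S3-style check: based on what class(m) reports.
inherits(m, "array")

# S4-style check via the methods package.
methods::is(m, "array")

# Type predicate: TRUE for any object with a dim attribute
# of positive length.
is.array(m)
```

Comparing the three results side by side on the same object is a quick way to spot where they diverge.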
Thanks,
H.
>
> The consequence of that is that
> currently, "often" foo.matrix is just a copy of foo.array in
> the case the latter exists:
> "base" examples: foo in {unique, duplicated, anyDuplicated}.
>
> So I propose you change current head.matrix and tail.matrix to
> head.array and tail.array
> (and then have head.matrix <- head.array etc, at least if the
> above quirk must remain, or remains (which I currently guess to
> be the case)).
>
>
> >> x = array(100, c(4, 5, 5))
> >> dim(x)
> > [1] 4 5 5
> >> head(x, 1)
> > [1] 100
> >> class(head(x))
> > [1] "numeric"
>
>
> > (For a 1d array, it does return another 1d array).
>
> > When extending head/tail to understand multiple dimensions as discussed in
> > this thread, then, should the behavior for 2+d arrays be explicitly
> > retained, or should head and tail do the analogous thing (with a
> > head(<2d array>) behaving the same as head(<matrix>), which honestly is what I
> > expected to already be happening)?
>
> > Are people using/relying on this behavior in their code, and if so, why/for
> > what?
>
> > Even more generally, one way forward is to have the default methods check
> > for dimensions, and use length if it is null:
>
> > tail.default <- tail.data.frame <- function(x, n = 6L, ...)
> > {
> >     if(any(n == 0))
> >         stop("n must be non-zero or unspecified for all dimensions")
> >     if(!is.null(dim(x)))
> >         dimsx <- dim(x)
> >     else
> >         dimsx <- length(x)
> >
> >     ## this returns a list of vectors of indices in each
> >     ## dimension, regardless of the length of the n
> >     ## argument
> >     sel <- lapply(seq_along(dimsx), function(i) {
> >         dxi <- dimsx[i]
> >         ## select all indices (full dim) if not specified
> >         ni <- if(length(n) >= i) n[i] else dxi
> >         ## handle negative ns
> >         ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
> >         seq.int(to = dxi, length.out = ni)
> >     })
> >     args <- c(list(x), sel, drop = FALSE)
> >     do.call("[", args)
> > }
>
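
For anyone who wants to try the idea above without touching utils, the same logic can be wrapped in a standalone function. This is only a sketch of the proposed behavior, not committed code, and the name tail_nd is hypothetical (it is not part of utils).

```r
# Hypothetical standalone version of the proposed default method.
tail_nd <- function(x, n = 6L) {
    if (any(n == 0))
        stop("n must be non-zero or unspecified for all dimensions")
    # Fall back to length() for objects without a dim attribute.
    dimsx <- if (!is.null(dim(x))) dim(x) else length(x)
    # One index vector per dimension; dimensions not covered by n are kept whole.
    sel <- lapply(seq_along(dimsx), function(i) {
        dxi <- dimsx[i]
        ni <- if (length(n) >= i) n[i] else dxi
        # Negative n means "all but the first |n|"; otherwise cap at the extent.
        ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
        seq.int(to = dxi, length.out = ni)
    })
    do.call("[", c(list(x), sel, drop = FALSE))
}

x <- array(100, c(4, 5, 5))
dim(tail_nd(x, 1))        # 1 5 5
dim(tail_nd(x, c(2, 3)))  # 2 3 5
dim(tail_nd(x, -1))       # 3 5 5
```

With drop = FALSE the result always keeps the number of dimensions of its input, which is one possible answer to the question raised further down.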
>
> > I think this precludes the need for a separate data.frame method at all,
> > actually, though (I would think) tail.data.frame would still be defined and
> > exported for backwards compatibility. (the matrix method has some extra
> > bits so my current conception of it is still separate, though it might not
> > NEED to be).
>
> > The question then becomes, should head/tail always return something with
> > the same dimensionality (number of dims) it got, or should data.frame and
> > matrix be special cased in this regard, as they are now?
>
> > What are people's thoughts?
> > ~G
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages using fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319