[Rd] head.matrix can return 1000s of columns -- limit to n or add new argument?

Gabriel Becker gabembecker using gmail.com
Fri Oct 18 20:59:52 CEST 2019


Hi Martin et al.

Sorry for not getting back to this sooner. I've been pretty well buried
under travel plus being sick for a bit, but I will be happy to roll up a
patch for this, including documentation, and put it into a wishlist item.

I'll aim to do that at some point next week.

Thanks @Martin Maechler <maechler using stat.math.ethz.ch> for engaging with us
and being willing to consider the patch.

Best,
~G

On Tue, Sep 17, 2019 at 9:17 AM Martin Maechler <maechler using stat.math.ethz.ch>
wrote:

> >>>>> Fox, John
> >>>>>     on Tue, 17 Sep 2019 12:32:13 +0000 writes:
>
>     > Dear Herve,
>     > Sorry, I should have said "matrices" rather than "data frames" --
>     > brief() has methods for both.
>
>     > Best,
>     > John
>
>     > -----------------------------
>     > John Fox, Professor Emeritus
>     > McMaster University
>     > Hamilton, Ontario, Canada
>     > Web: http://socserv.mcmaster.ca/jfox
>
>     >> On Sep 17, 2019, at 8:29 AM, Fox, John <jfox using mcmaster.ca> wrote:
>     >>
>     >> Dear Herve,
>     >>
>     >> The brief() generic function in the car package does something very
>     >> similar to that for data frames (and has methods for other classes of
>     >> objects as well).
>     >>
>     >> Best,
>     >> John
>     >>
>     >> -----------------------------
>     >> John Fox, Professor Emeritus
>     >> McMaster University
>     >> Hamilton, Ontario, Canada
>     >> Web: http://socserv.mcmaster.ca/jfox
>     >>
>     >>> On Sep 17, 2019, at 2:52 AM, Pages, Herve <hpages using fredhutch.org> wrote:
>     >>>
>     >>> Hi,
>     >>>
>     >>> Alternatively, how about a new glance() generic that would do
>     >>> something like this:
>     >>>
>     >>> > library(DelayedArray)
>     >>> > glance <- DelayedArray:::show_compact_array
>     >>>
>     >>> > M <- matrix(rnorm(1e6), nrow = 1000L, ncol = 2000L)
>     >>> > glance(M)
>     >>> <1000 x 2000> matrix object of type "double":
>     >>>                [,1]        [,2]        [,3] ...    [,1999]    [,2000]
>     >>>    [1,]  -0.8854896   1.8010288   1.3051341   . -0.4473593  0.4684985
>     >>>    [2,]  -0.8563415  -0.7102768  -0.9309155   . -1.8743504  0.4300557
>     >>>    [3,]   1.0558159  -0.5956583   1.2689806   .  2.7292249  0.2608300
>     >>>    [4,]   0.7547356   0.1465714   0.1798959   . -0.1778017  1.3417423
>     >>>    [5,]   0.8037360  -2.7081809   0.9766657   . -0.9902788  0.1741957
>     >>>     ...           .           .           .   .          .          .
>     >>>  [996,]  0.67220752  0.07804320 -0.38743454   .  0.4438639 -0.8130713
>     >>>  [997,] -0.67349962 -1.15292067 -0.54505567   .  0.4630923 -1.6287694
>     >>>  [998,]  0.03374595 -1.68061325 -0.88458368   . -0.2890962  0.2552267
>     >>>  [999,]  0.47861492  1.25530912  0.19436708   . -0.5193121 -1.1695501
>     >>> [1000,]  1.52819218  2.23253275 -1.22051720   . -1.0342430 -0.1703396
>     >>>
>     >>> > A <- array(rnorm(1e6), c(50, 20, 10, 100))
>     >>> > glance(A)
>     >>> <50 x 20 x 10 x 100> array object of type "double":
>     >>> ,,1,1
>     >>>             [,1]       [,2]       [,3] ...      [,19]      [,20]
>     >>>  [1,] 0.78319619 0.82258390 0.09122269   .  1.7288189  0.7968574
>     >>>  [2,] 2.80687459 0.63709640 0.80844430   . -0.3963161 -1.2768284
>     >>>   ...          .          .          .   .          .          .
>     >>> [49,] -1.0696320 -0.1698111  2.0082890   .  0.4488292  0.5215745
>     >>> [50,] -0.7012526 -2.0818229  0.7750518   .  0.3189076  0.1437394
>     >>>
>     >>> ...
>     >>>
>     >>> ,,10,100
>     >>>             [,1]       [,2]       [,3] ...      [,19]      [,20]
>     >>>  [1,]  0.5360649  0.5491561 -0.4098350   .  0.7647435  0.5640699
>     >>>  [2,]  0.7924093 -0.7395815 -1.3792913   .  0.1980287 -0.2897026
>     >>>   ...          .          .          .   .          .          .
>     >>> [49,]  0.6266209  0.3778512  1.4995778   . -0.3820651 -1.4241691
>     >>> [50,]  1.9218715  3.5475949  0.5963763   .  0.4005210  0.4385623
>     >>>
>     >>> H.
>
> Thank you, Hervé and John.
> Both glance() and brief() are nice, and I think a version of one of
> them could also make a nice addition to the 'utils' package.
>
> However, there's a principal difference between them and the
> proposed generalized head {or tail} :
> The latter really does *return* a sub matrix/array of chosen
> dimensions with modified dimnames and that *object* then is
> printed if not assigned.
>
> OTOH,  glance() and brief() rather are versions of print()
> and I think have a dedicated "display-only" purpose {yes, I see they do
> return something; glance() returning a character object, brief()
> returning the principal argument invisibly, the same as any
> "correct" print() method..}
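The distinction drawn above can be sketched in a few lines of base R. This is only an illustration: `show_corner` is a hypothetical name, not a function from utils, car or DelayedArray.

```r
## Assumption: base R only; show_corner() is a hypothetical helper.
M <- matrix(seq_len(20), nrow = 4)

## head() returns a sub-matrix object that can be assigned and reused.
h <- head(M, 2)

## A print-like function displays something and then returns its
## principal argument invisibly, as print() methods conventionally do.
show_corner <- function(x, n = 2L) {
  rows <- seq_len(min(n, nrow(x)))
  cols <- seq_len(min(n, ncol(x)))
  print(x[rows, cols, drop = FALSE])
  invisible(x)
}
res <- show_corner(M)
```

Here `h` is a new 2 x 5 matrix, while `res` is just `M` again: the display was a side effect.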
>
> From the above, I think it may make sense to entertain both a
> generalization of head() and one such glance() / brief()
> /.. function which for a matrix shows all 4 corners of the
> matrix or data frame.
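As a rough idea of what a "four corners" view could look like for a plain matrix, here is a minimal sketch. The name `corners` and its argument `n` are assumptions made for illustration; this is not code from utils, car or DelayedArray, and it handles matrices only.

```r
## Hypothetical helper: keep the first and last n rows and columns of
## a matrix, labelling unnamed rows/columns with their original
## positions so the reader can see where each cell came from.
corners <- function(x, n = 2L) {
  stopifnot(is.matrix(x), n >= 1L)
  pick <- function(len) {
    if (len <= 2L * n) {
      seq_len(len)                       # small dimension: keep it whole
    } else {
      c(seq_len(n), seq.int(len - n + 1L, len))
    }
  }
  i <- pick(nrow(x))
  j <- pick(ncol(x))
  out <- x[i, j, drop = FALSE]
  if (is.null(rownames(out))) rownames(out) <- paste0("[", i, ",]")
  if (is.null(colnames(out))) colnames(out) <- paste0("[,", j, "]")
  out
}

M <- matrix(seq_len(1000 * 2000), nrow = 1000L)
cm <- corners(M)
cm
```

Because this sketch returns a small matrix rather than printing, it sits closer to the head() side of the distinction; wrapping it in a print() call would give the display-only variant.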
>
> There's another important criterion here:  __Simplicity__ in the
> code that's added (and will have to be maintained as part of R
> "forever" into the future)...
> AFAICS, the DelayedArray stuff is beautifully modular, but
> possibly also much entangled in the dependent packages and classes we
> cannot require for 'utils'.
>
> The current source for head() and tail() and all their methods
> in utils is just 83 lines of code  {file utils/R/head.R minus
> the initial mostly copyright comments}.
> I am very reluctant to consider blowing that up by factors...
>
>
> Martin
>
>     >>> On 9/16/19 00:54, Michael Chirico wrote:
>     >>>> Awesome. Gabe, since you already have a workshopped version,
>     >>>> would you like to proceed? Feel free to ping me to review the
>     >>>> patch once it's posted.
>     >>>>
>     >>>> On Mon, Sep 16, 2019 at 3:26 PM Martin Maechler
>     >>>> <maechler using stat.math.ethz.ch> wrote:
>     >>>>
>     >>>>>>>>>> Michael Chirico
>     >>>>>>>>>> on Sun, 15 Sep 2019 20:52:34 +0800 writes:
>     >>>>>
> >>>>> Finally read in detail your response Gabe. Looks great,
> >>>>> and I agree it's quite intuitive, as well as agree against
> >>>>> non-recycling.
>     >>>>>
> >>>>> Once the length(n) == length(dim(x)) behavior is enabled,
> >>>>> I don't think there's any need/desire to have head() do
> >>>>> x[1:6,1:6] anymore. head(x, c(6, 6)) is quite clear for
> >>>>> those familiar with head(x, 6), it would seem to me.
>     >>>>>
> >>>>> Mike C
>     >>>>>
>     >>>>> Thank you, Gabe, and Michael.
>     >>>>> I did like Gabe's proposal already back in July but was
>     >>>>> busy and/or vacationing then ...
>     >>>>>
>     >>>>> If you submit this with a patch (that includes changes to both
>     >>>>> *.R and *.Rd , including some example) as "wishlist" item to R's
>     >>>>> bugzilla, I'm willing/happy to check and commit this to R-devel.
>     >>>>>
>     >>>>> Martin
>     >>>>>
>     >>>>>
> >>>>> On Sat, Jul 13, 2019 at 8:35 AM Gabriel Becker
> >>>>> <gabembecker using gmail.com> wrote:
>     >>>>>
>     >>>>>>> Hi Michael and Abby,
>     >>>>>>>
>     >>>>>>> So one thing that could happen that would be backwards
>     >>>>>>> compatible (with the exception of something that was an
>     >>>>>>> error no longer being an error) is head and tail could
>     >>>>>>> take vectors of length length(dim(x)) rather than integers
>     >>>>>>> of length 1 for n, with the default n = 6 being equivalent
>     >>>>>>> to n = c(6, dim(x)[2], <...>, dim(x)[k]), at least for
>     >>>>>>> the deprecation cycle, if not permanently. Not recycling
>     >>>>>>> would be unexpected given the behavior of many R
>     >>>>>>> functions, but it would preserve the current behavior
>     >>>>>>> while granting more fine-grained control to users who
>     >>>>>>> feel they need it.
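The padding rule described in that paragraph (a short n is completed with the remaining dimensions instead of being recycled) can be written down on its own. `expand_n` is a hypothetical name used only to illustrate the rule.

```r
## Hypothetical helper: complete a short n with the trailing entries
## of d = dim(x), so unspecified dimensions are kept whole.
expand_n <- function(n, d) {
  stopifnot(length(n) >= 1L, length(n) <= length(d))
  c(n, d[-seq_along(n)])
}

expand_n(6L, c(10L, 10L))         # 6 10
expand_n(c(2L, 3L), c(10L, 10L))  # 2 3
```

Recycling would instead turn n = 6 into c(6, 6), which is exactly the change in default behavior the proposal avoids.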
>     >>>>>>>
>     >>>>>>> A rapidly thrown-together prototype of such a method for
>     >>>>>>> the head of a matrix case is as follows:
>     >>>>>>>
>     >>>>>>> head2 = function(x, n = 6L, ...) {
>     >>>>>>>   indvecs = lapply(seq_along(dim(x)), function(i) {
>     >>>>>>>     ## dimensions beyond length(n) are kept whole
>     >>>>>>>     ni = if(length(n) >= i) n[i] else dim(x)[i]
>     >>>>>>>     ## negative ni drops -ni entries from the end of dim i
>     >>>>>>>     if(ni < 0L) ni = max(dim(x)[i] + ni, 0L)
>     >>>>>>>     else ni = min(ni, dim(x)[i])
>     >>>>>>>     seq_len(ni)
>     >>>>>>>   })
>     >>>>>>>   lstargs = c(list(x), indvecs, drop = FALSE)
>     >>>>>>>   do.call("[", lstargs)
>     >>>>>>> }
>     >>>>>>>
>     >>>>>>>
>     >>>>>>> > mat = matrix(1:100, 10, 10)
>     >>>>>>>
>     >>>>>>> > head(mat)
>     >>>>>>>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>     >>>>>>> [1,]    1   11   21   31   41   51   61   71   81    91
>     >>>>>>> [2,]    2   12   22   32   42   52   62   72   82    92
>     >>>>>>> [3,]    3   13   23   33   43   53   63   73   83    93
>     >>>>>>> [4,]    4   14   24   34   44   54   64   74   84    94
>     >>>>>>> [5,]    5   15   25   35   45   55   65   75   85    95
>     >>>>>>> [6,]    6   16   26   36   46   56   66   76   86    96
>     >>>>>>>
>     >>>>>>> > head2(mat)
>     >>>>>>>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>     >>>>>>> [1,]    1   11   21   31   41   51   61   71   81    91
>     >>>>>>> [2,]    2   12   22   32   42   52   62   72   82    92
>     >>>>>>> [3,]    3   13   23   33   43   53   63   73   83    93
>     >>>>>>> [4,]    4   14   24   34   44   54   64   74   84    94
>     >>>>>>> [5,]    5   15   25   35   45   55   65   75   85    95
>     >>>>>>> [6,]    6   16   26   36   46   56   66   76   86    96
>     >>>>>>>
>     >>>>>>> > head2(mat, c(2, 3))
>     >>>>>>>      [,1] [,2] [,3]
>     >>>>>>> [1,]    1   11   21
>     >>>>>>> [2,]    2   12   22
>     >>>>>>>
>     >>>>>>> > head2(mat, c(2, -9))
>     >>>>>>>      [,1]
>     >>>>>>> [1,]    1
>     >>>>>>> [2,]    2
>     >>>>>>>
>     >>>>>>>
>     >>>>>>> Now one thing to keep in mind here, is that I think we'd
>     >>>>>>> either a) have to make the non-recycling behavior
>     >>>>>>> permanent, or b) have head treat data.frames and matrices
>     >>>>>>> different with respect to the subsets they grab (which
>     >>>>>>> strikes me as a *Bad Plan* (tm)).
>     >>>>>>>
>     >>>>>>> So I don't think the default behavior would ever be
>     >>>>>>> mat[1:6, 1:6], not because of backwards compatibility,
>     >>>>>>> but because at least in my intuition that is just not
>     >>>>>>> what head on a data.frame should do by default, and I
>     >>>>>>> think the behaviors for the basic rectangular datatypes
>     >>>>>>> should "stick together". I mean, also because of
>     >>>>>>> backwards compatibility, but that could *in theory*
>     >>>>>>> change across a long enough deprecation cycle, but the
>     >>>>>>> conceptually right thing to do with a data.frame probably
>     >>>>>>> won't.
>     >>>>>>>
>     >>>>>>> All of that said, is head(mat, c(6, 6)) really that much
>     >>>>>>> easier to type/better than just mat[1:6, 1:6, drop=FALSE]
>     >>>>>>> (I know this will behave differently if any of the dims
>     >>>>>>> of mat are less than 6, but if so why are you heading it
>     >>>>>>> in the first place ;) )? I don't really have a strong
>     >>>>>>> feeling on the answer to that.
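The parenthetical about dimensions smaller than 6 can be made concrete with a small base R sketch. The object names here are illustrative only.

```r
small <- matrix(1:12, nrow = 3)   # 3 x 4, both dims below 6

## Raw indexing with 1:6 goes out of bounds and signals an error.
raw <- tryCatch(small[1:6, 1:6, drop = FALSE], error = function(e) e)

## Clamping each index to the available extent, which is what a
## head()-style call would do, returns what is there instead.
rows <- seq_len(min(6L, nrow(small)))
cols <- seq_len(min(6L, ncol(small)))
clamped <- small[rows, cols, drop = FALSE]
```

So the two spellings agree only when every dimension has at least 6 entries; the clamping is the part that raw indexing does not give for free.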
>     >>>>>>>
>     >>>>>>> I'm happy to put a patch for head.matrix,
>     >>>>>>> head.data.frame, tail.matrix and tail.data.frame, plus
>     >>>>>>> documentation, if people on R-core are interested in
>     >>>>>>> this.
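For the tail side, a counterpart to the head2() prototype might look roughly like the following. This is a hedged sketch, not the patch discussed here: `tail2` is a hypothetical name, and unlike tail() on a matrix it does not add row labels.

```r
## Hypothetical tail counterpart of the head2() prototype.
tail2 <- function(x, n = 6L, ...) {
  d <- dim(x)
  indvecs <- lapply(seq_along(d), function(i) {
    ## dimensions beyond length(n) are kept whole
    ni <- if (length(n) >= i) n[i] else d[i]
    ## negative ni drops -ni entries from the start of dimension i
    ni <- if (ni < 0L) max(d[i] + ni, 0L) else min(ni, d[i])
    seq.int(to = d[i], length.out = ni)
  })
  do.call("[", c(list(x), indvecs, drop = FALSE))
}

mat <- matrix(1:100, 10, 10)
tail2(mat, c(2, 3))   # last 2 rows, last 3 columns
```

The only change from the head version is which end the index vector is anchored to.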
>     >>>>>>>
>     >>>>>>> Note, as most here probably know, and as alluded to
>     >>>>>>> above, length(n) > 1 for head or tail currently gives an
>     >>>>>>> error, so this would be an extension of the existing
>     >>>>>>> functionality in the mathematical extension sense, where
>     >>>>>>> all existing behavior would remain identical, but the
>     >>>>>>> support/valid parameter space would grow.
>     >>>>>>>
>     >>>>>>> Best, ~G
>     >>>>>>>
>     >>>>>>>
>     >>>>>>> On Fri, Jul 12, 2019 at 4:03 PM Abby Spurdle
>     >>>>>>> <spurdle.a using gmail.com> wrote:
>     >>>>>>>
>     >>>>>>>>> I assume there are lots of backwards-compatibility
>     >>>>>>>>> issues as well as valid use cases for this behavior,
>     >>>>>>>>> so I guess defaulting to M[1:6, 1:6] is out of the
>     >>>>>>>>> question.
>     >>>>>>>>
>     >>>>>>>> Agree.
>     >>>>>>>>
>     >>>>>>>>> Is there any scope for adding a new argument to
>     >>>>>>>>> head.matrix that would allow this flexibility?
>     >>>>>>>>
>     >>>>>>>> I agree with what you're trying to achieve.  However,
>     >>>>>>>> I'm not sure this is as simple as you're suggesting.
>     >>>>>>>>
>     >>>>>>>> What if the user wants "head" in rows but "tail" in
>     >>>>>>>> columns?  Or "head" in rows, and both "head" and "tail"
>     >>>>>>>> in columns?  With head and tail alone, there's a
>     >>>>>>>> combinatorial explosion.
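One mixed case ("head" in rows, "tail" in columns) can already be expressed in base R, either by direct indexing or by composing single-number head() and tail() calls. A minimal sketch, with illustrative object names:

```r
M <- matrix(seq_len(100), 10, 10)

## Direct indexing: first 2 rows, last 3 columns.
direct <- M[seq_len(2), seq.int(to = ncol(M), length.out = 3), drop = FALSE]

## The same cells from single-number head() and tail() calls.
## tail() on an unnamed matrix labels the rows it keeps, so after t()
## those labels end up as column names of the result.
composed <- t(tail(t(head(M, 2)), 3))
```

The composed form shows why the combinations multiply quickly: each extra direction needs another transpose or another argument.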
>     >>>>>>>>
>     >>>>>>>> Also, when using tail on an unnamed matrix, it may be
>     >>>>>>>> desirable to name rows and columns.
>     >>>>>>>>
>     >>>>>>>> And all of this assumes standard matrix objects.  Add in
>     >>>>>>>> matrix subclasses and related objects, and things get
>     >>>>>>>> more complex still.
>     >>>>>>>>
>     >>>>>>>> As I suggested in another thread, a few days ago, I'm
>     >>>>>>>> planning to write an R package for matrices and
>     >>>>>>>> matrix-like objects (possibly extending the Matrix
>     >>>>>>>> package), with an initial emphasis on subsetting,
>     >>>>>>>> printing and formatting.  So, I'm interested to hear
>     >>>>>>>> more suggestions on this topic.
>     >>>>>>>>
>     >>>>>>>> [[alternative HTML version deleted]]
>     >>>>>>>>
>     >>>>>>>> ______________________________________________
>     >>>>>>>> R-devel using r-project.org mailing list
>     >>>>>>>>
>     >>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>     >>>>>>>>
>     >>>>>>>
>     >>>>>
>     >>>>
>     >>>
>     >>> --
>     >>> Hervé Pagès
>     >>>
>     >>> Program in Computational Biology
>     >>> Division of Public Health Sciences
>     >>> Fred Hutchinson Cancer Research Center
>     >>> 1100 Fairview Ave. N, M1-B514
>     >>> P.O. Box 19024
>     >>> Seattle, WA 98109-1024
>     >>>
>     >>> E-mail: hpages using fredhutch.org
>     >>> Phone:  (206) 667-5791
>     >>> Fax:    (206) 667-1319
>     >>
>
