[Rd] head.matrix can return 1000s of columns -- limit to n or add new argument?

Fox, John jfox using mcmaster.ca
Tue Sep 17 14:32:13 CEST 2019


Dear Herve,

Sorry, I should have said "matrices" rather than "data frames" -- brief() has methods for both.

Best,
 John

  -----------------------------
  John Fox, Professor Emeritus
  McMaster University
  Hamilton, Ontario, Canada
  Web: http://socserv.mcmaster.ca/jfox

> On Sep 17, 2019, at 8:29 AM, Fox, John <jfox using mcmaster.ca> wrote:
> 
> Dear Herve,
> 
> The brief() generic function in the car package does something very similar to that for data frames (and has methods for other classes of objects as well).
> 
> Best,
> John
> 
>  -----------------------------
>  John Fox, Professor Emeritus
>  McMaster University
>  Hamilton, Ontario, Canada
>  Web: http://socserv.mcmaster.ca/jfox
> 
>> On Sep 17, 2019, at 2:52 AM, Pages, Herve <hpages using fredhutch.org> wrote:
>> 
>> Hi,
>> 
>> Alternatively, how about a new glance() generic that would do something 
>> like this:
>> 
>>> library(DelayedArray)
>>> glance <- DelayedArray:::show_compact_array
>> 
>>> M <- matrix(rnorm(1e6), nrow = 1000L, ncol = 2000L)
>>> glance(M)
>> <1000 x 2000> matrix object of type "double":
>>               [,1]        [,2]        [,3] ...    [,1999]    [,2000]
>>   [1,]  -0.8854896   1.8010288   1.3051341   . -0.4473593  0.4684985
>>   [2,]  -0.8563415  -0.7102768  -0.9309155   . -1.8743504  0.4300557
>>   [3,]   1.0558159  -0.5956583   1.2689806   .  2.7292249  0.2608300
>>   [4,]   0.7547356   0.1465714   0.1798959   . -0.1778017  1.3417423
>>   [5,]   0.8037360  -2.7081809   0.9766657   . -0.9902788  0.1741957
>>    ...           .           .           .   .          .          .
>> [996,]  0.67220752  0.07804320 -0.38743454   .  0.4438639 -0.8130713
>> [997,] -0.67349962 -1.15292067 -0.54505567   .  0.4630923 -1.6287694
>> [998,]  0.03374595 -1.68061325 -0.88458368   . -0.2890962  0.2552267
>> [999,]  0.47861492  1.25530912  0.19436708   . -0.5193121 -1.1695501
>> [1000,]  1.52819218  2.23253275 -1.22051720   . -1.0342430 -0.1703396
>> 
>>> A <- array(rnorm(1e6), c(50, 20, 10, 100))
>>> glance(A)
>> <50 x 20 x 10 x 100> array object of type "double":
>> ,,1,1
>>            [,1]       [,2]       [,3] ...      [,19]      [,20]
>> [1,] 0.78319619 0.82258390 0.09122269   .  1.7288189  0.7968574
>> [2,] 2.80687459 0.63709640 0.80844430   . -0.3963161 -1.2768284
>>  ...          .          .          .   .          .          .
>> [49,] -1.0696320 -0.1698111  2.0082890   .  0.4488292  0.5215745
>> [50,] -0.7012526 -2.0818229  0.7750518   .  0.3189076  0.1437394
>> 
>> ...
>> 
>> ,,10,100
>>            [,1]       [,2]       [,3] ...      [,19]      [,20]
>> [1,]  0.5360649  0.5491561 -0.4098350   .  0.7647435  0.5640699
>> [2,]  0.7924093 -0.7395815 -1.3792913   .  0.1980287 -0.2897026
>>  ...          .          .          .   .          .          .
>> [49,]  0.6266209  0.3778512  1.4995778   . -0.3820651 -1.4241691
>> [50,]  1.9218715  3.5475949  0.5963763   .  0.4005210  0.4385623
>> 
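
For readers without DelayedArray installed, the corner-picking idea behind such a glance() can be sketched in base R. This is only an illustration under my own assumptions: glance2 and its argument n are hypothetical names, it is not the DelayedArray implementation, it handles matrices only, and it does not print the "..." separators shown above.

```r
# Hypothetical helper (not from the thread, not DelayedArray code):
# keep the first and last n rows/columns and label them with their
# original positions so the gap in the middle stays visible.
glance2 <- function(m, n = 3L) {
  stopifnot(is.matrix(m))
  pick <- function(len) {
    if (len <= 2L * n) {
      seq_len(len)                      # small enough: keep everything
    } else {
      as.integer(c(seq_len(n), seq.int(to = len, length.out = n)))
    }
  }
  ri <- pick(nrow(m))
  ci <- pick(ncol(m))
  out <- m[ri, ci, drop = FALSE]
  dimnames(out) <- list(paste0("[", ri, ",]"), paste0("[,", ci, "]"))
  out
}

M <- matrix(seq_len(1000 * 2000), nrow = 1000L, ncol = 2000L)
g <- glance2(M)
print(g)
```

The result is an ordinary 6 x 6 matrix, so it can be passed on to print() or format() like any other.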
>> H.
>> 
>> 
>> On 9/16/19 00:54, Michael Chirico wrote:
>>> Awesome. Gabe, since you already have a workshopped version, would you like
>>> to proceed? Feel free to ping me to review the patch once it's posted.
>>> 
>>> On Mon, Sep 16, 2019 at 3:26 PM Martin Maechler <maechler using stat.math.ethz.ch>
>>> wrote:
>>> 
>>>>>>>>> Michael Chirico
>>>>>>>>>    on Sun, 15 Sep 2019 20:52:34 +0800 writes:
>>>> 
>>>>> Finally read in detail your response Gabe. Looks great,
>>>>> and I agree it's quite intuitive, as well as agree against
>>>>> non-recycling.
>>>> 
>>>>> Once the length(n) == length(dim(x)) behavior is enabled,
>>>>> I don't think there's any need/desire to have head() do
>>>>> x[1:6,1:6] anymore. head(x, c(6, 6)) is quite clear for
>>>>> those familiar with head(x, 6), it would seem to me.
>>>> 
>>>>> Mike C
>>>> 
>>>> Thank you, Gabe, and Michael.
>>>> I did like Gabe's proposal already back in July but was
>>>> busy and/or vacationing then ...
>>>> 
>>>> If you submit this with a patch (that includes changes to both
>>>> *.R and *.Rd , including some example) as "wishlist" item to R's
>>>> bugzilla, I'm willing/happy to check and commit this to R-devel.
>>>> 
>>>> Martin
>>>> 
>>>> 
>>>>> On Sat, Jul 13, 2019 at 8:35 AM Gabriel Becker
>>>>> <gabembecker using gmail.com> wrote:
>>>> 
>>>>>> Hi Michael and Abby,
>>>>>> 
>>>>>> So one thing that could happen that would be backwards
>>>>>> compatible (with the exception of something that was an
>>>>>> error no longer being an error) is head and tail could
>>>>>> take vectors of length length(dim(x)) rather than integers
>>>>>> of length 1 for n, with the default n=6 being equivalent
>>>>>> to n = c(6, dim(x)[2], <...>, dim(x)[k]), at least for
>>>>>> the deprecation cycle, if not permanently. It not
>>>>>> recycling would be unexpected based on the behavior of
>>>>>> many R functions but would preserve the current behavior
>>>>>> while granting more fine-grained control to users that
>>>>>> feel they need it.
>>>>>> 
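
The non-recycling default described above amounts to padding a short n with the remaining extents of x rather than repeating it. A minimal sketch of just that step, assuming n is never longer than dim(x); pad_n is a hypothetical name used for illustration and is not part of the proposal:

```r
# Hypothetical helper: fill the unspecified trailing dimensions of n
# with the full extents d = dim(x), instead of recycling n.
pad_n <- function(n, d) {
  stopifnot(length(n) >= 1L, length(n) <= length(d))
  c(n, d[-seq_along(n)])
}

d <- c(10L, 20L, 5L)
pad_n(6L, d)          # 6 20 5
pad_n(c(6L, 6L), d)   # 6 6 5
```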
>>>>>> A rapidly thrown-together prototype of such a method for
>>>>>> the head of a matrix case is as follows:
>>>>>> 
>>>>>> head2 = function(x, n = 6L, ...) {
>>>>>>   indvecs = lapply(seq_along(dim(x)), function(i) {
>>>>>>     # n[i] if supplied, otherwise the full extent of dimension i
>>>>>>     ni = if(length(n) >= i) n[i] else dim(x)[i]
>>>>>>     # negative: drop the last |ni| entries; positive: cap at the extent
>>>>>>     ni = if(ni < 0L) max(dim(x)[i] + ni, 0L) else min(ni, dim(x)[i])
>>>>>>     seq_len(ni)
>>>>>>   })
>>>>>>   lstargs = c(list(x), indvecs, drop = FALSE)
>>>>>>   do.call("[", lstargs)
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>>> mat = matrix(1:100, 10, 10)
>>>>>> 
>>>>>>> head(mat)
>>>>>>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>>>>>> [1,]    1   11   21   31   41   51   61   71   81    91
>>>>>> [2,]    2   12   22   32   42   52   62   72   82    92
>>>>>> [3,]    3   13   23   33   43   53   63   73   83    93
>>>>>> [4,]    4   14   24   34   44   54   64   74   84    94
>>>>>> [5,]    5   15   25   35   45   55   65   75   85    95
>>>>>> [6,]    6   16   26   36   46   56   66   76   86    96
>>>>>> 
>>>>>>> head2(mat)
>>>>>>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>>>>>> [1,]    1   11   21   31   41   51   61   71   81    91
>>>>>> [2,]    2   12   22   32   42   52   62   72   82    92
>>>>>> [3,]    3   13   23   33   43   53   63   73   83    93
>>>>>> [4,]    4   14   24   34   44   54   64   74   84    94
>>>>>> [5,]    5   15   25   35   45   55   65   75   85    95
>>>>>> [6,]    6   16   26   36   46   56   66   76   86    96
>>>>>> 
>>>>>>> head2(mat, c(2, 3))
>>>>>>      [,1] [,2] [,3]
>>>>>> [1,]    1   11   21
>>>>>> [2,]    2   12   22
>>>>>> 
>>>>>>> head2(mat, c(2, -9))
>>>>>>      [,1]
>>>>>> [1,]    1
>>>>>> [2,]    2
>>>>>> 
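
Since the proposal covers tail as well, a tail counterpart of the prototype can be sketched along the same lines. This is an assumption-laden illustration rather than code from the thread: tail2 is a hypothetical name, and unlike tail() on a matrix (which, as far as I know, labels the rows with their original indices by default) it leaves the dimnames alone.

```r
# Hypothetical tail analogue of the head2() prototype; not from the thread.
tail2 <- function(x, n = 6L, ...) {
  d <- dim(x)
  indvecs <- lapply(seq_along(d), function(i) {
    # n[i] if supplied, otherwise the full extent of dimension i
    ni <- if (length(n) >= i) n[i] else d[i]
    # negative: drop the first |ni| entries; positive: cap at the extent
    ni <- if (ni < 0L) max(d[i] + ni, 0L) else min(ni, d[i])
    # the last ni indices along dimension i
    seq.int(to = d[i], length.out = ni)
  })
  do.call("[", c(list(x), indvecs, drop = FALSE))
}

mat <- matrix(1:100, 10, 10)
tail2(mat, c(2, 3))    # rows 9:10, columns 8:10
tail2(mat, c(2, -9))   # rows 9:10, column 10 only
```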
>>>>>> Now one thing to keep in mind here, is that I think we'd
>>>>>> either a) have to make the non-recycling behavior
>>>>>> permanent, or b) have head treat data.frames and matrices
>>>>>> different with respect to the subsets they grab (which
>>>>>> strikes me as a *Bad Plan* (tm)).
>>>>>> 
>>>>>> So I don't think the default behavior would ever be
>>>>>> mat[1:6, 1:6], not because of backwards compatibility,
>>>>>> but because at least in my intuition that is just not
>>>>>> what head on a data.frame should do by default, and I
>>>>>> think the behaviors for the basic rectangular datatypes
>>>>>> should "stick together". I mean, also because of
>>>>>> backwards compatibility, but that could *in theory*
>>>>>> change across a long enough deprecation cycle, but the
>>>>>> conceptually right thing to do with a data.frame probably
>>>>>> won't.
>>>>>> 
>>>>>> All of that said, is head(mat, c(6, 6)) really that much
>>>>>> easier to type/better than just mat[1:6, 1:6, drop=FALSE]
>>>>>> (I know this will behave differently if any of the dims
>>>>>> of mat are less than 6, but if so why are you heading it
>>>>>> in the first place ;) )? I don't really have a strong
>>>>>> feeling on the answer to that.
>>>>>> 
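
The difference mentioned in the parenthesis above shows up as soon as a dimension is shorter than 6: plain indexing fails, while a head-style helper has to cap each index at the extent. A small sketch (the object names small, res and corner are made up for the example):

```r
small <- matrix(1:12, nrow = 4, ncol = 3)

# Indexing past the extent of a matrix is an error ...
res <- tryCatch(small[1:6, 1:6, drop = FALSE],
                error = function(e) "error")

# ... so the capping has to be spelled out by hand
corner <- small[seq_len(min(6L, nrow(small))),
                seq_len(min(6L, ncol(small))),
                drop = FALSE]
dim(corner)   # 4 3
```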
>>>>>> I'm happy to put together a patch for head.matrix,
>>>>>> head.data.frame, tail.matrix and tail.data.frame, plus
>>>>>> documentation, if people on R-core are interested in
>>>>>> this.
>>>>>> 
>>>>>> Note, as most here probably know, and as alluded to
>>>>>> above, length(n) > 1 for head or tail currently gives an
>>>>>> error, so this would be an extension of the existing
>>>>>> functionality in the mathematical extension sense, where
>>>>>> all existing behavior would remain identical, but the
>>>>>> support/valid parameter space would grow.
>>>>>> 
>>>>>> Best, ~G
>>>>>> 
>>>>>> 
>>>>>> On Fri, Jul 12, 2019 at 4:03 PM Abby Spurdle
>>>>>> <spurdle.a using gmail.com> wrote:
>>>>>> 
>>>>>>>> I assume there are lots of backwards-compatibility
>>>>>>>> issues as well as valid use cases for this behavior,
>>>>>>>> so I guess defaulting to M[1:6, 1:6] is out of the
>>>>>>>> question.
>>>>>>> 
>>>>>>> Agree.
>>>>>>> 
>>>>>>>> Is there any scope for adding a new argument to
>>>>>>>> head.matrix that would allow this flexibility?
>>>>>>> 
>>>>>>> I agree with what you're trying to achieve.  However,
>>>>>>> I'm not sure this is as simple as you're suggesting.
>>>>>>> 
>>>>>>> What if the user wants "head" in rows but "tail" in
>>>>>>> columns?  Or "head" in rows, and both "head" and "tail"
>>>>>>> in columns?  With head and tail alone, there's a
>>>>>>> combinatorial explosion.
>>>>>>> 
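
The mixed cases described above can at least be written with explicit index vectors, which makes the combinatorial point concrete: each dimension needs its own choice of head, tail, or both. A sketch with made-up variable names, assuming every dimension is at least as long as the counts requested:

```r
m <- matrix(1:100, 10, 10)

rows_head <- seq_len(3)                            # "head" in rows
cols_tail <- seq.int(to = ncol(m), length.out = 2) # "tail" in columns
m[rows_head, cols_tail, drop = FALSE]              # rows 1:3, columns 9:10

# "head" in rows, both "head" and "tail" in columns
cols_both <- c(seq_len(2), cols_tail)
m[rows_head, cols_both, drop = FALSE]              # columns 1, 2, 9, 10
```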
>>>>>>> Also, when using tail on an unnamed matrix, it may be
>>>>>>> desirable to name rows and columns.
>>>>>>> 
>>>>>>> And all of this assumes standard matrix objects.  Add in
>>>>>>> matrix subclasses and related objects, and things get
>>>>>>> more complex still.
>>>>>>> 
>>>>>>> As I suggested in another thread, a few days ago, I'm
>>>>>>> planning to write an R package for matrices and
>>>>>>> matrix-like objects (possibly extending the Matrix
>>>>>>> package), with an initial emphasis on subsetting,
>>>>>>> printing and formatting.  So, I'm interested to hear
>>>>>>> more suggestions on this topic.
>>>>>>> 
>>>>>>> [[alternative HTML version deleted]]
>>>>>>> 
>>>>>>> ______________________________________________
>>>>>>> R-devel using r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 
>> -- 
>> Hervé Pagès
>> 
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>> 
>> E-mail: hpages using fredhutch.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319