[Rd] str(<1d-array>)

Fri Jan 23 15:41:38 CET 2009

on 01/23/2009 07:36 AM Martin Maechler wrote:
>>>>>> "TP" == Tony Plate <tplate at acm.org>
>>>>>>     on Thu, 22 Jan 2009 11:01:21 -0700 writes:
> 
>     TP> Martin Maechler wrote:
>     >>>>>>> "TP" == Tony Plate <tplate at acm.org>
>     >>>>>>> on Fri, 16 Jan 2009 13:10:04 -0700 writes:
>     >>>>>>> 
>     >> 
>     TP> Martin Maechler wrote:
>     >> >>>>>>> "PatB" == Patrick Burns <pburns at pburns.seanet.com>
>     >> >>>>>>> on Tue, 13 Jan 2009 17:00:40 +0000 writes:
>     >> >>>>>>> 
>     >> >> 
>     PatB> Henrik Bengtsson wrote:
>     >> >> >> Hi.
>     >> >> >> 
>     >> >> >> On Mon, Jan 12, 2009 at 11:58 PM, Prof Brian Ripley
>     >> >> >> <ripley at stats.ox.ac.uk> wrote:
>     >> >> >> 
>     >> >> >>> What you have is a one-dimensional array: they crop up
>     >> >> >>> in R most often from table() in my experience.
>     >> >> >>> 
>     >> >> >>> 
>     >> >> >>>> f <- table(rpois(100, 4))
>     >> >> >>>> str(f)
>     >> >> >>>> 
>     >> >> >>> 'table' int [, 1:10] 2 6 18 21 13 16 13 4 3 4
>     >> >> >>>  - attr(*, "dimnames")=List of 1
>     >> >> >>>   ..$ : chr [1:10] "0" "1" "2" "3" ...
>     >> >> >>> 
>     >> >> >>> and yes, f is an atomic vector and yes, str()'s notation
>     >> >> >>> is confusing here but if it did [1:10] you would not
>     >> >> >>> know it was an array.  I recall discussing this with
>     >> >> >>> Martin Maechler (str's author) last century, and I've
>     >> >> >>> just checked that R 2.0.0 did the same.
>     >> >> >>> 
>     >> >> >>> The place in which one-dimensional arrays differ from
>     >> >> >>> normal vectors is how names are handled: notice that my
>     >> >> >>> example has dimnames not names, and ?names says
>     >> >> >>> 
>     >> >> >>> For a one-dimensional array the 'names' attribute really
>     >> >> >>> is 'dimnames[[1]]'.
>     >> >> >>> 
>     >> >> >> 
>     >> >> >> Thanks for this explanation.  One could then argue that
>     >> >> >> [1:10,] is somewhat better than [,1:10], but that is just polish.
>     >> >> 
>     >> >> yes.  And honestly I don't remember anymore why I chose the
>     >> >> "[,1:n]" notation.  It definitely was there already before R
>     >> >> came into existence, as  S  also has had one-dimensional arrays,
>     >> >> and I programmed the first version of str() in 1990.
>     >> >> 
>     PatB> Perhaps it could be:
>     >> >> 
>     PatB> [1:10(,)]
> 
>     PatB> That is weird enough that it should not lead people to
>     PatB> believe that it is a matrix.  But might prompt them a
>     PatB> bit in that direction.
>     >> >> 
>     >> >> Well, str() was always aimed a bit at experienced S (and R)
>     >> >> users, and I had always aimed somewhat to keep its output
>     >> >> "compact".  I'm quite astonished that the OP didn't know about
>     >> >> 1D arrays in spite of the many years he's been using R.
>     >> >> Would a weird solution like the above have helped?
>     >> >> 
>     >> >> At the moment, I'd tend to keep it "as is" if only just for
>     >> >> historical reminiscence, but I can be convinced to change the
>     >> >> current "tendency" ... 
>     >> >> 
>     >> >> Martin Maechler, ETH Zurich  
>     >> >> 
>     TP> What about just including "(1d-array)", something like this
>     >> >> str(f)
>     TP> 'table' int [1:10](1d array) 5 5 9 23 26 16 9 4 2 1
>     TP> - attr(*, "dimnames")=List of 1
>     TP> ..$ : chr [1:10] "0" "1" "2" "3" ...
>     >> >> 
>     TP> only 9 extra characters for a rare case, and much, much less cryptic?
>     >> 
>     >> well,.. the next text request is to use
>     >> "character" instead of "chr", only 6 extra characters ....
>     >> 
>     >>  -> no way:  str() has its very concise "style" and should keep that.
>     >> 
>     TP> Brevity is good, but clarity is important too.   The output of str is 
>     TP> usually decipherable, but not so much in this case.  It's easy to 
>     TP> dismiss suggestions like replacing "chr" with "character" - the increase 
>     TP> in clarity would be minimal.  However, the potential increase in clarity 
>     TP> for a 1-d array is significant - the decrease in brevity is at question 
>     TP> here. Given the rarity of the case it seems like a decent tradeoff to 
>     TP> add "(1d-array)" (one could even just write "(1d)").   1-d arrays are 
>     TP> sufficiently rare that no concise and clear method of indicating them 
>     TP> using brackets or other symbols has arisen. You did say you "can be 
>     TP> convinced to change" it, but I won't attempt beyond this! :-)
> 
> well, "still can be .." .....
> 
> So you currently propose to replace
>      "int [,1:10] 5 5 9 23 26 16 9 4 2 1"
> by
>      "int [1:10](1d) 5 5 9 23 26 16 9 4 2 1"
> where Pat had
>      "int [1:10(,)] 5 5 9 23 26 16 9 4 2 1"
> 
> Since the [.....] is where we specify the dimensionality of all
> arrays in str(), I'd like to try something where things remain
> inside "[....]" as with Pat's version or e.g., with
> 
>      "int [1:10/1d] 5 5 9 23 26 16 9 4 2 1"
> 
> Opinions, further proposals ?

Recognizing that I am coming to this discussion quite late, how about:

       int [1:10(1d)] 5 5 9 23 26 16 9 4 2 1

?

I do think that any str() representation that includes a ',' would
continue to reinforce the current misunderstandings pertaining to a 1d
array.

Since using str() is a common response to posts on r-help regarding how
to access components of an object, there will be naive users who would
see something like (using Prof. Ripley's example):

> str(f)
 'table' int [, 1:11] 1 9 15 21 15 17 13 5 1 2 ...
 - attr(*, "dimnames")=List of 1
  ..$ : chr [1:11] "0" "1" "2" "3" ...

and then think that they could do:

> f[, 1]
Error in f[, 1] : incorrect number of dimensions

which of course they cannot.

I think that the above change would help to reinforce the notion that a
1d array can, for the most part, be treated as an atomic vector.
However, as Prof. Ripley has noted, there is a subtle difference in how
names/dimnames are treated. The use of '(1d)' in the str() output would
make it clear that this object is not quite a simple atomic vector, but
when indexing, can be treated as such.
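
A small sketch of that distinction, assuming a table built as in Prof.
Ripley's example. The seed and the variable name 'msg' are arbitrary choices
made here for illustration and are not from the thread:

```r
set.seed(1)                     # arbitrary seed, only so the run is repeatable
f <- table(rpois(100, 4))       # table() of one factor gives a 1d array

length(dim(f))                  # 1: exactly one dimension
is.atomic(f)                    # TRUE: the data are an atomic (integer) vector
is.vector(f)                    # FALSE: it carries dim/dimnames/class attributes

# names() works, but what it returns is really dimnames[[1]]
identical(names(f), dimnames(f)[[1]])   # TRUE

# Vector-style indexing is fine, by position or by name
f[1]
f[names(f)[1]]

# Matrix-style indexing is not: capture the error message instead of stopping
msg <- tryCatch(f[, 1], error = function(e) conditionMessage(e))
msg                             # "incorrect number of dimensions"
```

So whatever notation str() ends up using, single-index subsetting is the
form that works on such an object.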

Regards,

Marc Schwartz

<snip of content below this point>