[R] two questions for R beginners

Patrick Burns pburns at pburns.seanet.com
Mon Mar 1 17:08:52 CET 2010


If it looks like a duck and quacks like a duck,
you ought to treat it like a duck.  That is,
use two subscripts:

x[i, j]

If you are an ornithologist, then you will know
more precisely what can be done.
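
As a minimal sketch of that advice (the objects df and m below are
made up for illustration and are not taken from the thread), two
subscripts address the same elements in a matrix and in a data frame,
whereas "$" is only available on the list-like data frame:

```r
# Hypothetical example objects, chosen only for illustration
df <- head(iris[1:4])   # a data frame: a list of equal-length columns
m  <- as.matrix(df)     # a matrix: one atomic vector with a dim attribute

# Two subscripts pick out the same element in both structures
m[2, 3]
df[2, 3]

# A column name can serve as the second subscript in both
m[, "Sepal.Width"]
df[, "Sepal.Width"]

# "$" works on the data frame but fails on the matrix
df$Sepal.Width
res <- try(m$Sepal.Width, silent = TRUE)  # m is an atomic vector, so this errors
inherits(res, "try-error")
```

So code that should work on either kind of object is probably safest
written with x[i, j] throughout.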

Pat


On 01/03/2010 14:19, John Sorkin wrote:
> If it looks like a duck and quacks like a duck, it ought to behave like a duck.
>
> To the user a matrix and a dataframe look alike . . . except a dataframe can hold non-numeric values. Thus, to the user, a matrix looks like a special case of a DF, or perhaps conversely. If you can address elements of one structure using a given syntax, you should be able to address elements of the other structure using the same syntax. To do otherwise leads to confusion and is counterintuitive.
> John
>
>
>
>
> John David Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>
> >>> Petr PIKAL <petr.pikal at precheza.cz> 3/1/2010 8:57 AM >>>
> Hi
>
> r-help-bounces at r-project.org wrote on 01.03.2010 13:03:24:
>
> <  snip>
>
>>>
>>> I understand that a 2-dimensional rectangular matrix looks quite
>>> similar to a data frame; however, it is only a vector with dimensions.
>>> As such, it can have items of only one type (numeric, character, ...).
>>> And you can easily change the dimensions of a matrix.
>>>
>>> matrix<-1:12
>>> dim(matrix)<- c(2,6)
>>> matrix
>>> dim(matrix)<- c(2,2,3)
>>> matrix
>>> dim(matrix)<-NULL
>>> matrix
>>>
>>> So rectangular structure of printed matrix is a kind of coincidence
>>> only, whereas rectangular structure of data frame is its main feature.
>>>
>>> Regards
>>> Petr
>>>>
>>>> --
>>>> Karl Ove Hufthammer
>>
>> Petr, I think that could be confusing! The way I see it is that
>> a matrix is a special case of an array, whose "dimension" attribute
>> is of length 2 (number of "rows", number of "columns"); and "row"
>> and "column" refer to the rectangular display which you see when
>> R prints the matrix. And this, of course, derives directly from
>> the historic rectangular view of a matrix when written down.
>>
>> When you went from "dim(matrix)<-c(2,6)" to "dim(matrix)<-c(2,2,3)"
>> you stripped it of its special title of "matrix" and cast it out
>> into the motley mob of arrays (some of whom are matrices, but
>> "matrix" no longer is).
>>
>> So the "rectangular structure of printed matrix" is not a coincidence,
>> but is its main feature!
>
> OK, point taken. However, I feel that the possibility of manipulating
> matrix/array dimensions by simply changing them, as I showed above,
> together with perceiving a matrix as a **vector with dimensions**, prevented
> me, especially in the early days, from using matrices instead of data frames
> and vice versa.
>
> Consider the confusing results of cbind and rbind for vectors of unequal mode.
> Far too often we see something like this
>
> > cbind(1:2, letters[1:2])
>       [,1] [,2]
> [1,] "1"  "a"
> [2,] "2"  "b"
>
> instead of
>
> > data.frame(1:2, letters[1:2])
>    X1.2 letters.1.2.
> 1    1            a
> 2    2            b
>
> and then a question about why the result does not behave as expected. Each
> type of object has features which are good for some types of
> manipulation/analysis/plotting but quite detrimental for others.
>
> Regards
> Petr
>
>
>>
>> To come back to Karl's query about why "$" works for a dataframe
>> but not for a matrix, note that "$" is the extractor for getting
>> a named component of a list. So, Karl, when you did
>>
>>    d=head(iris[1:4])
>>
>> you created a dataframe:
>>
>>    str(d)
>>    # 'data.frame':   6 obs. of  4 variables:
>>    #  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4
>>    #  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9
>>    #  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7
>>    #  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4
>>
>> (with named components "Sepal.Length", ... , "Petal.Width"),
>> and a dataframe is a special case of a general list. In a
>> general list, the separate components can each be anything.
>> In a dataframe, each component is a vector; the different
>> vectors may be of different types (logical, numeric, ... )
>> but of course the elements of any single vector must be
>> of the same type; and, in a dataframe, all the vectors must
>> have the same length (otherwise it is a general list, not
>> a dataframe).
>>
>> So, when you print a dataframe, R chooses to display it
>> as a rectangular structure. On the other hand, when you
>> print a general list, R displays it quite differently:
>>
>>    d
>>    #   Sepal.Length Sepal.Width Petal.Length Petal.Width
>>    # 1          5.1         3.5          1.4         0.2
>>    # 2          4.9         3.0          1.4         0.2
>>    # 3          4.7         3.2          1.3         0.2
>>    # 4          4.6         3.1          1.5         0.2
>>    # 5          5.0         3.6          1.4         0.2
>>    # 6          5.4         3.9          1.7         0.4
>>
>>    d3<- list(C1=c(1.1,1.2,1.3), C2=c(2.1,2.2,2.3,2.4))
>>    d3
>>    # $C1
>>    # [1] 1.1 1.2 1.3
>>    # $C2
>>    # [1] 2.1 2.2 2.3 2.4
>>
>> Notice the similarity (though not identity) between the print
>> of d3 and the output of str(d). There is a bit more hard-wired
>> stuff built into a dataframe which makes it more than simply
>> a "list with all components vectors of equal length". However,
>> one could also say that "the rectangular structure is its
>> main feature".
>>
>> As to why "$" will not work on matrices: a matrix, as Petr
>> points out, is a vector with a "dimensions" attribute which
>> has length 2 (as opposed to a general array where the length
>> of the dimensions attribute could be anything). Hence it is
>> not a list of named components in the sense of "list".
>>
>> Hence "$" will not work with a matrix, since "$" will not
>> be able to find any list-components, which is basically what
>> the error message
>>
>>    d2$Sepal.Width
>>    # Error in d2$Sepal.Width : $ operator is invalid for atomic vectors
>>
>> is telling you: d2 is an atomic vector with a length-2 dimensions
>> attribute. It has no list-type components for "$" to get its
>> hands on.
>>
>> Ted.
>>
>> --------------------------------------------------------------------
>> E-Mail: (Ted Harding)<Ted.Harding at manchester.ac.uk>
>> Fax-to-email: +44 (0)870 094 0861
>> Date: 01-Mar-10                                       Time: 12:03:21
>> ------------------------------ XFMail ------------------------------
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

-- 
Patrick Burns
pburns at pburns.seanet.com
http://www.burns-stat.com
(home of 'The R Inferno' and 'A Guide for the Unwilling S User')
