[R] two questions for R beginners
John Sorkin
jsorkin at grecc.umaryland.edu
Mon Mar 1 15:19:10 CET 2010
If it looks like a duck and quacks like a duck, it ought to behave like a duck.
To the user, a matrix and a dataframe look alike . . . except that a dataframe can hold non-numeric values. Thus, to the user, a matrix looks like a special case of a DF, or perhaps conversely. If you can address elements of one structure using a given syntax, you should be able to address elements of the other structure using the same syntax. To do otherwise leads to confusion and is counterintuitive.
John
John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
>>> Petr PIKAL <petr.pikal at precheza.cz> 3/1/2010 8:57 AM >>>
Hi
r-help-bounces at r-project.org wrote on 01.03.2010 13:03:24:
< snip>
> >
> > I understand that 2 dimensional rectangular matrix looks quite
> > similar to data frame however it is only a vector with dimensions.
> > As such it can have items of only one type (numeric, character, ...).
> > And you can easily change dimensions of matrix.
> >
> > matrix<-1:12
> > dim(matrix) <- c(2,6)
> > matrix
> > dim(matrix) <- c(2,2,3)
> > matrix
> > dim(matrix) <-NULL
> > matrix
> >
> > So rectangular structure of printed matrix is a kind of coincidence
> > only, whereas rectangular structure of data frame is its main feature.
> >
> > Regards
> > Petr
> >>
> >> --
> >> Karl Ove Hufthammer
>
> Petr, I think that could be confusing! The way I see it is that
> a matrix is a special case of an array, whose "dimension" attribute
> is of length 2 (number of "rows", number of "columns"); and "row"
> and "column" refer to the rectangular display which you see when
> R prints the matrix. And this, of course, derives directly from
> the historic rectangular view of a matrix when written down.
>
> When you went from "dim(matrix)<-c(2,6)" to "dim(matrix)<-c(2,2,3)"
> you stripped it of its special title of "matrix" and cast it out
> into the motley mob of arrays (some of whom are matrices, but
> "matrix" no longer is).
>
> So the "rectangular structure of printed matrix" is not a coincidence,
> but is its main feature!
OK. Point taken. However, I feel that the possibility of manipulating
matrix/array dimensions simply by changing them, as I showed above,
together with perceiving a matrix as a **vector with dimensions**, prevented
me, especially in the early days, from using matrices in place of data
frames and vice versa.
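As a minimal sketch of the "vector with dimensions" view (the variable names here are illustrative, not taken from the thread), one can check with is.matrix() and is.array() what an object counts as after each change to its dim attribute:

```r
# A matrix is an atomic vector carrying a "dim" attribute of length 2.
x <- 1:12

dim(x) <- c(2, 6)                # 2 rows, 6 columns
is_matrix_2d <- is.matrix(x)     # dim has length 2, so this is a matrix

dim(x) <- c(2, 2, 3)             # same 12 values, now three dimensions
is_matrix_3d <- is.matrix(x)     # no longer a matrix ...
is_array_3d  <- is.array(x)      # ... but still an array

dim(x) <- NULL                   # drop the attribute: a plain vector again
is_plain <- is.null(dim(x)) && is.vector(x)
```

The values themselves never change; only the dim attribute decides whether R treats the object as a vector, a matrix, or a higher-dimensional array.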
Consider the confusing results of cbind and rbind for vectors of unequal mode.
Far too often we see something like this
> cbind(1:2,letters[1:2])
     [,1] [,2]
[1,] "1"  "a"
[2,] "2"  "b"
instead of
> data.frame(1:2,letters[1:2])
  X1.2 letters.1.2.
1    1            a
2    2            b
and then a question about why the result does not behave as expected. Each
type of object has features that are good for some kinds of
manipulation/analysis/plotting but quite detrimental for others.
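A small sketch of how to spot the difference before it causes trouble. The column names num and chr are made up for illustration; the point is only that cbind() on atomic vectors must coerce everything to one mode, while data.frame() keeps each column's own type:

```r
# cbind() on atomic vectors builds a matrix, so all values share one mode.
m <- cbind(1:2, letters[1:2])

# data.frame() keeps one vector per column, each with its own type.
# (The column names are hypothetical, chosen for this example.)
df <- data.frame(num = 1:2, chr = letters[1:2])

mode(m)              # the single mode shared by every cell of the matrix
sapply(df, class)    # one class per column of the data frame

num_in_matrix <- is.numeric(m[, 1])   # the numbers were coerced to character
num_in_df     <- is.numeric(df$num)   # the numbers are still numbers
```

Checking mode() or str() on the result right after binding is usually enough to see whether the numbers have silently become character strings.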
Regards
Petr
>
> To come back to Karl's query about why "$" works for a dataframe
> but not for a matrix, note that "$" is the extractor for getting
> a named component of a list. So, Karl, when you did
>
> d=head(iris[1:4])
>
> you created a dataframe:
>
> str(d)
> # 'data.frame': 6 obs. of 4 variables:
> # $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4
> # $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
> # $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
> # $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4
>
> (with named components "Sepal.Length", ... , "Petal.Width"),
> and a dataframe is a special case of a general list. In a
> general list, the separate components can each be anything.
> In a dataframe, each component is a vector; the different
> vectors may be of different types (logical, numeric, ... )
> but of course the elements of any single vector must be
> of the same type; and, in a dataframe, all the vectors must
> have the same length (otherwise it is a general list, not
> a dataframe).
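The "dataframe is a list of equal-length vectors" description can be checked directly. This is only a sketch using the d and d3 objects from this message; the try() call is there just to show that a list with components of unequal length cannot be turned into a dataframe:

```r
d <- head(iris[1:4])

# A data frame passes the list test, and "$" and "[[" pull out the same
# named component.
d_is_list <- is.list(d)
same_component <- identical(d$Sepal.Width, d[["Sepal.Width"]])

# Every component of a data frame has the same length.
component_lengths <- sapply(d, length)

# A general list may hold components of different lengths ...
d3 <- list(C1 = c(1.1, 1.2, 1.3), C2 = c(2.1, 2.2, 2.3, 2.4))

# ... and for that reason it cannot be converted to a data frame.
conversion <- try(as.data.frame(d3), silent = TRUE)
conversion_failed <- inherits(conversion, "try-error")
```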
>
> So, when you print a dataframe, R chooses to display it
> as a rectangular structure. On the other hand, when you
> print a general list, R displays it quite differently:
>
> d
> #   Sepal.Length Sepal.Width Petal.Length Petal.Width
> # 1          5.1         3.5          1.4         0.2
> # 2          4.9         3.0          1.4         0.2
> # 3          4.7         3.2          1.3         0.2
> # 4          4.6         3.1          1.5         0.2
> # 5          5.0         3.6          1.4         0.2
> # 6          5.4         3.9          1.7         0.4
>
> d3 <- list(C1=c(1.1,1.2,1.3), C2=c(2.1,2.2,2.3,2.4))
> d3
> # $C1
> # [1] 1.1 1.2 1.3
> # $C2
> # [1] 2.1 2.2 2.3 2.4
>
> Notice the similarity (though not identity) between the print
> of d3 and the output of str(d). There is a bit more hard-wired
> stuff built into a dataframe which makes it more than simply
> a "list with all components vectors of equal length". However,
> one could also say that "the rectangular structure is its
> main feature".
>
> As to why "$" will not work on matrices: a matrix, as Petr
> points out, is a vector with a "dimensions" attribute which
> has length 2 (as opposed to a general array where the length
> of the dimensions attribute could be anything). Hence it is
> not a list of named components in the sense of "list".
>
> Hence "$" will not work with a matrix, since "$" will not
> be able to find any list-components, which is basically what
> the error message
>
> d2$Sepal.Width
> # Error in d2$Sepal.Width : $ operator is invalid for atomic vectors
>
> is telling you: d2 is an atomic vector with a length-2 dimensions
> attribute. It has no list-type components for "$" to get its
> hands on.
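For what it is worth, there is one syntax that works on both structures: indexing with [row, column]. The sketch below assumes that d2 was created as as.matrix(d); how d2 was actually built is not shown in this part of the thread, so treat that line as an assumption.

```r
d  <- head(iris[1:4])
d2 <- as.matrix(d)   # assumption: the thread does not show how d2 was made

# "$" fails on the matrix because d2 is an atomic vector, not a list.
dollar_result <- try(d2$Sepal.Width, silent = TRUE)
dollar_failed <- inherits(dollar_result, "try-error")

# Indexing by column name works on the matrix ...
from_matrix <- d2[, "Sepal.Width"]

# ... and exactly the same syntax works on the data frame.
from_df <- d[, "Sepal.Width"]
```

So code that has to accept either a matrix or a dataframe can stick to [ , "name"] and avoid "$" altogether.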
>
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 01-Mar-10 Time: 12:03:21
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.