[R] Data.frame Vs Matrix Vs Array: Definitions Please

Ivan Calandra ivan.calandra at uni-hamburg.de
Wed Oct 27 10:03:13 CEST 2010


Hi,

Gabor already gave you a great answer, but I would add a few 
clarifications. Someone please correct me if I'm wrong.

Arrays generalize matrices to any number of dimensions. Put the other 
way: a matrix is an array with exactly 2 dimensions.
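
A small sketch of that relationship (the object names a2, m and a3 are 
made up for illustration):

```r
# A 2-d array and a matrix built from the same data are the same object
a2 <- array(1:6, dim = c(2, 3))
m  <- matrix(1:6, nrow = 2)
identical(a2, m)   # TRUE
is.matrix(a2)      # TRUE: an array with exactly 2 dimensions is a matrix

# With 3 dimensions it is still an array, but no longer a matrix
a3 <- array(1:24, dim = c(2, 3, 4))
is.array(a3)       # TRUE
is.matrix(a3)      # FALSE
dim(a3)            # 2 3 4
```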

I would also add these:
- the elements of an atomic vector (i.e. not a list) all have to be of 
the same mode (character, numeric, logical...)
- which implies that the elements of matrices and arrays also have to 
be of the same mode (so your data may be silently coerced if you don't 
pay attention to it).
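
To illustrate the coercion, here is a minimal sketch (x, m and df are 
made-up names; I pass stringsAsFactors = FALSE explicitly because the 
default of data.frame() differs between R versions):

```r
# Mixing types in an atomic vector coerces everything to the most general one
x <- c(1, "a", TRUE)
x          # "1" "a" "TRUE"
typeof(x)  # "character"

# The same happens in a matrix, since it is just a vector with dimensions
m <- cbind(id = 1:3, name = c("a", "b", "c"))
typeof(m)  # "character": the ids are now the strings "1", "2", "3"

# A data frame is a list of columns, so each column keeps its own type
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)
sapply(df, typeof)  # id: "integer", name: "character"
```

This is the practical reason to use a data frame rather than a matrix 
as soon as the columns are of different kinds.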

Factors hold categorical data: they look like character strings, but 
they are stored as integer codes (hence the numeric mode). Each code is 
associated with a given string, one of the so-called levels. Here is an 
example:
my.fac <- factor(c("something", "other", "more", "something", "other", "more"))
my.fac
   [1] something other     more      something other     more
   Levels: more other something
mode(my.fac)
   [1] "numeric"    ## coded as numeric even though you gave character strings!
class(my.fac)
   [1] "factor"
levels(my.fac)
   [1] "more"      "other"     "something"
as.numeric(my.fac)
   [1] 3 2 1 3 2 1    ## internal representation
as.character(my.fac)
   [1] "something" "other"     "more"      "something" "other"     "more"    ## what you think it is!
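
This internal coding matters most when the levels look like numbers. A 
minimal sketch of that trap (num.fac is a made-up example, not from the 
original question):

```r
my.fac <- factor(c("something", "other", "more", "something", "other", "more"))
typeof(my.fac)          # "integer": the codes are stored as integers
levels(my.fac)[my.fac]  # same strings as as.character(my.fac)

# A factor whose levels happen to look like numbers
num.fac <- factor(c("10", "20", "10", "5"))
levels(num.fac)                    # "10" "20" "5" (sorted as strings)
as.numeric(num.fac)                # 1 2 1 3: the codes, not the values
as.numeric(as.character(num.fac))  # 10 20 10 5: the values
```

So to get the values back, convert to character first and only then to 
numeric.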

I found that the book "Data Manipulation with R" by Phil Spector 
(2008) explains all these object modes and classes quite well, even 
though I would not have understood everything from this book alone 
(not that I have completely mastered this topic yet...)

HTH,
Ivan



On 10/27/2010 02:49, Gabor Grothendieck wrote:
> On Tue, Oct 26, 2010 at 8:37 PM, Matt Curcio<matt.curcio.ri at gmail.com>  wrote:
>> Hi All,
>> I am learning R and having a little trouble with the usage and proper
>> definitions of data.frames vs. matrix vs vectors. I have read many R
>> tutorials, and looked over umpteen 'cheat' sheets and have found that
>> no one has articulated a really good definition of the differences
>> between 'data.frames', 'matrix', and 'arrays' and even 'factors'.  I
>> realize that I might have missed someone's R tutorial, and actually
>> would like to receive 'your' most concise or most useful tutorial.
>> Any help would be appreciated.
>>
>> My particular favorite explanation and helpful hint is from the
>> 'R-Inferno'.  Don't get me wrong...  I think this pdf is great and
>> some tables are excellent. Overall it is a very good primer but this
>> one section leaves me puzzled.  This quote belies the lack of hard and
>> fast rules for what and when to use 'data.frames', 'matrix', and
>> 'arrays'.  It discusses ways in which to simplify your work.
>>
>> Here are a few possibilities for simplifying:
>> • Don’t use a list when an atomic vector will do.
>> • Don’t use a data frame when a matrix will do.
>> • Don’t try to use an atomic vector when a list is needed.
>> • Don’t try to use a matrix when a data frame is needed.
>>
>> Cheers,
>> Matt C
> Look at their internal representations and it will become clearer.  v,
> a vector, has length 6.  m, a matrix, is actually the same as the
> vector v except it has dimensions too. Since m is just a vector with
> dimensions, m has length 6 as well.  L is a list and has length 2
> because it's a vector each of whose components is itself a vector.  DF
> is a data frame and is the same as L except its 2 components must each
> have the same length and it must have row and column names.  If you
> don't assign the row and column names they are automatically generated
> as we can see.  Note that row.names = c(NA, -3L) is a short form for
> row names of 1:3 and .Names internally refers to column names.
>
> > v <- 1:6 # vector
> > dput(v)
> 1:6
> > m <- v; dim(m) <- 2:3 # m is a matrix since we added dimensions
> > dput(m)
> structure(1:6, .Dim = 2:3)
> > L <- list(1:3, 4:6)
> > dput(L)
> list(1:3, 4:6)
> > DF <- data.frame(1:3, 4:6)
> > dput(DF)
> structure(list(X1.3 = 1:3, X4.6 = 4:6), .Names = c("X1.3", "X4.6"
> ), row.names = c(NA, -3L), class = "data.frame")
>
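
The lengths and attributes Gabor describes can be checked directly. This 
is only a sketch reusing his object names; the exact dput() output may 
differ between R versions:

```r
v <- 1:6
m <- v; dim(m) <- 2:3
L <- list(1:3, 4:6)
DF <- data.frame(1:3, 4:6)

length(v)    # 6
length(m)    # 6: still the same 6 elements, dim is only an attribute
length(L)    # 2: two components
length(DF)   # 2: two columns
is.list(DF)  # TRUE: a data frame is a list with class "data.frame"

names(attributes(m))   # "dim"
names(attributes(DF))  # names, class and row.names (order may vary)
```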

-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php


