[R] Data.frame Vs Matrix Vs Array: Definitions Please

Gabor Grothendieck ggrothendieck at gmail.com
Wed Oct 27 12:11:21 CEST 2010


On Wed, Oct 27, 2010 at 4:03 AM, Ivan Calandra
<ivan.calandra at uni-hamburg.de> wrote:
> Hi,
>
> Gabor gave you a great answer already, but I would add a few clarifications.
> Someone please correct me if I'm wrong.
>
> Arrays are matrices with more than 2 dimensions. Put the other way: matrices
> are arrays with only 2 dimensions.

Arrays can have any number of dimensions including 1, 2, 3, etc.

	> # a 2d array is a matrix. It's composed of a vector plus two dimensions.
	> m <- array(1:4, c(2, 2))
	> dput(m)
	structure(1:4, .Dim = c(2L, 2L))
	> class(m)  # R >= 4.0.0 prints "matrix" "array" here
	[1] "matrix"
	> is.array(m)
	[1] TRUE

	> # a 1d array is a vector plus a single dimension
	> a1 <- array(1:4, 4)
	> dput(a1)
	structure(1:4, .Dim = 4L)
	> dim(a1)
	[1] 4
	> class(a1)
	[1] "array"
	> is.array(a1)
	[1] TRUE

	> # if we remove the dimension part it's no longer an array but just a vector
	> nota <- a1
	> dim(nota) <- NULL
	> dput(nota)
	1:4
	> is.array(nota)
	[1] FALSE
	> is.vector(nota)
	[1] TRUE
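
The 3d case follows the same pattern. A minimal sketch (the name a3 is made up for illustration and is not part of the transcript above):

```r
# a 3d array is again a plain vector plus a dim attribute, here of length 3
a3 <- array(1:24, c(2, 3, 4))
dim(a3)          # 2 3 4
length(a3)       # 24, the length of the underlying vector
is.array(a3)     # TRUE
is.matrix(a3)    # FALSE, only 2d arrays count as matrices
a3[2, 3, 4]      # 24, the last element (storage is column-major)
```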


>
> I would also add these:
> - the components of a vector have to be of the same mode (character,
> numeric, integer...)

However, a list with no attributes is a vector too, so this is a vector:

   >  vl <- list(sin, 3, "a")
   >  is.vector(vl)
   [1] TRUE

In the sense of is.vector, a vector may not have attributes other than
names, so arrays and factors are not vectors, although they are built
from vectors.
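
A short check of how is.vector reacts to attributes, as a sketch only: the object x and the attribute name "units" are arbitrary choices for illustration.

```r
x <- 1:3
is.vector(x)                     # TRUE: no attributes at all
names(x) <- c("a", "b", "c")
is.vector(x)                     # TRUE: names are the one attribute allowed
attr(x, "units") <- "cm"         # "units" is a made-up attribute
is.vector(x)                     # FALSE: some other attribute is present
is.vector(factor(c("b", "a")))   # FALSE: levels and class attributes
is.vector(array(1:4, 4))         # FALSE: dim attribute
```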

> - which implies that the components of matrices and arrays have to be also
> of the same mode (which might lead to some coercion of your data if you
> don't pay attention to it).
>
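
The coercion mentioned here can be seen directly. A small sketch, assuming nothing beyond base R; the names mixed and df are made up for illustration:

```r
# a matrix holds a single mode, so mixing integers and strings coerces everything
mixed <- cbind(1:2, c("a", "b"))
mode(mixed)            # "character"
mixed[1, 1]            # "1", the integer has become a string

# a data frame keeps each column as its own vector, so no coercion happens
# (column s is character in R >= 4.0.0 and a factor by default in older versions)
df <- data.frame(n = 1:2, s = c("a", "b"))
class(df$n)            # "integer"

# converting the data frame to a matrix brings the coercion back
mode(as.matrix(df))    # "character"
```
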
> Factors are character data, but coded as numeric mode. Each number is
> associated with a given string, one of the so-called levels. Here is an example:
> my.fac <- factor(c("something", "other", "more", "something", "other",
> "more"))

A factor is composed of an integer vector plus a levels attribute
(called .Label internally) and a class attribute of "factor", as in
this code:

   > fac <- factor(c("b", "a", "b"))
   > dput(fac)
   structure(c(2L, 1L, 2L), .Label = c("a", "b"), class = "factor")
   > levels(fac)
   [1] "a" "b"
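
To make the "integer vector plus attributes" point concrete, a factor can be assembled by hand. This is only a sketch; the names codes and f are made up, and factor() remains the normal way to create one.

```r
codes <- c(2L, 1L, 2L)                                   # integer codes
f <- structure(codes, levels = c("a", "b"), class = "factor")
is.factor(f)                              # TRUE
identical(f, factor(c("b", "a", "b")))    # TRUE: same codes, levels and class
as.integer(f)                             # 2 1 2, the underlying codes
levels(f)[as.integer(f)]                  # "b" "a" "b", the codes mapped back to strings
```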


> my.fac
>  [1] something other     more      something other     more
>  Levels: more other something
> mode(my.fac)
>  [1] "numeric"    ## coded as numeric even though you gave character strings!
> class(my.fac)
>  [1] "factor"
> levels(my.fac)
>  [1] "more"      "other"     "something"
> as.numeric(my.fac)
>  [1] 3 2 1 3 2 1                  ## internal representation
> as.character(my.fac)
>  [1] "something" "other"     "more"      "something" "other"     "more"    ## what you think it is!
>
> I found that the book "Data Manipulation with R" by Phil Spector (2008)
> explains all these object modes and classes quite well, even though I
> wouldn't have understood them completely from reading this book alone (not
> that I have completely mastered this topic yet...)
>
> HTH,
> Ivan
>
>
>
> Le 10/27/2010 02:49, Gabor Grothendieck a écrit :
>>
>> On Tue, Oct 26, 2010 at 8:37 PM, Matt Curcio<matt.curcio.ri at gmail.com>
>>  wrote:
>>>
>>> Hi All,
>>> I am learning R and having a little trouble with the usage and proper
>>> definitions of data.frames vs. matrix vs vectors. I have read many R
>>> tutorials, and looked over umpteen 'cheat' sheets and have found that
>>> no one has articulated a really good definition of the differences
>>> between 'data.frames', 'matrix', and 'arrays' and even 'factors'.  I
>>> realize that I might have missed someone's R tutorial, and actually
>>> would like to receive 'your' most concise or most useful tutorial.
>>> Any help would be appreciated.
>>>
>>> My particular favorite explanation and helpful hint is from the
>>> 'R-Inferno'.  Don't get me wrong...  I think this pdf is great and
>>> some tables are excellent. Overall it is a very good primer but this
>>> one section leaves me puzzled.  This quote reveals the lack of hard and
>>> fast rules for what and when to use 'data.frames', 'matrix', and
>>> 'arrays'.  It discusses ways in which to simplify your work.
>>>
>>> Here are a few possibilities for simplifying:
>>> • Don’t use a list when an atomic vector will do.
>>> • Don’t use a data frame when a matrix will do.
>>> • Don’t try to use an atomic vector when a list is needed.
>>> • Don’t try to use a matrix when a data frame is needed.
>>>
>>> Cheers,
>>> Matt C
>>
>> Look at their internal representations and it will become clearer.  v,
>> a vector, has length 6.  m, a matrix, is actually the same as the
>> vector v except it has dimensions too. Since m is just a vector with
>> dimensions, m has length 6 as well.  L is a list and has length 2
>> because it's a vector each of whose components is itself a vector.  DF
>> is a data frame and is the same as L except its 2 components must each
>> have the same length and it must have row and column names.  If you
>> don't assign the row and column names they are automatically generated
>> as we can see.  Note that row.names = c(NA, -3L) is a short form for
>> row names of 1:3 and .Names internally refers to column names.
>>
>> > v <- 1:6 # vector
>> > dput(v)
>> 1:6
>> > m <- v; dim(m) <- 2:3 # m is a matrix since we added dimensions
>> > dput(m)
>> structure(1:6, .Dim = 2:3)
>> > L <- list(1:3, 4:6)
>> > dput(L)
>> list(1:3, 4:6)
>> > DF <- data.frame(1:3, 4:6)
>> > dput(DF)
>> structure(list(X1.3 = 1:3, X4.6 = 4:6), .Names = c("X1.3", "X4.6"
>> ), row.names = c(NA, -3L), class = "data.frame")
>>
>
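
The lengths described in the quoted message can be checked directly. A minimal sketch that reuses the objects v, m, L and DF from that message and uses only base R:

```r
v <- 1:6
m <- v; dim(m) <- 2:3
L <- list(1:3, 4:6)
DF <- data.frame(1:3, 4:6)

length(v)                     # 6
length(m)                     # 6: a matrix is still a length-6 vector underneath
length(L)                     # 2
length(DF)                    # 2: one component per column
is.list(DF)                   # TRUE: a data frame is a list with extra attributes
dim(DF)                       # 3 2
identical(as.vector(m), v)    # TRUE: dropping the dimensions gives back v
identical(DF[[1]], L[[1]])    # TRUE: the first column is the same vector as L[[1]]
```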



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


