[R] Tricky (?) conversion from data.frame to matrix where not all pairs exist

David Winsemius dwinsemius at comcast.net
Wed Jun 22 16:33:12 CEST 2011


On Jun 22, 2011, at 9:46 AM, Marius Hofert wrote:

> Hi David,
>
> thanks for the quick response. That's nice. Is there also a way
> without loading an additional package? I'd prefer loading fewer
> packages if possible.

 > xtb <- xtabs(value ~ year + block, data = df)
 > xtb
      block
year   a b c
  2000 1 0 5
  2001 2 4 6
  2002 3 0 0
 > as.data.frame(xtb)
   year block Freq
1 2000     a    1
2 2001     a    2
3 2002     a    3
4 2000     b    0
5 2001     b    4
6 2002     b    0
7 2000     c    5
8 2001     c    6
9 2002     c    0
 > xt.df <- as.data.frame(xtb)
 > xt.df[xt.df[,3] != 0 , ]
   year block Freq
1 2000     a    1
2 2001     a    2
3 2002     a    3
5 2001     b    4
7 2000     c    5
8 2001     c    6

As Hadley pointed out yesterday, as.data.frame() on the table does coerce
the margin names (year and block) to factors:

 > str(xt.df)
'data.frame':	9 obs. of  3 variables:
  $ year : Factor w/ 3 levels "2000","2001",..: 1 2 3 1 2 3 1 2 3
  $ block: Factor w/ 3 levels "a","b","c": 1 1 1 2 2 2 3 3 3
  $ Freq : num  1 2 3 0 4 0 5 6 0
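If the original column types matter, the coercion can be undone afterwards. The following is only a sketch in base R: it rebuilds the example data from this thread so that it runs on its own, and renaming Freq back to value is an assumption about what is wanted, not something the output above requires.

```r
# Rebuild the thread's example so the snippet is self-contained.
df <- data.frame(year  = c(2000, 2001, 2002, 2001, 2000, 2001),
                 block = c("a", "a", "a", "b", "c", "c"),
                 value = 1:6)
xt.df <- as.data.frame(xtabs(value ~ year + block, data = df))

# A factor stores integer codes, so convert via the labels:
# as.numeric(xt.df$year) on its own would return the codes 1, 2, 3.
xt.df$year  <- as.numeric(as.character(xt.df$year))
xt.df$block <- as.character(xt.df$block)

# Optional (assumption): restore the original column name.
names(xt.df)[names(xt.df) == "Freq"] <- "value"
str(xt.df)
```

Going through as.character() first is the important step; the rest is cosmetic.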

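A base-R route that never passes through a table, and so keeps year numeric, is to build the full year-by-block grid with expand.grid() and left-join the data onto it with merge(). This is a sketch under one assumption: each year-block pair occurs at most once in df (xtabs() would sum duplicates, whereas merge() would keep them as separate rows). The names grid and full are illustrative only.

```r
# Rebuild the thread's example so the snippet is self-contained.
df <- data.frame(year  = c(2000, 2001, 2002, 2001, 2000, 2001),
                 block = c("a", "a", "a", "b", "c", "c"),
                 value = 1:6)

# Every year-block combination, whether or not it occurs in df.
grid <- expand.grid(year  = sort(unique(df$year)),
                    block = sort(unique(as.character(df$block))),
                    stringsAsFactors = FALSE)

# Left join: combinations missing from df get NA in 'value'.
full <- merge(grid, df, by = c("year", "block"), all.x = TRUE)

# Replace the NAs with zeros, then order as in the xtabs output.
full$value[is.na(full$value)] <- 0
full <- full[order(full$block, full$year), ]
full
```

The result has the same nine rows as as.data.frame(xtb), with zeros for the missing pairs.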
-- 
David.


>
> Cheers,
>
> Marius
>
>
> On 2011-06-22, at 15:38 , David Winsemius wrote:
>
>>
>> On Jun 22, 2011, at 9:19 AM, Marius Hofert wrote:
>>
>>> Hi,
>>>
>>> and what's the simplest way to obtain a *data.frame* with all years?
>>> The matching seems more difficult here because the years can/will  
>>> show up several times...
>>>
>>> (df <- data.frame(year=c(2000, 2001, 2002, 2001, 2000, 2001),
>>>               block=c("a","a","a","b","c","c"), value=1:6))
>>> (df. <- data.frame(year=rep(2000:2002, 3), block=rep(c("a", "b",  
>>> "c"), each=3), value=0))
>>> # how to fill in the given values?
>>
>> These days I think most people would reach for melt() in either  
>> reshape or reshape2 packages:
>>
>>> require(reshape2)
>> Loading required package: reshape2
>>> melt(xtb)
>> year block value
>> 1 2000     a     1
>> 2 2001     a     2
>> 3 2002     a     3
>> 4 2000     b     0
>> 5 2001     b     4
>> 6 2002     b     0
>> 7 2000     c     5
>> 8 2001     c     6
>> 9 2002     c     0
>>
>> It seems to do a good job of guessing what you want whereas the  
>> reshape function in my hands is very failure prone (... yes, the  
>> failings are mine.)
>> -- 
>> David
>>>
>>> Cheers,
>>>
>>> Marius
>>>
>>>
>>> On 2011-06-22, at 14:40 , Dennis Murphy wrote:
>>>
>>>> I saw it as an xtabs object - I didn't think to check whether it was
>>>> also a matrix object. Thanks for the clarification, David.
>>>>
>>>> Dennis
>>>>
>>>> On Wed, Jun 22, 2011 at 4:59 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>>
>>>>> On Jun 21, 2011, at 6:51 PM, Dennis Murphy wrote:
>>>>>
>>>>>> Ahhh...you want a matrix. xtabs() doesn't easily allow coercion to a
>>>>>> matrix object, so try this instead:
>>>>>
>>>>> What am I missing? A contingency table already inherits from matrix-class
>>>>> and if you insisted on coercion it appears simple:
>>>>>
>>>>>> xtb <- xtabs(value ~ year + block, data = df)
>>>>>> is.matrix(xtb)
>>>>> [1] TRUE
>>>>>> as.matrix(xtb)
>>>>>   block
>>>>> year   a b c
>>>>> 2000 1 0 5
>>>>> 2001 2 4 6
>>>>> 2002 3 0 0
>>>>>
>>>>> --
>>>>> David.
>>>>>
>>>>>>
>>>>>> library(reshape)
>>>>>> as.matrix(cast(df, year ~ block, fill = 0))
>>>>>> a b c
>>>>>> 2000 1 0 5
>>>>>> 2001 2 4 6
>>>>>> 2002 3 0 0
>>>>>>
>>>>>> Hopefully this is more helpful...
>>>>>> Dennis
>>>>>>
>>>>>> On Tue, Jun 21, 2011 at 3:35 PM, Dennis Murphy  
>>>>>> <djmuser at gmail.com> wrote:
>>>>>>>
>>>>>>> Hi:
>>>>>>>
>>>>>>> xtabs(value ~ year + block, data = df)
>>>>>>>  block
>>>>>>> year   a b c
>>>>>>> 2000 1 0 5
>>>>>>> 2001 2 4 6
>>>>>>> 2002 3 0 0
>>>>>>>
>>>>>>> HTH,
>>>>>>> Dennis
>>>>>>>
>>>>>>> On Tue, Jun 21, 2011 at 3:13 PM, Marius Hofert  
>>>>>>> <m_hofert at web.de> wrote:
>>>>>>>>
>>>>>>>> Dear expeRts,
>>>>>>>>
>>>>>>>> In the minimal example below, I have a data.frame containing three
>>>>>>>> "blocks" of years (the years are subsets of 2000 to 2002). For each
>>>>>>>> year and block a certain "value" is given. I would like to create a
>>>>>>>> matrix that has row names given by all years ("2000", "2001", "2002"),
>>>>>>>> and column names given by all blocks ("a", "b", "c"); the entries are
>>>>>>>> then given by the corresponding value, or zero if no year-block
>>>>>>>> combination exists.
>>>>>>>>
>>>>>>>> What's a short way to achieve this?
>>>>>>>>
>>>>>>>> Of course one can set up a matrix and use for loops (see below)... but
>>>>>>>> that's not nice. The problem is that the years are not running from
>>>>>>>> 2000 to 2002 for all three "blocks" (the second block only has year
>>>>>>>> 2001, the third one has only 2000 and 2001). In principle, table()
>>>>>>>> nicely solves such a problem (see below) and fills in zeros. This is
>>>>>>>> what I would like in the end, but all non-zero entries should be given
>>>>>>>> by df$value, not (as table() does) by their counts.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Marius
>>>>>>>>
>>>>>>>> (df <- data.frame(year=c(2000, 2001, 2002, 2001, 2000, 2001),
>>>>>>>>              block=c("a","a","a","b","c","c"), value=1:6))
>>>>>>>> table(df[,1:2]) # complements the years and fills in 0
>>>>>>>>
>>>>>>>> year <- c(2000, 2001, 2002)
>>>>>>>> block <- c("a", "b", "c")
>>>>>>>> res <- matrix(0, nrow=3, ncol=3, dimnames=list(year, block))
>>>>>>>> for(i in 1:3){ # year
>>>>>>>>   for(j in 1:3){ # block
>>>>>>>>     for(k in 1:nrow(df)){
>>>>>>>>       if(df[k,"year"]==year[i] && df[k,"block"]==block[j])
>>>>>>>>         res[i,j] <- df[k,"value"]
>>>>>>>>     }
>>>>>>>>   }
>>>>>>>> }
>>>>>>>> res # does the job; but seems complicated
>>>>>>
>>>>>
>>>>>
>>>>> David Winsemius, MD
>>>>> West Hartford, CT
>>>>>
>>>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>

David Winsemius, MD
West Hartford, CT


