[R] Tricky (?) conversion from data.frame to matrix where not all pairs exist

David Winsemius dwinsemius at comcast.net
Wed Jun 22 13:59:53 CEST 2011


On Jun 21, 2011, at 6:51 PM, Dennis Murphy wrote:

> Ahhh...you want a matrix. xtabs() doesn't easily allow coercion to a
> matrix object, so try this instead:

What am I missing? A contingency table already inherits from the matrix
class, and if you insisted on coercion it appears simple:

 > xtb <- xtabs(value ~ year + block, data = df)
 > is.matrix(xtb)
[1] TRUE
 > as.matrix(xtb)
      block
year   a b c
  2000 1 0 5
  2001 2 4 6
  2002 3 0 0
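
If a bare matrix without the "xtabs"/"table" class is what is wanted, a
minimal sketch along the following lines may help. It rebuilds df from the
original post so that it runs on its own; the name m is only illustrative,
and stripping the attributes this way is one possible approach, not the
only one.

```r
# Data from the original post, repeated so the snippet is self-contained
df <- data.frame(year  = c(2000, 2001, 2002, 2001, 2000, 2001),
                 block = c("a", "a", "a", "b", "c", "c"),
                 value = 1:6)

xtb <- xtabs(value ~ year + block, data = df)

class(xtb)       # "xtabs" "table"
is.matrix(xtb)   # TRUE: the object carries a dim attribute of length 2

# Drop the class and the stored call, leaving a plain matrix with dimnames
m <- unclass(xtb)
attr(m, "call") <- NULL

m["2001", "b"]   # value for year 2001, block "b"
```

Note that xtabs() sums the values within each year-block cell, so this
only reproduces df$value when each combination occurs at most once, as it
does here.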

-- 
David.

>
> library(reshape)
> as.matrix(cast(df, year ~ block, fill = 0))
>      a b c
> 2000 1 0 5
> 2001 2 4 6
> 2002 3 0 0
>
> Hopefully this is more helpful...
> Dennis
>
> On Tue, Jun 21, 2011 at 3:35 PM, Dennis Murphy <djmuser at gmail.com>  
> wrote:
>> Hi:
>>
>> xtabs(value ~ year + block, data = df)
>>       block
>> year   a b c
>>   2000 1 0 5
>>   2001 2 4 6
>>   2002 3 0 0
>>
>> HTH,
>> Dennis
>>
>> On Tue, Jun 21, 2011 at 3:13 PM, Marius Hofert <m_hofert at web.de>  
>> wrote:
>>> Dear expeRts,
>>>
>>> In the minimal example below, I have a data.frame containing three  
>>> "blocks" of years
>>> (the years are subsets of 2000 to 2002). For each year and block a  
>>> certain "value" is given.
>>> I would like to create a matrix that has row names given by all  
>>> years ("2000", "2001", "2002"),
>>> and column names given by all blocks ("a", "b", "c"); the entries  
>>> are then given by the
>>> corresponding value or zero if no year-block combination exists.
>>>
>>> What's a short way to achieve this?
>>>
>>> Of course one can set up a matrix and use for loops (see below)...  
>>> but that's not nice.
>>> The problem is that the years are not running from 2000 to 2002  
>>> for all three "blocks"
>>> (the second block only has year 2001, the third one has only 2000  
>>> and 2001).
>>> In principle, table() nicely solves such a problem (see below) and  
>>> fills in zeros.
>>> This is what I would like in the end, but all non-zero entries  
>>> should be given by df$value,
>>> not (as table() does) by their counts.
>>>
>>> Cheers,
>>>
>>> Marius
>>>
>>> (df <- data.frame(year=c(2000, 2001, 2002, 2001, 2000, 2001),
>>>                  block=c("a","a","a","b","c","c"), value=1:6))
>>> table(df[,1:2]) # complements the years and fills in 0
>>>
>>> year <- c(2000, 2001, 2002)
>>> block <- c("a", "b", "c")
>>> res <- matrix(0, nrow=3, ncol=3, dimnames=list(year, block))
>>> for(i in 1:3){ # year
>>>    for(j in 1:3){ # block
>>>        for(k in 1:nrow(df)){
>>>            if(df[k,"year"]==year[i] && df[k,"block"]==block[j])
>>>                res[i,j] <- df[k,"value"]
>>>        }
>>>    }
>>> }
>>> res # does the job; but seems complicated
>
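
For comparison with the triple loop in the original post, the same fill
can probably be done in one assignment by indexing the zero matrix with a
two-column character matrix of (row name, column name) pairs. This is a
sketch of a different technique from the xtabs() and cast() answers in
this thread; the names years, blocks and idx are illustrative only.

```r
# Data from the original post, repeated so the snippet is self-contained
df <- data.frame(year  = c(2000, 2001, 2002, 2001, 2000, 2001),
                 block = c("a", "a", "a", "b", "c", "c"),
                 value = 1:6)

years  <- as.character(sort(unique(df$year)))
blocks <- sort(unique(as.character(df$block)))

# Start from an all-zero matrix with every year and block present
res <- matrix(0, nrow = length(years), ncol = length(blocks),
              dimnames = list(years, blocks))

# Each row of idx names one cell: (year, block)
idx <- cbind(as.character(df$year), as.character(df$block))
res[idx] <- df$value
res
```

Unlike xtabs(), this does not sum repeated year-block pairs; if a pair
occurred more than once, the last value assigned would win.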


David Winsemius, MD
West Hartford, CT


