[R] Tricky (?) conversion from data.frame to matrix where not all pairs exist

Marius Hofert m_hofert at web.de
Wed Jun 22 15:19:47 CEST 2011


Hi,

And what's the simplest way to obtain a *data.frame* (rather than a matrix) that contains every year for every block?
The matching seems more difficult here because each year can (and will) show up several times, once per block...

(df <- data.frame(year=c(2000, 2001, 2002, 2001, 2000, 2001), 
                 block=c("a","a","a","b","c","c"), value=1:6))
(df. <- data.frame(year=rep(2000:2002, 3), block=rep(c("a", "b", "c"), each=3), value=0))
# how to fill in the given values?
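One base-R way to fill in the values is sketched below. It assumes each year/block pair occurs at most once in df. Year and block are pasted into a single key per row, so repeated years no longer cause ambiguity, and match() then finds where each observed pair sits in the complete grid (df.). The names key.all, key.obs and idx are illustrative only.

```r
df <- data.frame(year = c(2000, 2001, 2002, 2001, 2000, 2001),
                 block = c("a", "a", "a", "b", "c", "c"), value = 1:6)
# Complete grid of all year/block combinations, values initialised to 0
df. <- data.frame(year = rep(2000:2002, 3),
                  block = rep(c("a", "b", "c"), each = 3), value = 0)

# One key per row, e.g. "2001 b", built from the (year, block) pair
key.all <- paste(df.$year, df.$block)
key.obs <- paste(df$year, df$block)

# Row of the grid that each observed pair belongs to
idx <- match(key.obs, key.all)
df.$value[idx] <- df$value
df.
```

merge(df.[c("year", "block")], df, all.x = TRUE) is an alternative; it keeps every row of the grid but leaves NA instead of 0 where no value exists, and it sorts the rows by the key columns.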

Cheers,

Marius


On 2011-06-22, at 14:40, Dennis Murphy wrote:

> I saw it as an xtabs object - I didn't think to check whether it was
> also a matrix object. Thanks for the clarification, David.
> 
> Dennis
> 
> On Wed, Jun 22, 2011 at 4:59 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>> 
>> On Jun 21, 2011, at 6:51 PM, Dennis Murphy wrote:
>> 
>>> Ahhh...you want a matrix. xtabs() doesn't easily allow coercion to a
>>> matrix object, so try this instead:
>> 
>> What am I missing? A contingency table already inherits from matrix-class,
>> and if you insist on coercion it appears simple:
>> 
>>> xtb <- xtabs(value ~ year + block, data = df)
>>> is.matrix(xtb)
>> [1] TRUE
>>> as.matrix(xtb)
>>       block
>> year   a b c
>>   2000 1 0 5
>>   2001 2 4 6
>>   2002 3 0 0
>> 
>> --
>> David.
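A short sketch of David's point: because an xtabs result already is a matrix, as.matrix() hands it back unchanged, still carrying the "xtabs"/"table" class and the stored call. If a plain matrix is wanted (for example, to avoid table-specific printing or methods), those attributes can be dropped as shown. Note that xtabs() adds up value over rows that share a year/block pair, whereas the loop in the original post keeps the last such value; with one row per pair, as in this example, the two agree.

```r
df <- data.frame(year = c(2000, 2001, 2002, 2001, 2000, 2001),
                 block = c("a", "a", "a", "b", "c", "c"), value = 1:6)
xtb <- xtabs(value ~ year + block, data = df)

# as.matrix() returns an object that is already a matrix as-is,
# so the "xtabs"/"table" class survives the coercion
class(as.matrix(xtb))

# Dropping the class and the stored call leaves a plain matrix
m <- unclass(xtb)
attr(m, "call") <- NULL
m
```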
>> 
>>> 
>>> library(reshape)
>>> as.matrix(cast(df, year ~ block, fill = 0))
>>>      a b c
>>> 2000 1 0 5
>>> 2001 2 4 6
>>> 2002 3 0 0
>>> 
>>> Hopefully this is more helpful...
>>> Dennis
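For readers without the reshape package, a base-R sketch with tapply() yields the same year-by-block matrix: a data frame of grouping columns is a list, so both columns become factors and define the rows and columns. Cells without any observation come back as NA and are then set to 0. Using sum as the summary function is an assumption; it only matters if a year/block pair occurs more than once.

```r
df <- data.frame(year = c(2000, 2001, 2002, 2001, 2000, 2001),
                 block = c("a", "a", "a", "b", "c", "c"), value = 1:6)

# Cross-classify value by year (rows) and block (columns)
m <- tapply(df$value, df[c("year", "block")], sum)

# Year/block pairs with no row in df are NA; report them as 0
m[is.na(m)] <- 0
m
```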
>>> 
>>> On Tue, Jun 21, 2011 at 3:35 PM, Dennis Murphy <djmuser at gmail.com> wrote:
>>>> 
>>>> Hi:
>>>> 
>>>> xtabs(value ~ year + block, data = df)
>>>>       block
>>>> year   a b c
>>>>   2000 1 0 5
>>>>   2001 2 4 6
>>>>   2002 3 0 0
>>>> 
>>>> HTH,
>>>> Dennis
>>>> 
>>>> On Tue, Jun 21, 2011 at 3:13 PM, Marius Hofert <m_hofert at web.de> wrote:
>>>>> 
>>>>> Dear expeRts,
>>>>> 
>>>>> In the minimal example below, I have a data.frame containing three
>>>>> "blocks" of years (the years are subsets of 2000 to 2002). For each
>>>>> year and block a certain "value" is given. I would like to create a
>>>>> matrix that has row names given by all years ("2000", "2001", "2002")
>>>>> and column names given by all blocks ("a", "b", "c"); each entry is
>>>>> then the corresponding value, or zero if no such year-block
>>>>> combination exists.
>>>>> 
>>>>> What's a short way to achieve this?
>>>>> 
>>>>> Of course one can set up a matrix and use for loops (see below), but
>>>>> that's not nice. The problem is that the years do not run from 2000
>>>>> to 2002 for all three "blocks" (the second block only has the year
>>>>> 2001, the third one only has 2000 and 2001). In principle, table()
>>>>> solves such a problem nicely (see below) and fills in zeros. This is
>>>>> what I would like in the end, but all non-zero entries should be
>>>>> given by df$value, not (as table() does) by their counts.
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Marius
>>>>> 
>>>>> (df <- data.frame(year=c(2000, 2001, 2002, 2001, 2000, 2001),
>>>>>                 block=c("a","a","a","b","c","c"), value=1:6))
>>>>> table(df[,1:2]) # complements the years and fills in 0
>>>>> 
>>>>> year <- c(2000, 2001, 2002)
>>>>> block <- c("a", "b", "c")
>>>>> res <- matrix(0, nrow=3, ncol=3, dimnames=list(year, block))
>>>>> for(i in 1:3){ # year
>>>>>   for(j in 1:3){ # block
>>>>>       for(k in 1:nrow(df)){
>>>>>           if(df[k,"year"]==year[i] && df[k,"block"]==block[j])
>>>>>               res[i,j] <- df[k,"value"]
>>>>>       }
>>>>>   }
>>>>> }
>>>>> res # does the job; but seems complicated
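The triple loop can be replaced by a single subassignment. A matrix can be indexed by a two-column character matrix whose entries are matched against its row and column names, so each (year, block) pair of df addresses exactly one cell of res. A minimal sketch, reusing the year and block vectors from the post:

```r
df <- data.frame(year = c(2000, 2001, 2002, 2001, 2000, 2001),
                 block = c("a", "a", "a", "b", "c", "c"), value = 1:6)
year <- c(2000, 2001, 2002)
block <- c("a", "b", "c")

# Zero matrix; numeric dimnames are stored as "2000", "2001", "2002"
res <- matrix(0, nrow = length(year), ncol = length(block),
              dimnames = list(year, block))

# Each row of ij is a (row name, column name) pair taken from df
ij <- cbind(as.character(df$year), as.character(df$block))
res[ij] <- df$value
res
```

Unlike xtabs(), this assigns values rather than summing them, which matches what the loop does.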
>>> 
>> 
>> 
>> David Winsemius, MD
>> West Hartford, CT
>> 
>> 



More information about the R-help mailing list