[R] placing multiple rows in a single row

Annemarie Verkerk annemarie.verkerk at mpi.nl
Tue Jul 5 09:00:18 CEST 2011


Dear David,

thanks so much, I was able to get it to work for my data! I don't really 
understand yet how the function works, but it seems extremely useful.

Thanks again!
Annemarie

David Winsemius wrote:
>
> On Jul 4, 2011, at 2:32 PM, Annemarie Verkerk wrote:
>
>> Dear people from the R help list,
>>
>> I have a question that I can't work out how to begin answering, 
>> which is why I am writing to the list.
>>
>> I have data in a format like this (tabs might look weird):
>>
>> John   A1  1  0  1
>> John   A2  1  1  1
>> John   A3  1  0  0
>> Mary   A1  1  0  1
>> Mary   A2  0  0  1
>> Mary   A3  1  1  0
>> Peter  A1  1  0  0
>> Peter  A2  0  0  1
>> Peter  A3  1  1  1
>> Josh   A1  1  0  0
>> Josh   A2
>> Josh   A3  0  0  0
>>
>> I want to convert it into a format where variable rows from a single 
>> subject are placed behind each other, but with the different scores 
>> still matching up (i.e., it needs to be able to cope with missing 
>> data, as for Josh's A2 score).
>>
>> John   A1  1  0  1   A2  1  1  1   A3  1  0  0
>> Mary   A1  1  0  1   A2  0  0  1   A3  1  1  0
>> Peter  A1  1  0  0   A2  0  0  1   A3  1  1  1
>> Josh   A1  1  0  0   A2            A3  0  0  0
>>
>> Preferably, the row identification would become the header of the new 
>> table, something like this:
>>
>>        A11  A12  A13  A21  A22  A23  A31  A32  A33
>> John     1    0    1    1    1    1    1    0    0
>> Mary     1    0    1    0    0    1    1    1    0
>> Peter    1    0    0    0    0    1    1    1    1
>> Josh     1    0    0                   0    0    0
>>
>> Probably, this has been addressed before - I just don't know how to 
>> search for the answer with the right search terms.
>>
>> Any help is appreciated, even just a link to a page where this is 
>> addressed!
>
> There is a reshape function in the stats package that nobody except 
> Phil Spector seems to understand, and then there are the reshape and 
> reshape2 packages that everybody seems to get. (I don't understand why 
> the classification variables are on the left-hand-side, though. 
> Positionally it makes some sense, but logically it does not connect 
> with how I understand the process.)
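
For comparison, here is a minimal sketch of the same long-to-wide conversion using reshape() from the stats package mentioned above. It assumes the example data sits in a data frame called nam123 with the default column names V1 to V5 (rebuilt by hand here so the snippet runs on its own); the name wide is only illustrative.

```r
# Rebuild the example data by hand (assumption: default names V1..V5).
# Josh's A2 scores are missing, so they are entered as NA.
nam123 <- data.frame(
  V1 = rep(c("John", "Mary", "Peter", "Josh"), each = 3),
  V2 = rep(c("A1", "A2", "A3"), times = 4),
  V3 = c(1, 1, 1,  1, 0, 1,  1, 0, 1,  1, NA, 0),
  V4 = c(0, 1, 0,  0, 0, 1,  0, 0, 1,  0, NA, 0),
  V5 = c(1, 1, 0,  1, 1, 0,  0, 1, 1,  0, NA, 0)
)

# idvar identifies the subject, timevar the repeated row label (A1..A3);
# the remaining columns (V3, V4, V5) are spread out once per label.
wide <- reshape(nam123, idvar = "V1", timevar = "V2", direction = "wide")

print(names(wide))
print(wide)
```

Each new column name combines a score column with a row label, joined by the sep argument (a dot by default), and Josh's missing A2 scores stay NA in the wide table.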
>
> require(reshape2)
> # entered your data with default names V1 V2 V3 V4 V5
> > nam123
>       V1 V2 V3 V4 V5
> 1   John A1  1  0  1
> 2   John A2  1  1  1
> 3   John A3  1  0  0
> 4   Mary A1  1  0  1
> 5   Mary A2  0  0  1
> 6   Mary A3  1  1  0
> 7  Peter A1  1  0  0
> 8  Peter A2  0  0  1
> 9  Peter A3  1  1  1
> 10  Josh A1  1  0  0
> 11  Josh A2 NA NA NA
> 12  Josh A3  0  0  0
>
> > nams.mlt <- melt(nam123, id.vars=c("V1", "V2"))
>
> > str(nams.mlt)
> 'data.frame':    36 obs. of  4 variables:
>  $ V1      : Factor w/ 4 levels "John","Josh",..: 1 1 1 3 3 3 4 4 4 2 ...
>  $ V2      : Factor w/ 3 levels "A1","A2","A3": 1 2 3 1 2 3 1 2 3 1 ...
>  $ variable: Factor w/ 3 levels "V3","V4","V5": 1 1 1 1 1 1 1 1 1 1 ...
>  $ value   : int  1 1 1 1 0 1 1 0 1 1 ...
>
> > dcast(nams.mlt, V1+V2 ~ variable)
>       V1 V2 V3 V4 V5
> 1   John A1  1  0  1
> 2   John A2  1  1  1
> 3   John A3  1  0  0
> 4   Josh A1  1  0  0
> 5   Josh A2 NA NA NA
> 6   Josh A3  0  0  0
> 7   Mary A1  1  0  1
> 8   Mary A2  0  0  1
> 9   Mary A3  1  1  0
> 10 Peter A1  1  0  0
> 11 Peter A2  0  0  1
> 12 Peter A3  1  1  1
> > dcast(nams.mlt, V1 ~ V2+variable)
>      V1 A1_V3 A1_V4 A1_V5 A2_V3 A2_V4 A2_V5 A3_V3 A3_V4 A3_V5
> 1  John     1     0     1     1     1     1     1     0     0
> 2  Josh     1     0     0    NA    NA    NA     0     0     0
> 3  Mary     1     0     1     0     0     1     1     1     0
> 4 Peter     1     0     0     0     0     1     1     1     1
>
> You can always change the names of the dataframe if you want, and in 
> this case it would be a simple sub() operation. Personally I would 
> substitute "." rather than "".
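
The renaming mentioned above could look something like the sketch below. It assumes the wide result has the column names printed in the dcast() output, and that the "_" separator is the part being replaced; old_names and new_names are made-up names for illustration.

```r
# Column names as printed by dcast(nams.mlt, V1 ~ V2+variable) above.
old_names <- c("V1",
               "A1_V3", "A1_V4", "A1_V5",
               "A2_V3", "A2_V4", "A2_V5",
               "A3_V3", "A3_V4", "A3_V5")

# Replace the first "_" in each name with "."; fixed = TRUE makes
# sub() treat the pattern as literal text rather than a regex.
new_names <- sub("_", ".", old_names, fixed = TRUE)

print(new_names)
```

On a real data frame the same call would be applied to names(df), where df stands for whatever object holds the dcast() result: names(df) <- sub("_", ".", names(df), fixed = TRUE).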

-- 
Annemarie Verkerk, MA
Evolutionary Processes in Language and Culture (PhD student)
Max Planck Institute for Psycholinguistics
P.O. Box 310, 6500AH Nijmegen, The Netherlands
+31 (0)24 3521 185
http://www.mpi.nl/research/research-projects/evolutionary-processes


