[R] placing multiple rows in a single row
Annemarie Verkerk
annemarie.verkerk at mpi.nl
Tue Jul 5 09:00:18 CEST 2011
Dear David,
Thanks so much, I was able to get it to work for my data! I don't really
understand yet how the function works, but it seems extremely useful.
Thanks again!
Annemarie
David Winsemius wrote:
>
> On Jul 4, 2011, at 2:32 PM, Annemarie Verkerk wrote:
>
>> Dear people from the R help list,
>>
>> I have a question that I can't work out how to start answering,
>> which is why I am writing to the list.
>>
>> I have data in a format like this (tabs might look weird):
>>
>> John A1 1 0 1
>> John A2 1 1 1
>> John A3 1 0 0
>> Mary A1 1 0 1
>> Mary A2 0 0 1
>> Mary A3 1 1 0
>> Peter A1 1 0 0
>> Peter A2 0 0 1
>> Peter A3 1 1 1
>> Josh A1 1 0 0
>> Josh A2
>> Josh A3 0 0 0
>>
>> I want to convert it into a format where the rows for a single
>> subject are placed after each other on one line, with the different
>> scores still matching up (i.e., it needs to be able to cope with
>> missing data, as for Josh's A2 score):
>>
>> John A1 1 0 1 A2 1 1 1 A3 1 0 0
>> Mary A1 1 0 1 A2 0 0 1 A3 1 1 0
>> Peter A1 1 0 0 A2 0 0 1 A3 1 1 1
>> Josh A1 1 0 0 A2 A3 0 0 0
>>
>> Preferably, the row identifiers (A1, A2, A3) would become part of the
>> column headers of the new table, something like this:
>>
>>       A11 A12 A13 A21 A22 A23 A31 A32 A33
>> John   1   0   1   1   1   1   1   0   0
>> Mary   1   0   1   0   0   1   1   1   0
>> Peter  1   0   0   0   0   1   1   1   1
>> Josh   1   0   0               0   0   0
>>
>> This has probably been addressed before; I just don't know the right
>> search terms to find the answer.
>>
>> Any help is appreciated, even just a link to a page where this is
>> addressed!
>
> There is a reshape function in the stats package that nobody except
> Phil Spector seems to understand, and then there are the reshape and
> reshape2 packages that everybody seems to get. (I don't understand why
> the classification variables are on the left-hand side, though.
> Positionally it makes some sense, but logically it does not connect
> with how I understand the process.)
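For comparison, here is a minimal sketch of the same widening done with the base stats reshape() function mentioned above. It assumes the question's data has been entered as a data frame with the default names V1 to V5, with Josh's missing A2 scores stored as NA; the object name `wide` is only illustrative.

```r
# Example data from the question; Josh's A2 scores are missing (NA).
nam123 <- data.frame(
  V1 = rep(c("John", "Mary", "Peter", "Josh"), each = 3),
  V2 = rep(c("A1", "A2", "A3"), times = 4),
  V3 = c(1, 1, 1, 1, 0, 1, 1, 0, 1, 1, NA, 0),
  V4 = c(0, 1, 0, 0, 0, 1, 0, 0, 1, 0, NA, 0),
  V5 = c(1, 1, 0, 1, 1, 0, 0, 1, 1, 0, NA, 0),
  stringsAsFactors = FALSE
)

# stats::reshape: one output row per subject (idvar = "V1"); each row label
# in V2 (timevar) gets its own block of columns. The remaining columns
# (V3, V4, V5) are treated as the repeated measurements.
wide <- reshape(nam123, idvar = "V1", timevar = "V2", direction = "wide")
print(wide)
```

The new columns are named V3.A1, V4.A1, V5.A1, V3.A2, and so on, matched by row label rather than by position, so Josh's A2 block comes out as NA instead of shifting his A3 scores to the left.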
>
> require(reshape2)
> # entered your data with default names V1 V2 V3 V4 V5
> > nam123
> V1 V2 V3 V4 V5
> 1 John A1 1 0 1
> 2 John A2 1 1 1
> 3 John A3 1 0 0
> 4 Mary A1 1 0 1
> 5 Mary A2 0 0 1
> 6 Mary A3 1 1 0
> 7 Peter A1 1 0 0
> 8 Peter A2 0 0 1
> 9 Peter A3 1 1 1
> 10 Josh A1 1 0 0
> 11 Josh A2 NA NA NA
> 12 Josh A3 0 0 0
>
> > nams.mlt <- melt(nam123, id.vars=c("V1", "V2"))
>
> > str(nams.mlt)
> 'data.frame': 36 obs. of 4 variables:
> $ V1 : Factor w/ 4 levels "John","Josh",..: 1 1 1 3 3 3 4 4 4 2 ...
> $ V2 : Factor w/ 3 levels "A1","A2","A3": 1 2 3 1 2 3 1 2 3 1 ...
> $ variable: Factor w/ 3 levels "V3","V4","V5": 1 1 1 1 1 1 1 1 1 1 ...
> $ value : int 1 1 1 1 0 1 1 0 1 1 ...
>
> > dcast(nams.mlt, V1+V2 ~ variable)
> V1 V2 V3 V4 V5
> 1 John A1 1 0 1
> 2 John A2 1 1 1
> 3 John A3 1 0 0
> 4 Josh A1 1 0 0
> 5 Josh A2 NA NA NA
> 6 Josh A3 0 0 0
> 7 Mary A1 1 0 1
> 8 Mary A2 0 0 1
> 9 Mary A3 1 1 0
> 10 Peter A1 1 0 0
> 11 Peter A2 0 0 1
> 12 Peter A3 1 1 1
> > dcast(nams.mlt, V1 ~ V2+variable)
> V1 A1_V3 A1_V4 A1_V5 A2_V3 A2_V4 A2_V5 A3_V3 A3_V4 A3_V5
> 1 John 1 0 1 1 1 1 1 0 0
> 2 Josh 1 0 0 NA NA NA 0 0 0
> 3 Mary 1 0 1 0 0 1 1 1 0
> 4 Peter 1 0 0 0 0 1 1 1 1
>
> You can always change the names of the data frame if you want, and in
> this case it would be a simple sub() operation on names(). Personally
> I would substitute "." rather than "" for the separator.
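A minimal sketch of that renaming step, assuming the column names printed by dcast() above. The helper `renumber_scores` and its `offset = 2` (the scores start in column V3) are hypothetical, added only to show how to reach the A1.1, A1.2, ... layout asked for in the question.

```r
# Column names as printed by dcast(nams.mlt, V1 ~ V2 + variable) above.
old <- c("V1", "A1_V3", "A1_V4", "A1_V5", "A2_V3", "A2_V4", "A2_V5",
         "A3_V3", "A3_V4", "A3_V5")

# The simple sub(): replace the "_V" separator with ".".
# This keeps the original column numbers (A1.3, A1.4, A1.5, ...).
simple <- sub("_V", ".", old)

# Hypothetical helper: renumber so the first score column (V3) becomes 1.
renumber_scores <- function(x, offset = 2L) {
  m <- regmatches(x, regexec("^(A[0-9]+)_V([0-9]+)$", x))
  vapply(seq_along(x), function(i) {
    if (length(m[[i]]) == 0L) return(x[i])  # leave non-matching names (e.g. V1) alone
    paste0(m[[i]][2], ".", as.integer(m[[i]][3]) - offset)
  }, character(1))
}

new <- renumber_scores(old)
print(simple)
print(new)
```

On the actual dcast() result this would be applied as `names(res) <- renumber_scores(names(res))`, where `res` is whatever the wide data frame was assigned to.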
--
Annemarie Verkerk, MA
Evolutionary Processes in Language and Culture (PhD student)
Max Planck Institute for Psycholinguistics
P.O. Box 310, 6500AH Nijmegen, The Netherlands
+31 (0)24 3521 185
http://www.mpi.nl/research/research-projects/evolutionary-processes