[R] Best way to compute the difference between two levels of a factor ?
Peter Ehlers
ehlers at ucalgary.ca
Wed Mar 21 11:03:37 CET 2012
On 2012-03-21 01:48, wphantomfr wrote:
> Dear R-help Members,
>
>
> I am wondering whether anyone can think of an optimal way of computing,
> for several numeric variables, the difference between 2 levels of a factor.
>
>
> To be clear, let's generate a simple data frame with 2 numeric variables
> collected for different subjects (ID) and 2 levels of a TIME factor
> (time of evaluation):
>
> data=data.frame(ID=c("AA","AA","BB","BB","CC","CC"),TIME=c("T1","T2","T1","T2","T1","T2"),X=rnorm(6,10,2.3),Y=rnorm(6,12,1.9))
>
>   ID TIME         X         Y
> 1 AA   T1  9.959540 11.140529
> 2 AA   T2 12.949522  9.896559
> 3 BB   T1  9.039486 13.469104
> 4 BB   T2 10.056392 14.632169
> 5 CC   T1  8.706590 14.939197
> 6 CC   T2 10.799296 10.747609
>
> I want to compute for each subject and each variable (X, Y, ...) the
> difference between T2 and T1.
>
> Until now I have done it by reshaping my data frame to the wide format
> (the columns are then ID, X.T1, X.T2, Y.T1, Y.T2) and then computing the
> difference between successive columns one by one:
> data$Xdiff=data$X.T2-data$X.T1
> data$Ydiff=data$Y.T2-data$Y.T1
> ...
>
> but this way is probably not optimal if the difference has to be
> computed for a large number of variables.
>
> How would you handle it?
One way is to use the plyr package:

  library(plyr)
  result <- ddply(data, "ID", summarize,
                  DIF.X = X[TIME=="T2"] - X[TIME=="T1"],
                  DIF.Y = Y[TIME=="T2"] - Y[TIME=="T1"])

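
The ddply call above names each variable explicitly, so it grows with the number of columns. As a rough sketch of one way to cover every numeric column at once, the base R helper below splits the rows by TIME, aligns them by ID, and subtracts. The function name diff_by_time and its arguments are hypothetical, not part of plyr or base R, and the sketch assumes each subject has exactly one row per TIME level.

```r
# Example data in the same shape as the question (values are arbitrary).
set.seed(1)
data <- data.frame(
  ID   = rep(c("AA", "BB", "CC"), each = 2),
  TIME = rep(c("T1", "T2"), times = 3),
  X    = rnorm(6, 10, 2.3),
  Y    = rnorm(6, 12, 1.9)
)

# Hypothetical helper: `to` minus `from` for every numeric column, per subject.
# Assumes one row per subject and TIME level.
diff_by_time <- function(df, id = "ID", time = "TIME",
                         from = "T1", to = "T2") {
  num_cols <- names(df)[sapply(df, is.numeric)]
  t1 <- df[df[[time]] == from, ]
  t2 <- df[df[[time]] == to, ]
  # Reorder the `to` rows so they line up with the `from` rows by subject.
  t2 <- t2[match(t1[[id]], t2[[id]]), ]
  out <- t2[num_cols] - t1[num_cols]
  names(out) <- paste0("DIF.", num_cols)
  res <- cbind(t1[id], out)
  rownames(res) <- NULL
  res
}

result <- diff_by_time(data)
```

The result has one row per ID and one DIF.* column per numeric variable, so adding further measurement columns to the data frame needs no change to the call.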
Peter Ehlers
>
>
> Thanks in advance
>
> Sylvain Clément