[R] calculating "treatment effects" (differences) in a data frame?

derek eder derek.eder at lungall.gu.se
Mon May 24 22:28:56 CEST 2010


I am trying to calculate the treatment effect for individual subjects
("ID") as the difference in a measurement ("score") between 2 time points
("visit"); see the example below.

The data is in an unbalanced data.frame in "long" format with some 
missing data.

I suspect that I am overlooking a very simple function, something along
the lines of tapply().

Thank you for your attention!


Derek Eder



##  Examples:

myData = data.frame(
   ID = c("a","a","b","c","c","d","d"),
   visit=c(1,2,1,1,2,1,2),
   score=c(10,2,12,16,0,NA,5)
   )

 > myData
   ID visit score
1  a     1    10
2  a     2     2
3  b     1    12
4  c     1    16
5  c     2     0
6  d     1    NA
7  d     2     5

# The desired result is a vector of score differences (visit 1 minus visit 2) by ID
#  a  b  c  d
#  8  NA 16 NA
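A minimal tapply() sketch along the lines suggested above, assuming each subject has at most one row per visit and that the effect is the visit-1 score minus the visit-2 score (the sign used in the desired result). The name `effect` is illustrative only. match() returns NA when a visit is absent, and indexing with NA yields NA, so incomplete subjects get NA rather than a zero-length result.

```r
myData <- data.frame(
  ID    = c("a", "a", "b", "c", "c", "d", "d"),
  visit = c(1, 2, 1, 1, 2, 1, 2),
  score = c(10, 2, 12, 16, 0, NA, 5)
)

# Apply a function to the row indices of each ID group.
# match() gives NA when a visit is missing; score[NA] is then NA,
# so the difference is NA instead of numeric(0).
effect <- tapply(seq_len(nrow(myData)), myData$ID, function(rows) {
  d <- myData[rows, ]
  d$score[match(1, d$visit)] - d$score[match(2, d$visit)]
})

effect
# values: a = 8, b = NA, c = 16, d = NA
```

If a subject could have duplicate rows for the same visit, match() silently uses the first one, so that case would need separate handling.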



##  solutions ?

# This works, but the returned data frame is awkward for me
# because the "empty cells" (b and d) contain numeric(0)
# rather than the more familiar NA.

 > aggregate(data=myData, score~ID,FUN=diff)
   ID score
1  a    -8
2  b
3  c   -16
4  d
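One possible way to stay with aggregate(), offered as a sketch: the formula method drops rows with a missing response by default (na.action = na.omit), which is why d also comes back empty. Passing na.action = na.pass keeps that row, and the anonymous FUN returns NA for subjects with fewer than two rows, so every group yields a single number. This assumes exactly two visits per subject and rows sorted by visit within each ID (as in myData), because diff() returns the later value minus the earlier one.

```r
myData <- data.frame(
  ID    = c("a", "a", "b", "c", "c", "d", "d"),
  visit = c(1, 2, 1, 1, 2, 1, 2),
  score = c(10, 2, 12, 16, 0, NA, 5)
)

# na.pass keeps the row where score is NA (the default na.omit drops it).
# FUN returns a single NA when a subject has fewer than two rows, so each
# group yields length 1 and the score column stays a plain numeric vector.
agg <- aggregate(score ~ ID, data = myData, na.action = na.pass,
                 FUN = function(s) if (length(s) < 2) NA_real_ else diff(s))

agg
# score: a = -8, b = NA, c = -16, d = NA  (visit 2 minus visit 1)
```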


# This works as desired, but somehow seems unnecessarily complicated

 > reshape(data=myData,timevar="visit",idvar="ID", direction="wide")
   ID score.1 score.2
1  a      10       2
3  b      12      NA
4  c      16       0
6  d      NA       5

 > apply(X = reshape(data=myData, timevar="visit", idvar="ID", direction="wide")[,-1],
         MARGIN = 1, FUN = diff)

   1   3   4   6
  -8  NA -16  NA
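Once the data are in wide form, apply() may not be needed at all. A sketch, assuming the score.1/score.2 column names that reshape() produced above: plain column subtraction is vectorised and propagates NA, and setNames() attaches the IDs as names. It computes visit 1 minus visit 2, matching the sign of the desired result.

```r
myData <- data.frame(
  ID    = c("a", "a", "b", "c", "c", "d", "d"),
  visit = c(1, 2, 1, 1, 2, 1, 2),
  score = c(10, 2, 12, 16, 0, NA, 5)
)

# One row per ID; a visit with no row becomes NA in the wide table.
wide <- reshape(myData, timevar = "visit", idvar = "ID", direction = "wide")

# Vectorised subtraction; an NA in either visit gives NA for that subject.
effect <- setNames(wide$score.1 - wide$score.2, wide$ID)

effect
# values: a = 8, b = NA, c = 16, d = NA
```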


