[R] Selecting one row or multiple rows per ID
Dieter Menne
dieter.menne at menne-biomed.de
Wed Mar 4 10:09:32 CET 2009
Vedula, Satyanarayana <svedula <at> jhsph.edu> writes:
>
>
> I need to select one row per patient i in clinic j. The data is organized
> similar to that shown below.
>
...
> If patient has outcome recorded at visit 2, then outcome = outcome columns at visit 2
> If patient does not have visit 2, then outcome = outcome at visit 5
> If patient does not have visit 2 and visit 5, then outcome = outcome at visit ... other rules
I prefer a table-driven approach here, because one can easily get lost
in all these nested ifs, and medical research requires well-defined
documentation of the outcome you choose.
So I first convert the data to the wide format; you could alternatively
use the function cast from package reshape for this, but I can never get
the syntax right. I also prefer to do most of this preparatory work at the
database level, e.g. with PIVOT.
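A possible base-R alternative to reshape or cast is tapply, which cross-tabulates the outcome by patient and visit in one call. This is a minimal sketch assuming each patient has at most one outcome per visit (the function passed to tapply stops otherwise); visits that were never recorded come out as NA, and the columns follow the levels of vis, so no column reordering is needed.

```r
# Long-format example data: one row per patient.clinic and visit
outc = data.frame(
  patclin = as.factor(paste(c(1,1,1,1,3,3,3,3),
                            c(1,3,3,3,5,5,5,5), sep=".")),
  vis = as.factor(c(2,1,2,3,1,3,4,5)),
  outcom = c(22,21,21,20,24,21,22,22))

# Rows = patclin, columns = visit. The function checks that every
# non-empty cell holds exactly one value; empty cells become NA.
wide = tapply(outc$outcom, list(outc$patclin, outc$vis),
              function(x) { stopifnot(length(x) == 1); x })
wide
```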
Then create a translation table that maps each of the 2^5 = 32 possible
combinations of recorded visits to the column you select; that way you
can be sure you have not forgotten any combination.
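The translation table can also be generated from the rules rather than from the codes that happen to occur in the data, which makes it complete by construction. A minimal sketch, assuming 5 visits and the 0/1 code convention of the code below (position k of the string is 1 when visit k is recorded): it encodes only the two rules quoted above (visit 2 if present, otherwise visit 5) and leaves visit as NA where the remaining, unspecified rules would have to decide. The helper name has_visit is hypothetical.

```r
# All 2^5 = 32 patterns of recorded (1) / missing (0) visits 1..5
g = expand.grid(rep(list(0:1), 5))
code = apply(g, 1, paste, collapse = "")
code = code[code != "00000"]        # drop the pattern with no visit at all

# Hypothetical helper: TRUE when visit k is recorded in the code string
has_visit = function(code, k) substr(code, k, k) == "1"

# Rule 1: use visit 2 if present; rule 2: otherwise use visit 5.
# NA marks combinations that the remaining rules still have to cover.
visit = ifelse(has_visit(code, 2), 2,
               ifelse(has_visit(code, 5), 5, NA))
usevisit = data.frame(code = code, visit = visit)

nrow(usevisit)                      # 31 combinations
usevisit[is.na(usevisit$visit), ]   # combinations still lacking a rule
```

The NA rows form an explicit to-do list: once every row has a visit, merging this table with the wide data leaves no patient without a documented choice.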
Dieter
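# Example data in long format: one row per patient.clinic and visit
outc = data.frame(
  patclin = as.factor(
    paste(c(1,1,1,1,3,3,3,3),
          c(1,3,3,3,5,5,5,5), sep=".")),  # patient.clinic identifier
  vis = as.factor(c(2,1,2,3,1,3,4,5)),      # visit number
  outcom = c(22,21,21,20,24,21,22,22))      # outcome value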
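# Convert to wide format: one row per patclin, one column outcom.<visit>
outw = reshape(outc, v.names="outcom", idvar="patclin", timevar="vis",
               direction="wide")
# Sort columns by name so that outcom.1 ... outcom.5 come first, in visit order
outw = outw[,order(names(outw))]
# I am sure there is a more elegant way to do this;
# I prefer to do this type of work at the database level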
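# Encode which visits are present as a string of 0/1 flags, one per visit,
# e.g. "01000" = only visit 2 recorded
outw$code = as.factor(
  apply(sapply(outw[,1:5], function(x){as.integer(!is.na(x))}), 1, paste,
        collapse=""))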
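# Translation table: one row per observed code (levels are sorted:
# "01000", "10111", "11100") with the visit to use for it.
# Note: the visits here are not exactly what you requested;
# use your own rules to select the columns here
usevisit = data.frame(code=levels(outw$code), visit=c(2,3,4))
# Attach the chosen visit to each patient by joining on code
outw = merge(usevisit, outw)
outw
# You get a documented table of the columns you selected, and
# can use visit to pick the outcome column for each row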
# code visit outcom.1 outcom.2 outcom.3 outcom.4 outcom.5 patclin
#1 01000 2 NA 22 NA NA NA 1.1
#2 10111 3 24 NA 21 22 22 3.5
#3 11100 4 21 21 20 NA NA 1.3
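Once visit is attached to each row, the outcome itself can be picked with matrix indexing. A minimal sketch, assuming the merged outw printed above (rebuilt here from that output so the block runs on its own) and that the outcome columns are outcom.1 to outcom.5 in visit order, so visit doubles as the column index.

```r
# The merged table as printed above
outw = data.frame(
  code     = c("01000", "10111", "11100"),
  visit    = c(2, 3, 4),
  outcom.1 = c(NA, 24, 21),
  outcom.2 = c(22, NA, 21),
  outcom.3 = c(NA, 21, 20),
  outcom.4 = c(NA, 22, NA),
  outcom.5 = c(NA, 22, NA),
  patclin  = c("1.1", "3.5", "1.3"))

# Numeric matrix of outcomes; column k holds visit k
m = as.matrix(outw[paste0("outcom.", 1:5)])
# For each row i, take the element in column visit[i]
outw$outcome = m[cbind(seq_len(nrow(m)), outw$visit)]
outw[c("patclin", "visit", "outcome")]
```

With the placeholder visits of the example, patient 1.3 gets NA because visit 4 was never recorded for that patient; a translation table built from the real rules should only point at visits that were recorded.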
More information about the R-help mailing list