[R] data manipulation
Peter Dalgaard BSA
p.dalgaard at biostat.ku.dk
Sun Sep 7 22:34:18 CEST 2003
Ricardo Pietrobon <rpietro at duke.edu> writes:
> ID date cost
> 1 "2001-01" 200.00
> 1 "2001-01" 123.94
> 1 "2001-03" 100.23
> 1 "2001-04" 150.34
> 2 "2001-03" 296.34
> 2 "2002-05" 156.36
>
>
> I would like to obtain the median costs and boxplots for the sum of
> encounters happening in the first six months after the index encounter
> (first patient encounter) for each patient, then the mean and median costs
> for the costs happening from 6 to 12 months after the index encounter, and
> so on. Notice that the first ID has two encounters during the index date,
> making it more difficult to define a single row with the index encounter.
>
> Any help would be appreciated,
Let's see... You're going to need a bit of ugliness to convert
the date to a numeric month number. Something like this (NB: that's
code for "I didn't actually try this"...)
attach(yourdata)
# as.character() in case date was read in as a factor
monthnum <- sapply(strsplit(as.character(date),"-"),function(x)sum(as.numeric(x)*c(12,1)))
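As a quick sanity check of that conversion, here is a small sketch using the dates from the question. The vector name `dates` is made up for the illustration; the idea is just that 12 * year + month gives a number whose differences are whole calendar months.

```r
# Dates copied from the question (the name `dates` is illustrative only)
dates <- c("2001-01", "2001-01", "2001-03", "2001-04", "2001-03", "2002-05")

# Split "YYYY-MM" into year and month, then count months as 12 * year + month
monthnum <- sapply(strsplit(dates, "-"),
                   function(x) sum(as.numeric(x) * c(12, 1)))

# Differences between entries are whole calendar months
monthnum - min(monthnum)
# 0 0 2 3 2 16
```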
Then we need a table of the index dates for each person
tbl <- tapply(monthnum, ID, min)
Now subtract the index date from monthnum (looking tbl up by name, so
that it also works when the IDs aren't 1, 2, 3, ...)
months.post.index <- monthnum - tbl[as.character(ID)]
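One thing to watch in this step: tapply() returns a vector named by ID, and indexing it with a numeric ID picks positions rather than patients unless the IDs happen to be 1, 2, 3, ... A small sketch with made-up, non-consecutive IDs (10 and 25 are hypothetical, not from the question) shows the lookup by name:

```r
# Hypothetical IDs that are not 1, 2, ... so positional indexing would fail
ID       <- c(10, 10, 10, 10, 25, 25)
monthnum <- c(24013, 24013, 24015, 24016, 24015, 24029)

# First (index) month per patient; the names of tbl are the IDs as strings
tbl <- tapply(monthnum, ID, min)

# Look up each row's index month by ID name, then subtract
months.post.index <- monthnum - as.vector(tbl[as.character(ID)])
months.post.index
# 0 0 2 3 0 14
```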
then you probably want to look at the subset of your original data
frame and do the sums
total.cost.6mo <- with(subset(yourdata,months.post.index < 6),
                       tapply(cost,ID,sum))
and finally
boxplot(total.cost.6mo)
median(total.cost.6mo)
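Putting the steps together on the six rows from the question, without attach(), might look like the sketch below. The data frame is typed in by hand from the quoted message, and storing months.post.index as a column is my own choice for the illustration. Note that the two encounters in the index month need no special handling, since they simply end up in the same sum.

```r
# The example rows from the question, entered by hand
yourdata <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2),
  date = c("2001-01", "2001-01", "2001-03", "2001-04", "2001-03", "2002-05"),
  cost = c(200.00, 123.94, 100.23, 150.34, 296.34, 156.36),
  stringsAsFactors = FALSE
)

# Month number, index month per patient, and months since the index month
monthnum <- sapply(strsplit(yourdata$date, "-"),
                   function(x) sum(as.numeric(x) * c(12, 1)))
index <- tapply(monthnum, yourdata$ID, min)
yourdata$months.post.index <-
  monthnum - as.vector(index[as.character(yourdata$ID)])

# Total cost per patient over months 0 to 5 after the index encounter
first6 <- subset(yourdata, months.post.index < 6)
total.cost.6mo <- tapply(first6$cost, first6$ID, sum)

total.cost.6mo
median(total.cost.6mo)
```

Here patient 1 gets all four encounters summed, and patient 2 only the first, because the second one falls 14 months after the index month.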
(You could elaborate by converting months.post.index with cut() and
using lapply(names(period),.....) to give you a list of tables, which
boxplot() might actually know how to plot directly.)
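One possible reading of that elaboration, as an untested-in-anger sketch: cut the months into six-month windows, sum cost per patient and window, and collect the per-patient totals for each window in a list. The break points, the names `totals` and `by.period`, and the extra encounter at month 7 are all assumptions added for the illustration, not part of the original data.

```r
# Toy values: months since index, patient ID and cost per encounter.
# The last row (month 7, cost 50) is hypothetical, added to fill the 6-12 window.
months.post.index <- c(0, 0, 2, 3, 0, 14, 7)
ID   <- c(1, 1, 1, 1, 2, 2, 2)
cost <- c(200.00, 123.94, 100.23, 150.34, 296.34, 156.36, 50.00)

# Six-month windows [0,6), [6,12), [12,18)
period <- cut(months.post.index, breaks = c(0, 6, 12, 18), right = FALSE)

# Patient-by-period totals; NA where a patient has no encounter in a window
totals <- tapply(cost, list(ID, period), sum)

# One vector of per-patient totals per window, dropping the empty cells
by.period <- lapply(colnames(totals), function(p) {
  x <- totals[, p]
  x[!is.na(x)]
})
names(by.period) <- colnames(totals)

sapply(by.period, median)
# boxplot(by.period)   # one box per six-month window
```

Patients without any encounter in a window are dropped from that window here; whether they should instead count as zero cost is a decision about the analysis, not about the code.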
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907