[R] R function to convert person-level observations to person-period observations
Muhuri, Pradip (SAMHSA/CBHSQ)
Pradip.Muhuri at samhsa.hhs.gov
Sat Jan 3 14:20:34 CET 2015
Hello,
I was trying to convert person-level observations to person-period observations using a custom R function obtained from the UCLA web site (http://www.ats.ucla.edu/stat/r/faq/person_period.htm). Please see my reproducible example below. The function (PLPP) in the R script takes five arguments:
1) data (i.e., the data set to be converted)
2) id (i.e., the identifier for each observation)
3) period (i.e., the number of periods the person or observation was followed up)
4) event (i.e., the variable that indicates whether the event occurred or not, or whether the observation was censored, depending on which direction you are converting)
5) direction which "indicates whether the function should go from person-level to person-period or from person-period to person-level".
On my example data set, the R script ran successfully. Based on 3 person-level observations (A died in year 2, B is censored in year 5, C died in year 3), I get 10 period-level observations, which is the correct number of rows. But the issue is that the value of the "dead" indicator variable is incorrect. I have a gut feeling that the function needs to be tweaked a bit to get the desired results.
Person-level data (input)
ID dead studyyrs
1 A 1 2
2 B 0 5
3 C 1 3
Incorrect results - the "dead" column
ID dead studyyrs
1 A 0 1
2 A 0 2
3 B 0 1
4 B 0 2
5 B 0 3
6 B 0 4
7 B 1 5
8 C 0 1
9 C 0 2
10 C 0 3
Desired results
ID dead studyyrs
1 A 0 1
2 A 1 2
3 B 0 1
4 B 0 2
5 B 0 3
6 B 0 4
7 B 0 5
8 C 0 1
9 C 0 2
10 C 1 3
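The pattern in the incorrect table can be reproduced by hand. The following is a sketch, assuming the converter writes the negated event value to each person's last row, which is what the `reve <- !data[, event]` line in the function listed below appears to do. The variable name `last_row_values` is only illustrative.

```r
# Person-level data from the example
df <- data.frame(ID = LETTERS[1:3], dead = c(1, 0, 1), studyyrs = c(2, 5, 3))

# Row position of each person's final period in the expanded data
idmax <- cumsum(df$studyyrs)

# Negated event column, as the converter appears to compute it
reve <- !df$dead

# Values that would be written to the last row for A, B and C
last_row_values <- as.integer(reve)

idmax            # 2 7 10
last_row_values  # 0 1 0
```

Rows 2, 7 and 10 receiving 0, 1 and 0 matches the incorrect table, which suggests the function may expect a censoring indicator (1 = censored) rather than an event indicator (1 = died).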
I would appreciate receiving your help or hints for resolving the issue. Thanks,
## My reproducible code is shown below
## Below is my data frame (3 observations)
df <- data.frame( ID=LETTERS[1:3], dead=c(1,0,1), studyyrs=c(2,5,3) )
df
## Person-Level Person-Period Converter Function - Source: http://www.ats.ucla.edu/stat/r/faq/person_period.htm
PLPP <- function(data, id, period, event, direction = c("period", "level")) {
  ## Data Checking and Verification Steps
  stopifnot(is.matrix(data) || is.data.frame(data))
  stopifnot(c(id, period, event) %in% c(colnames(data), 1:ncol(data)))
  if (any(is.na(data[, c(id, period, event)]))) {
    stop("PLPP cannot currently handle missing data in the id, period, or event variables")
  }
  ## Do the conversion - Source: http://www.ats.ucla.edu/stat/r/faq/person_period.htm
  switch(match.arg(direction),
    period = {
      index <- rep(1:nrow(data), data[, period])
      idmax <- cumsum(data[, period])
      reve <- !data[, event]
      dat <- data[index, ]
      dat[, period] <- ave(dat[, period], dat[, id], FUN = seq_along)
      dat[, event] <- 0
      dat[idmax, event] <- reve},
    level = {
      tmp <- cbind(data[, c(period, id)], i = 1:nrow(data))
      index <- as.vector(by(tmp, tmp[, id],
        FUN = function(x) x[which.max(x[, period]), "i"]))
      dat <- data[index, ]
      dat[, event] <- as.integer(!dat[, event])
    })
  rownames(dat) <- NULL
  return(dat)
}
tpp <- PLPP(data = df, id = "ID", period = "studyyrs",
            event = "dead", direction = "period")
tpp
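One way to obtain the desired table is sketched below. It is a standalone illustration, not a verified fix to the UCLA function. The helper name `to_person_period` is hypothetical, and the sketch assumes one row per person, no missing values, and an event column coded 1 = event occurred, 0 = censored. The only substantive difference from the `period` branch of PLPP is that the event value is copied to each person's last row without negation.

```r
# Hypothetical helper (not part of the UCLA script): expand person-level rows
# to person-period rows. Assumes `event` is coded 1 = event, 0 = censored.
to_person_period <- function(data, period, event) {
  n_periods <- data[[period]]
  index <- rep(seq_len(nrow(data)), n_periods)  # repeat each row once per period
  last <- cumsum(n_periods)                     # last expanded row of each person
  out <- data[index, , drop = FALSE]
  out[[period]] <- sequence(n_periods)          # 1, 2, ..., n within each person
  out[[event]] <- 0                             # no event in earlier periods
  out[[event]][last] <- data[[event]]           # event status only in final period
  rownames(out) <- NULL
  out
}

df <- data.frame(ID = LETTERS[1:3], dead = c(1, 0, 1), studyyrs = c(2, 5, 3))
pp <- to_person_period(df, period = "studyyrs", event = "dead")
pp
```

With the example data, `dead` is 1 only in A's year 2 and C's year 3, and B stays 0 throughout. An alternative that keeps PLPP unchanged might be to pass a censoring indicator (for example a column equal to `1 - dead`) as the `event` argument, under the same assumption about how the function interprets that argument.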
Pradip K. Muhuri,
SAMHSA/CBHSQ