[R] R function to convert person-level observations to person-period observations

Muhuri, Pradip (SAMHSA/CBHSQ) Pradip.Muhuri at samhsa.hhs.gov
Sat Jan 3 14:20:34 CET 2015


Hello,

I was trying to convert person-level observations to person-period observations using a custom R function obtained from the UCLA website (http://www.ats.ucla.edu/stat/r/faq/person_period.htm). Please see my reproducible example below. The function (PLPP) in the R script takes five arguments:


1)  data (i.e., the data set to be converted)

2)  id (i.e., the identifier for each observation)

3)  period (i.e., the number of periods for which the person or observation was followed up)

4)  event (i.e., the variable that indicates whether the event occurred or not, or whether the observation was censored, depending on which direction you are converting)

5)  direction which "indicates whether the function should go from person-level to person-period or from person-period to person-level".

On my example data set, the R script ran successfully. Based on 3 person-level observations (A died in year 2, B was censored in year 5, C died in year 3), I get 10 person-period observations, which is correct. The issue, however, is that the values of the "dead" indicator variable are incorrect. I suspect the function needs to be tweaked a bit to get the desired results.
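The row count follows directly from the follow-up lengths: each person contributes one row per year, so 2 + 5 + 3 = 10 rows. A small sketch of the two index vectors that the `period` branch of PLPP builds for this data (`index` for the row expansion and `idmax` for each person's last row), reusing the names from the function:

```r
# Person-level data from the example below
df <- data.frame(ID = LETTERS[1:3], dead = c(1, 0, 1), studyyrs = c(2, 5, 3))

# One copy of each person's row per year of follow-up: 2 + 5 + 3 = 10 rows
index <- rep(seq_len(nrow(df)), df$studyyrs)
print(index)   # [1] 1 1 2 2 2 2 2 3 3 3

# Each person's last row sits at the running total of follow-up years
idmax <- cumsum(df$studyyrs)
print(idmax)   # [1]  2  7 10
```

Only the rows at positions `idmax` (2, 7 and 10) can carry a nonzero event value; every other row is set to 0.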


Person-level data (input)
  ID dead   studyyrs
1  A    1        2
2  B    0        5
3  C    1        3

Incorrect results - the "dead" column

   ID dead studyyrs
1   A    0        1
2   A    0        2
3   B    0        1
4   B    0        2
5   B    0        3
6   B    0        4
7   B    1        5
8   C    0        1
9   C    0        2
10  C    0        3

Desired results

   ID dead studyyrs
1   A    0        1
2   A    1        2
3   B    0        1
4   B    0        2
5   B    0        3
6   B    0        4
7   B    0        5
8   C    0        1
9   C    0        2
10  C    1        3
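For comparison, the desired layout can be produced in a few lines of base R without PLPP. This is a minimal sketch, not the UCLA function: it assumes `dead` is an event indicator (1 = died, 0 = censored) and copies it unchanged onto each person's final year. The name `pp` is hypothetical.

```r
df <- data.frame(ID = LETTERS[1:3], dead = c(1, 0, 1), studyyrs = c(2, 5, 3))

# Expand to one row per person-year
pp <- df[rep(seq_len(nrow(df)), df$studyyrs), ]

# Renumber the period within each person: 1, 2, ..., studyyrs
pp$studyyrs <- ave(pp$studyyrs, pp$ID, FUN = seq_along)

# Event is 0 everywhere except in each person's last year, where the
# person-level indicator is copied as-is (no reversal)
idmax <- cumsum(df$studyyrs)
pp$dead <- 0
pp$dead[idmax] <- df$dead

rownames(pp) <- NULL
print(pp)
```

The key difference from PLPP is the last assignment: the person-level `dead` value is used directly instead of being negated.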


I would appreciate receiving your help or hints for resolving the issue.  Thanks,



## My reproducible code is shown below

## Below is my data frame (3 observations)
df <- data.frame( ID=LETTERS[1:3], dead=c(1,0,1), studyyrs=c(2,5,3) )
df

## Person-Level Person-Period Converter Function - Source: http://www.ats.ucla.edu/stat/r/faq/person_period.htm
PLPP <- function(data, id, period, event, direction = c("period", "level")) {
  ## Data Checking and Verification Steps
  stopifnot(is.matrix(data) || is.data.frame(data))
  stopifnot(c(id, period, event) %in% c(colnames(data), 1:ncol(data)))

  if (any(is.na(data[, c(id, period, event)]))) {
    stop("PLPP cannot currently handle missing data in the id, period, or event variables")
  }

  ## Do the conversion - Source: http://www.ats.ucla.edu/stat/r/faq/person_period.htm
  switch(match.arg(direction),
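         period = {
           # row numbers repeated once per period of follow-up
           index <- rep(1:nrow(data), data[, period])
           # position of each id's last period in the expanded data
           idmax <- cumsum(data[, period])
           # negates the person-level indicator (1 -> FALSE, 0 -> TRUE),
           # i.e. 'event' is treated as a censoring flag, not an event flag
           reve <- !data[, event]
           dat <- data[index, ]
           # renumber periods within each id: 1, 2, ..., number of periods
           dat[, period] <- ave(dat[, period], dat[, id], FUN = seq_along)
           dat[, event] <- 0
           # only the last period of each id receives the negated indicator
           dat[idmax, event] <- reve},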
         level = {
           tmp <- cbind(data[, c(period, id)], i = 1:nrow(data))
           index <- as.vector(by(tmp, tmp[, id],
                                 FUN = function(x) x[which.max(x[, period]), "i"]))
           dat <- data[index, ]
           dat[, event] <- as.integer(!dat[, event])
         })

  rownames(dat) <- NULL
  return(dat)
}

tpp <- PLPP(data = df, id = "ID", period = "studyyrs",
            event = "dead", direction = "period")
tpp
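The wrong values appear to come from the line `reve <- !data[, event]` in the `period` branch: it negates the person-level indicator, which only makes sense if that column is a censoring flag (1 = censored) rather than a death flag. A minimal check of that line on the example's `dead` values, plus a workaround idea (an assumption on my part, not taken from the UCLA page) of passing a censoring indicator instead:

```r
# The 'period' branch of PLPP computes reve <- !data[, event]
dead <- c(1, 0, 1)             # A died, B censored, C died
reve <- !dead
print(reve)                    # FALSE TRUE FALSE: the flag moves to B, the censored person

# Hypothetical workaround: supply a censoring indicator, so that the
# negation inside PLPP turns it back into a death indicator
censored <- 1 - dead
print(as.integer(!censored))   # 1 0 1, matching the desired last-year values
```

In practice this would mean adding a column such as `df$censored <- 1 - df$dead` and calling PLPP with `event = "censored"`; the resulting column would then hold the death indicator despite its name. Alternatively, in a local copy of the function, `reve <- !data[, event]` could be changed to `reve <- data[, event]`, together with `as.integer(!dat[, event])` in the `level` branch becoming `as.integer(dat[, event])` so that the two directions remain inverses of each other.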



Pradip K. Muhuri,
SAMHSA/CBHSQ




