[R] Dropping "trailing zeroes" in longitudinal data

Mon Apr 26 23:40:23 CEST 2010

Bill--

Awesome; the code is perfect (for what we need).

Thank you so much for your help.

cheers, Dave

Dave Atkins, PhD
Research Associate Professor
Department of Psychiatry and Behavioral Science
University of Washington
datkins at u.washington.edu

Center for the Study of Health and Risk Behaviors (CSHRB)		
1100 NE 45th Street, Suite 300 	
Seattle, WA  98105 	
206-616-3879 	
http://depts.washington.edu/cshrb/
(Mon-Wed)	

Center for Healthcare Improvement, for Addictions, Mental Illness,
   Medically Vulnerable Populations (CHAMMP)
325 9th Avenue, 2HH-15
Box 359911
Seattle, WA 98104
206-897-4210
http://www.chammp.org
(Thurs)

William Dunlap wrote:
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Atkins
>> Sent: Monday, April 26, 2010 12:23 PM
>> To: r-help at r-project.org
>> Subject: [R] Dropping "trailing zeroes" in longitudinal data
>>
>>
>> Background: Our research group collected data from students via the web
>> about their drinking habits (alcohol) over the last 90 days.  As you
>> might guess, some students seem to have lost interest and completed some
>> information but not all.  Unfortunately, the survey was programmed to
>> "pre-populate" the fields with zeroes (to make it easier for students to
>> complete).
>>
>> Obviously, when we see a stretch of zeroes, we've no idea whether this
>> is "true" data or not, but we'd like to at least do some sensitivity
>> analyses by dropping "trailing zeroes" (i.e., when there are non-zero
>> responses for some duration of the data that then "flat line" into all
>> zeroes to the end of the time period).
>>
>> I've included a toy dataset below.
>>
>> Basically, we have the data in the "long" format, and what I'd like to
>> do is subset the data.frame by deleting rows that occur at the end of a
>> person's data that are all zeroes.  In a nutshell, select rows from a
>> person that are continuously zero, up to the first non-zero, starting at
>> the end of their data (which, below, would be time = 10).
>>
>> With the toy data, this would be the last 6 rows of ids #10 and #8 (for
>> example).  I can begin to think about how I might do this via
>> grep/regexp but am a bit stumped about how to translate that to this
>> type of data.
>>
>> Any thoughts appreciated.
>>
>> cheers, Dave
>>
>> ### toy dataset
>> set.seed(123)
>> toy.df <- data.frame(id = factor(rep(1:10, each = 10)),
>>                      time = rep(1:10, 10),
>>                      dv = rnbinom(100, mu = 0.5, size = 100))
>> toy.df
>>
>> library(lattice)
>>
>> xyplot(dv ~ time | id, data = toy.df, type = c("g","l"))
> 
> Try using rle (run length encoding) along with either ave()
> or lapply().  E.g., define the function
> 
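> isInTrailingRunOfZeroes <- function (x, group, minRunLength = 1) {
>     # ave() applies FUN to the values of x within each level of group
>     as.logical(ave(x, group, FUN = function(x) {
>         r <- rle(x)              # runs of identical consecutive values
>         n <- length(r$values)    # number of runs in this group
>         if (n == 0) {
>             logical(0)
>         } else if (r$values[n] == 0 && r$lengths[n] >= minRunLength) {
>             # last run is zeroes and long enough: flag only that run
>             rep(c(FALSE, TRUE), c(sum(r$lengths[-n]), r$lengths[n]))
>         } else {
>             # no qualifying trailing run: flag nothing
>             rep(FALSE, sum(r$lengths))
>         }
>     }))
> }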
> 
> and use it to drop the trailing runs of 0's with
>     xyplot(data=toy.df[!isInTrailingRunOfZeroes(toy.df$dv, toy.df$id),],
>            dv~time|id, type=c("g","l"))
> or replace them with NA's with
>     toy.df.copy <- toy.df
>     toy.df.copy[isInTrailingRunOfZeroes(toy.df.copy$dv, toy.df.copy$id),
>                 "dv"] <- NA
> 
> The last argument, minRunLength, lets you treat the data as
> spurious only if the trailing run contains at least that many
> zeroes.
> 
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com 
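
To see what rle() contributes here, it may help to run it on a single person's series. The sketch below is only an illustration, not the function from the thread: the vector x is invented, and the names tail_len and keep are hypothetical. It assumes x contains no NA values.

```r
# Hypothetical responses for one person (invented for illustration)
x <- c(0, 2, 0, 0, 1, 0, 0, 0)

# rle() collapses the vector into runs of identical consecutive values
r <- rle(x)
r$values    # value of each run:  0 2 0 1 0
r$lengths   # length of each run: 1 1 2 1 3

# The trailing run is the last element of r; it counts only if it is zero
n <- length(r$values)
tail_len <- if (n > 0 && r$values[n] == 0) r$lengths[n] else 0L

# Keep everything before the trailing run of zeroes
keep <- seq_along(x) <= length(x) - tail_len
x[keep]     # 0 2 0 0 1
```

Zeroes in the middle of the series (positions 3 and 4 here) are kept; only the final run is dropped. That is the same per-group logic that ave() applies to each id in the function above.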