[R] Count number of change in a specified time interval
Bert Gunter
gunter.berton at gene.com
Mon Aug 4 17:39:12 CEST 2014
Or ?rle
Bert
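
A minimal sketch of how the ?rle suggestion might be applied, in base R only. It assumes rows are in chronological order within each CASE_ID and reads "3 months from the first entry" as the first month plus 3, inclusive (201302 through 201305). The helpers count_changes and month_index are hypothetical names introduced for illustration; they are not part of any package.

```r
# Same sample data as in the quoted message below.
x <- read.table(text = "CASE_ID YEAR_MTH ATT_1
CB26A 201302 1
CB26A 201302 0
CB26A 201302 0
CB26A 201303 1
CB26A 201303 1
CB26A 201304 0
CB26A 201305 1
CB26A 201305 0
CB26A 201306 1
CB27A 201304 0
CB27A 201304 0
CB27A 201305 1
CB27A 201306 1
CB27A 201306 0
CB27A 201307 0
CB27A 201308 1", header = TRUE, as.is = TRUE)

# Hypothetical helper: a vector with k runs of equal values has k - 1 changes.
count_changes <- function(v) length(rle(v)$lengths) - 1L

# Hypothetical helper: turn yyyymm into a running month index so that
# adding 3 crosses year boundaries correctly (e.g. 201311 + 3 -> 201402).
month_index <- function(ym) (ym %/% 100L) * 12L + ym %% 100L

# For each CASE_ID, keep rows within 3 months of the earliest month
# (inclusive) and count the changes in ATT_1 over those rows.
n_changes <- sapply(split(x, x$CASE_ID), function(d) {
  m <- month_index(d$YEAR_MTH)
  count_changes(d$ATT_1[m <= min(m) + 3L])
})
n_changes
```

On the sample data this gives 5 for CB26A and 2 for CB27A, the same counts as the data.table result quoted below.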
> On Aug 4, 2014, at 8:28 AM, jim holtman <jholtman at gmail.com> wrote:
>
> Try this, but I only get 2 changes for CB27A instead of the 3 you indicated:
>
>> require(data.table)
>> x <- read.table(text = "CASE_ID YEAR_MTH ATT_1
> + CB26A 201302 1
> + CB26A 201302 0
> + CB26A 201302 0
> + CB26A 201303 1
> + CB26A 201303 1
> + CB26A 201304 0
> + CB26A 201305 1
> + CB26A 201305 0
> + CB26A 201306 1
> + CB27A 201304 0
> + CB27A 201304 0
> + CB27A 201305 1
> + CB27A 201306 1
> + CB27A 201306 0
> + CB27A 201307 0
> + CB27A 201308 1", header = TRUE, as.is = TRUE)
>> setDT(x)
>> # convert to a Date object for comparison
>> x[, MYD := as.Date(paste0(YEAR_MTH, '01'), format = "%Y%m%d")]
>> # separate by CASE_ID and only keep the first 3 months
>> x[
> + , {
> + # determine the end date as 3 months from the first date
> + endDate <- seq(MYD[1L], by = '3 months', length.out = 2)[2L]
> + # extract what is changing
> + changes <- ATT_1[(MYD >= MYD[1L]) & (MYD <= endDate)]
> + # now count the changes
> + list(nChanges = sum(head(changes, -1L) != tail(changes, -1L)))
> + }
> + , by = CASE_ID
> + ]
> CASE_ID nChanges
> 1: CB26A 5
> 2: CB27A 2
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
>
>> On Wed, Jul 30, 2014 at 3:08 AM, Abhinaba Roy <abhinabaroy09 at gmail.com> wrote:
>> Dear R-helpers,
>>
>> I want to count the number of times ATT_1 has changed in a period of 3
>> months (can be 4 months) from the first YEAR_MTH entry for a CASE_ID. So if
>> for a CASE_ID we have data only for two distinct YEAR_MTH, then all the
>> entries should be considered, otherwise only the relevant entries will be
>> considered for calculation.
>> E.g. if the first YEAR_MTH entry is 201304 then get the number of changes
>> till 201307 (inclusive), similarly if the first YEAR_MTH entry is 201302
>> then get the number of changes till 201305.
>>
>> Dataset
>> CASE_ID YEAR_MTH ATT_1
>> CB26A 201302 1
>> CB26A 201302 0
>> CB26A 201302 0
>> CB26A 201303 1
>> CB26A 201303 1
>> CB26A 201304 0
>> CB26A 201305 1
>> CB26A 201305 0
>> CB26A 201306 1
>> CB27A 201304 0
>> CB27A 201304 0
>> CB27A 201305 1
>> CB27A 201306 1
>> CB27A 201306 0
>> CB27A 201307 0
>> CB27A 201308 1
>>
>> The final dataset should look like
>>
>> ID_CASE No.of changes
>> CB26A 5
>> CB27A 3
>>
>> where 'No.of changes' refers to the changes within 3 months (201302-201305 for
>> CB26A and 201304-201307 for CB27A).
>>
>> How can this be done in R?
>>
>> Regards,
>> Abhinaba Roy
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.