[R] Count number of change in a specified time interval

jim holtman jholtman at gmail.com
Mon Aug 4 18:06:20 CEST 2014


Here is the solution using 'rle':

> require(data.table)
> x <- read.table(text = "CASE_ID YEAR_MTH ATT_1
+  CB26A    201302         1
+  CB26A    201302         0
+  CB26A    201302         0
+  CB26A    201303         1
+  CB26A    201303         1
+  CB26A    201304         0
+  CB26A    201305         1
+  CB26A    201305         0
+  CB26A    201306         1
+  CB27A    201304         0
+  CB27A    201304         0
+  CB27A    201305         1
+  CB27A    201306         1
+  CB27A    201306         0
+  CB27A    201307         0
+  CB27A    201308         1", header = TRUE, as.is = TRUE)
> setDT(x)
> # convert to a Date object for comparison
> x[, MYD := as.Date(paste0(YEAR_MTH, '01'), format = "%Y%m%d")]
> # separate by CASE_ID and only keep the first 3 months
> x[
+      , {
+          # determine the end date as 3 months from the first date
+          endDate <- seq(MYD[1L], by = '3 months', length = 2)[2L]
+          # now count the changes
+          list(nChanges = length(rle(ATT_1[(MYD >= MYD[1L]) & (MYD <=
endDate)])[[1L]]) - 1L)
+        }
+      , by = CASE_ID
+      ]
   CASE_ID nChanges
1:   CB26A        5
2:   CB27A        2

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Mon, Aug 4, 2014 at 11:39 AM, Bert Gunter <gunter.berton at gene.com> wrote:
> Or ?rle
>
> Bert
>
>
>
> Sent from my iPhone -- please excuse typos.
>
>> On Aug 4, 2014, at 8:28 AM, jim holtman <jholtman at gmail.com> wrote:
>>
>> Try this, but I only get 2 changes for CB27A instead of you indicated 3:
>>
>>> require(data.table)
>>> x <- read.table(text = "CASE_ID YEAR_MTH ATT_1
>> + CB26A    201302         1
>> + CB26A    201302         0
>> + CB26A    201302         0
>> + CB26A    201303         1
>> + CB26A    201303         1
>> + CB26A    201304         0
>> + CB26A    201305         1
>> + CB26A    201305         0
>> + CB26A    201306         1
>> + CB27A    201304         0
>> + CB27A    201304         0
>> + CB27A    201305         1
>> + CB27A    201306         1
>> + CB27A    201306         0
>> + CB27A    201307         0
>> + CB27A    201308         1", header = TRUE, as.is = TRUE)
>>> setDT(x)
>>> # convert to a Date object for comparison
>>> x[, MYD := as.Date(paste0(YEAR_MTH, '01'), format = "%Y%m%d")]
>>> # separate by CASE_ID and only keep the first 3 months
>>> x[
>> +     , {
>> +         # determine the end date as 3 months from the first date
>> +         endDate <- seq(MYD[1L], by = '3 months', length = 2)[2L]
>> +         # extract what is changing
>> +         changes <- ATT_1[(MYD >= MYD[1L]) & (MYD <= endDate)]
>> +         # now count the changes
>> +         list(nChanges = sum(head(changes, -1L) != tail(changes, -1L)))
>> +       }
>> +     , by = CASE_ID
>> +     ]
>>   CASE_ID nChanges
>> 1:   CB26A        5
>> 2:   CB27A        2
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>>
>>> On Wed, Jul 30, 2014 at 3:08 AM, Abhinaba Roy <abhinabaroy09 at gmail.com> wrote:
>>> Dear R-helpers,
>>>
>>> I want to count the number of times ATT_1 has changed in a period of 3
>>> months(can be 4months) from the first YEAR_MTH entry for a CASE_ID. So if
>>> for a CASE_ID we have data only for two distinct YEAR_MTH, then all the
>>> entries should be considered, otherwise only the relevant entries will be
>>> considered for calculation.
>>> E.g. if the first YEAR_MTH entry is 201304 then get the number of changes
>>> till 201307(inclusive), similarly if the first YEAR_MTH entry is 201302
>>> then get the number of changes till 201305.
>>>
>>> Dataset
>>> CASE_ID YEAR_MTH ATT_1
>>> CB26A    201302         1
>>> CB26A    201302         0
>>> CB26A    201302         0
>>> CB26A    201303         1
>>> CB26A    201303         1
>>> CB26A    201304         0
>>> CB26A    201305         1
>>> CB26A    201305         0
>>> CB26A    201306         1
>>> CB27A    201304         0
>>> CB27A    201304         0
>>> CB27A    201305         1
>>> CB27A    201306         1
>>> CB27A    201306         0
>>> CB27A    201307         0
>>> CB27A    201308         1
>>>
>>> The final dataset should look like
>>>
>>> ID_CASE    No.of changes
>>> CB26A        5
>>> CB27A        3
>>>
>>> where 'No.of changes' refer to the change in 3 months (201302-201305 for
>>> CB26A and 201304-201307 for CB27A).
>>>
>>> How can this be done in R?
>>>
>>> Regards,
>>> Abhinaba Roy
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list