[R] Count number of change in a specified time interval
jim holtman
jholtman at gmail.com
Mon Aug 4 17:28:21 CEST 2014
Try this, but I only get 2 changes for CB27A instead of you indicated 3:
> require(data.table)
> x <- read.table(text = "CASE_ID YEAR_MTH ATT_1
+ CB26A 201302 1
+ CB26A 201302 0
+ CB26A 201302 0
+ CB26A 201303 1
+ CB26A 201303 1
+ CB26A 201304 0
+ CB26A 201305 1
+ CB26A 201305 0
+ CB26A 201306 1
+ CB27A 201304 0
+ CB27A 201304 0
+ CB27A 201305 1
+ CB27A 201306 1
+ CB27A 201306 0
+ CB27A 201307 0
+ CB27A 201308 1", header = TRUE, as.is = TRUE)
> setDT(x)
> # convert to a Date object for comparison
> x[, MYD := as.Date(paste0(YEAR_MTH, '01'), format = "%Y%m%d")]
> # separate by CASE_ID and only keep the first 3 months
> x[
+ , {
+ # determine the end date as 3 months from the first date
+ endDate <- seq(MYD[1L], by = '3 months', length = 2)[2L]
+ # extract what is changing
+ changes <- ATT_1[(MYD >= MYD[1L]) & (MYD <= endDate)]
+ # now count the changes
+ list(nChanges = sum(head(changes, -1L) != tail(changes, -1L)))
+ }
+ , by = CASE_ID
+ ]
CASE_ID nChanges
1: CB26A 5
2: CB27A 2
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On Wed, Jul 30, 2014 at 3:08 AM, Abhinaba Roy <abhinabaroy09 at gmail.com> wrote:
> Dear R-helpers,
>
> I want to count the number of times ATT_1 has changed in a period of 3
> months(can be 4months) from the first YEAR_MTH entry for a CASE_ID. So if
> for a CASE_ID we have data only for two distinct YEAR_MTH, then all the
> entries should be considered, otherwise only the relevant entries will be
> considered for calculation.
> E.g. if the first YEAR_MTH entry is 201304 then get the number of changes
> till 201307(inclusive), similarly if the first YEAR_MTH entry is 201302
> then get the number of changes till 201305.
>
> Dataset
> CASE_ID YEAR_MTH ATT_1
> CB26A 201302 1
> CB26A 201302 0
> CB26A 201302 0
> CB26A 201303 1
> CB26A 201303 1
> CB26A 201304 0
> CB26A 201305 1
> CB26A 201305 0
> CB26A 201306 1
> CB27A 201304 0
> CB27A 201304 0
> CB27A 201305 1
> CB27A 201306 1
> CB27A 201306 0
> CB27A 201307 0
> CB27A 201308 1
>
> The final dataset should look like
>
> ID_CASE No.of changes
> CB26A 5
> CB27A 3
>
> where 'No.of changes' refer to the change in 3 months (201302-201305 for
> CB26A and 201304-201307 for CB27A).
>
> How can this be done in R?
>
> Regards,
> Abhinaba Roy
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list