[R] create a index.date column

Dennis Murphy djmuser at gmail.com
Wed Jul 27 23:28:51 CEST 2011


Hi:

I prefer to use one of the summarization packages for this sort of
thing, but aggregate() works, too. Here are two versions of the same
idea:

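# Uses ddply() in the plyr package:
index.date <- function(d) {
     require('plyr')
     # largest tdiff within each id/rcat group
     out1 <- ddply(d, .(id, rcat), summarise, index = max(tdiff))
     # numeric value of the reference date minus the max difference
     ndate <- as.numeric(as.Date('2002-09-01')) - out1[['index']]
     out1$index.date <- as.Date(ndate, origin = '1970-01-01')
     # drop the helper column 'index' (third column)
     out1 <- out1[, -3]
     out1
}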

# Uses aggregate() from the base package:
index.date2 <- function(d) {
     out <- aggregate(tdiff ~ id + rcat, data = d, FUN = max)
     ndate <- as.numeric(as.Date('2002-09-01')) - out[['tdiff']]
     out$index.date <- as.Date(ndate, origin = '1970-01-01')
     out <- out[, -3]
     out
}

index.date(test)
index.date2(test)
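
To see what the output looks like on something small enough to check by hand, here is a sketch on a made-up data frame. The object toy and its values are hypothetical (they are not the poster's data), and index.date2() is repeated only so the block runs on its own:

```r
# Hypothetical toy data: two ids, two categories
toy <- data.frame(
  id    = c(1L, 1L, 1L, 2L, 2L),
  rcat  = factor(c("ICS", "ICS", "LABA", "ICS", "ICS")),
  ftime = as.Date(c("2002-03-15", "2002-05-17", "2002-06-01",
                    "2002-04-01", "2002-08-26"))
)

# Days between the reference date and each ftime, as in the original post
toy$tdiff <- as.numeric(difftime(as.Date("2002-09-01"), toy$ftime,
                                 units = "days"))

# Same logic as index.date2() above
index.date2 <- function(d) {
  out <- aggregate(tdiff ~ id + rcat, data = d, FUN = max)
  ndate <- as.numeric(as.Date("2002-09-01")) - out[["tdiff"]]
  out$index.date <- as.Date(ndate, origin = "1970-01-01")
  out[, -3]
}

# One row per id/rcat combination that occurs in the data
res <- index.date2(toy)
res
```

Because the largest difference from the reference date belongs to the earliest ftime, the index.date in each row should match the first date of use for that id/rcat pair.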

In each function, I did the following:

(1) Found the maximum time difference from the reference date 2002-09-01
within each id/rcat group.
(2) Determined the numeric value of the date associated with the max
time difference (ndate).
(3) Converted ndate back to a date and assigned it the variable name
index.date in the output data frame.
(4) Removed the variable computed in (1) from the output data frame.
(5) Returned the output data frame.
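
If a flag on the rows of the original data is wanted instead of a summary table, one possible approach (not what the two functions above do) is ave() from base R, which returns the group maximum on every row. This is only a sketch: toy, key, grp.max and index.flag are hypothetical names, and the "i"/"n" labels are borrowed from the ti() function in the question.

```r
# Hypothetical toy data: two ids, two categories
toy <- data.frame(
  id    = c(1L, 1L, 1L, 2L, 2L),
  rcat  = factor(c("ICS", "ICS", "LABA", "ICS", "ICS")),
  ftime = as.Date(c("2002-03-15", "2002-05-17", "2002-06-01",
                    "2002-04-01", "2002-08-26"))
)
ref <- as.Date("2002-09-01")
toy$tdiff <- as.numeric(ref - toy$ftime)

# Single grouping key; drop = TRUE discards id/rcat combinations
# that never occur, so max() is never applied to an empty group
key <- interaction(toy$id, toy$rcat, drop = TRUE)

# Largest tdiff in each group, repeated on every row of that group
grp.max <- ave(toy$tdiff, key, FUN = max)

# "i" marks the index row(s), "n" all other rows
toy$index.flag <- ifelse(toy$tdiff == grp.max, "i", "n")

# The index date itself, attached to every row of the group
toy$index.date <- ref - grp.max
toy
```

Note that if two rows in a group share the maximum tdiff, both are flagged "i".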

HTH,
Dennis

On Wed, Jul 27, 2011 at 6:38 AM, jose Bartolomei <surfprjab at hotmail.com> wrote:
>
> Dear R users,
>
> I created a matrix that tells me the first day of use of a category by
> id.
>
> # Calculate time difference
> test$tdiff <- as.numeric(difftime(as.Date("2002-09-01"), test$ftime, units = "days"))
>
> # Obtain the index date per person and dcategory
> index.date.test <- tapply(test$tdiff, list(test$id, test$rcat), max)
>
> Nonetheless, at the moment I think it will be more useful to create a column in my
> data that tells me which row is the index date.
>
> Something like:
>
> ti <- function(x) {
>     ifelse(x == max(x), "i", "n") # x = test$tdiff
> }
>
> tapply(test$tdiff, list(test$rcat, test$id), FUN = ti)
>
> I have been testing different things for a few days but I am in a loop
> and I do not see my mistake.
>
> It should be simple but I can't get it.
>
> Below is some test data.
>
> Thanks in advance for your time,
> Jose
>
> Background info: I want to use the index.date to obtain information from other data frames for every id for the six months before the index date.
> Then I should normalize the ftime to a common time frame and look for patterns in that time frame.
> (I do not know yet how I will do it.)
>
> ### test data ####
>
> structure(list(id = c(1L, 1L, 1L, 46L, 80L, 80L, 80L, 80L, 88L,
> 160L, 179L, 179L, 179L, 179L, 179L, 179L, 192L, 192L, 192L, 204L,
> 204L, 204L, 204L, 205L, 211L, 233L, 233L, 272L, 272L, 272L, 272L,
> 309L, 309L, 309L, 310L, 310L, 314L, 314L, 315L, 316L, 320L, 320L,
> 320L, 320L, 324L, 324L, 324L, 329L, 329L, 339L, 354L, 354L, 354L,
> 357L, 358L, 359L, 364L, 366L, 377L, 377L, 377L, 377L, 377L, 377L,
> 377L, 377L, 377L, 377L, 377L, 377L, 379L, 383L, 383L, 387L, 387L,
> 391L, 395L, 398L, 401L, 401L, 401L, 401L, 401L, 407L, 407L, 407L,
> 409L, 414L, 414L, 414L, 434L, 434L, 434L, 437L, 437L, 437L, 437L,
> 437L, 439L, 439L, 439L, 439L, 442L, 443L, 450L, 452L, 452L, 459L,
> 459L, 468L, 472L, 472L, 472L, 478L, 478L, 484L, 484L, 484L, 484L,
> 484L, 486L, 486L, 486L, 487L, 487L, 487L, 487L, 487L), ftime = structure(c(11761,
> 11824, 11925, 11852, 11814, 11814, 11929, 11929, 11902, 11857,
> 11779, 11779, 11807, 11841, 11871, 11899, 11831, 11894, 11925,
> 11761, 11801, 11843, 11905, 11832, 11877, 11838, 11901, 11783,
> 11783, 11818, 11850, 11750, 11782, 11905, 11852, 11877, 11852,
> 11922, 11855, 11838, 11845, 11878, 11901, 11927, 11795, 11817,
> 11837, 11901, 11928, 11853, 11751, 11751, 11877, 11922, 11760,
> 11914, 11857, 11912, 11752, 11752, 11785, 11785, 11825, 11825,
> 11862, 11862, 11891, 11891, 11926, 11926, 11919, 11907, 11907,
> 11842, 11873, 11842, 11922, 11865, 11782, 11829, 11858, 11888,
> 11912, 11750, 11803, 11897, 11871, 11787, 11787, 11787, 11764,
> 11817, 11882, 11778, 11808, 11863, 11894, 11918, 11771, 11817,
> 11851, 11907, 11799, 11766, 11794, 11765, 11828, 11788, 11884,
> 11897, 11810, 11852, 11922, 11810, 11846, 11801, 11835, 11859,
> 11891, 11922, 11771, 11884, 11925, 11765, 11765, 11801, 11843,
> 11892), class = "Date"), rcat = structure(c(1L, 1L, 1L, 1L, 1L,
> 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
> 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 2L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L), .Label = c("ICS",
> "LABA", "MCSs"), class = "factor"), tdiff = c(170, 107, 6, 79,
> 117, 117, 2, 2, 29, 74, 152, 152, 124, 90, 60, 32, 100, 37, 6,
> 170, 130, 88, 26, 99, 54, 93, 30, 148, 148, 113, 81, 181, 149,
> 26, 79, 54, 79, 9, 76, 93, 86, 53, 30, 4, 136, 114, 94, 30, 3,
> 78, 180, 180, 54, 9, 171, 17, 74, 19, 179, 179, 146, 146, 106,
> 106, 69, 69, 40, 40, 5, 5, 12, 24, 24, 89, 58, 89, 9, 66, 149,
> 102, 73, 43, 19, 181, 128, 34, 60, 144, 144, 144, 167, 114, 49,
> 153, 123, 68, 37, 13, 160, 114, 80, 24, 132, 165, 137, 166, 103,
> 143, 47, 34, 121, 79, 9, 121, 85, 130, 96, 72, 40, 9, 160, 47,
> 6, 166, 166, 130, 88, 39)), .Names = c("id", "ftime", "rcat",
> "tdiff"), row.names = c(11L, 4L, 13L, 25L, 39L, 41L, 35L, 44L,
> 54L, 57L, 96L, 98L, 88L, 107L, 80L, 77L, 118L, 136L, 124L, 146L,
> 150L, 157L, 153L, 169L, 196L, 210L, 214L, 225L, 230L, 221L, 222L,
> 258L, 266L, 281L, 311L, 324L, 333L, 334L, 358L, 372L, 400L, 419L,
> 423L, 434L, 439L, 437L, 443L, 479L, 465L, 496L, 517L, 516L, 519L,
> 525L, 539L, 598L, 606L, 634L, 658L, 649L, 637L, 655L, 640L, 644L,
> 645L, 636L, 647L, 646L, 639L, 654L, 665L, 673L, 680L, 701L, 688L,
> 712L, 737L, 738L, 784L, 766L, 785L, 753L, 773L, 799L, 791L, 808L,
> 818L, 826L, 821L, 820L, 838L, 830L, 837L, 841L, 840L, 844L, 850L,
> 845L, 886L, 875L, 887L, 868L, 899L, 912L, 915L, 931L, 929L, 934L,
> 939L, 957L, 988L, 975L, 981L, 1015L, 1003L, 1043L, 1051L, 1056L,
> 1034L, 1031L, 1073L, 1068L, 1065L, 1079L, 1101L, 1092L, 1089L,
> 1096L), class = "data.frame")
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


