[R] Using lapply in R data table
Bert Gunter
bgunter.4567 at gmail.com
Mon Sep 26 19:59:16 CEST 2016
This seems like a job for cut().
(I made DT a data frame to avoid loading the data.table package, but I
assume it would work with a data table too. Check this, though!)
> DT <- within(DT, exposure <- cut(fini, as.Date(c("2000-01-01","2006-01-01","2006-07-01","2007-01-01")), labels = c(1, .87, .5)))
> DT
  id       fini group exposure
1  2 2005-04-20     A        1
2  2 2005-04-20     A        1
3  2 2005-04-20     A        1
4  5 2006-02-19     B     0.87
5  5 2006-02-19     B     0.87
6  7 2006-10-08     A      0.5
7  7 2006-10-08     A      0.5
(but note that exposure is a factor, not numeric; as.numeric(as.character(exposure)) converts it)
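A self-contained sketch of the cut() approach, using a plain data frame as above. It assumes the break dates 2006-07-01 and 2007-01-01 so that the bins line up with the three stated rules, passes right = FALSE explicitly (already the default when cut() is given Dates, so each bin is closed on the left), and converts the factor to numeric at the end. Note that the label 0.87 is hard-coded: it is only correct for fini = 2006-02-19 and is not computed from each row's date.

```r
# Rebuild the example as a plain data frame (no data.table needed)
DT <- data.frame(
  id    = rep(c(2, 5, 7), c(3, 2, 2)),
  fini  = rep(as.Date(c("2005-04-20", "2006-02-19", "2006-10-08")), c(3, 2, 2)),
  group = rep(c("A", "B", "A"), c(3, 2, 2))
)

# With right = FALSE the bins are half-open on the right:
#   [2000-01-01, 2006-01-01), [2006-01-01, 2006-07-01), [2006-07-01, 2007-01-01)
breaks <- as.Date(c("2000-01-01", "2006-01-01", "2006-07-01", "2007-01-01"))

DT <- within(DT, exposure <- cut(fini, breaks, labels = c(1, 0.87, 0.5), right = FALSE))

# cut() returns a factor; go through as.character() so the labels,
# not the internal level codes, become the numeric values
DT$exposure <- as.numeric(as.character(DT$exposure))
print(DT)
```

Going through as.character() matters: as.numeric() applied directly to the factor would return the level codes 1, 2, 3 instead of 1, 0.87, 0.5.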
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
On Mon, Sep 26, 2016 at 10:05 AM, Ista Zahn <istazahn at gmail.com> wrote:
> Hi Frank,
>
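> lapply(DT, ...) iterates over the columns of DT, calling your function once per column, so the function sees one column at a time and has no variable called fini. That doesn't seem to be what you want.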
>
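To make the column-wise behaviour concrete, here is a base-R sketch. It uses a data frame in place of the data.table (both are lists of columns, so lapply() treats them the same way), shows what the function receives on each call, and reproduces the failure, assuming no global variable named fini exists. The last step is one possible fix that keeps Frank's subset-assignment style but evaluates it with within(), so the columns are visible by name; the helper name mid is purely illustrative, and the upper bound on the third rule is taken from his specification.

```r
DT <- data.frame(
  id       = rep(c(2, 5, 7), c(3, 2, 2)),
  fini     = rep(as.Date(c("2005-04-20", "2006-02-19", "2006-10-08")), c(3, 2, 2)),
  group    = rep(c("A", "B", "A"), c(3, 2, 2)),
  exposure = 0
)

# lapply() calls the function once per column, passing that column alone
col_classes <- lapply(DT, function(col) class(col)[1])
print(col_classes)  # one entry per column: id, fini, group, exposure

# The body refers to fini, which is neither an argument nor (here) a
# global variable, so the very first call fails
msg <- tryCatch(
  lapply(DT, function(exposure) {
    exposure[fini < as.Date("2006-01-01")] <- 1
    exposure
  }),
  error = conditionMessage
)
print(msg)

# within() evaluates the block with the columns visible by name, so the
# same subset assignments work on whole columns as intended
DT <- within(DT, {
  exposure[fini < as.Date("2006-01-01")] <- 1
  mid <- fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30")
  # subset the right-hand side too, so lengths match on both sides
  exposure[mid] <- as.numeric(as.Date("2007-01-01") - fini[mid]) / 365.25
  exposure[fini >= as.Date("2006-07-01") & fini <= as.Date("2006-12-31")] <- 0.5
  rm(mid)  # temporary helper, not kept as a column
})
print(DT)
```

Subsetting fini[mid] on the right-hand side avoids a second problem in the original attempt: assigning a full-length vector into a shorter subset makes R warn about the replacement length and use only the leading elements.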
> There are probably better ways, but here is one approach.
>
> DT[, exposure := vector(mode = "numeric", length = .N)]
> DT[fini < as.Date("2006-01-01"), exposure := 1]
> DT[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30"),
>     exposure := as.numeric(difftime(as.Date("2007-01-01"), fini, units = "days"))/365.25]
> DT[fini >= as.Date("2006-07-01") & fini <= as.Date("2006-12-31"), exposure := 0.5]
>
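For comparison, the same three rules can be written as a single := assignment with data.table's fcase(), which returns the value paired with the first condition that is TRUE. This is a sketch assuming data.table 1.13.0 or later (fcase() was not available in 2016). Because conditions are checked in order, each later test only needs its upper bound; rows matching no rule get NA rather than silently keeping an initial 0.

```r
library(data.table)  # fcase() requires data.table >= 1.13.0

DT <- data.table(
  id    = rep(c(2, 5, 7), c(3, 2, 2)),
  fini  = rep(as.Date(c("2005-04-20", "2006-02-19", "2006-10-08")), c(3, 2, 2)),
  group = rep(c("A", "B", "A"), c(3, 2, 2))
)

# First matching condition wins; all values are doubles, as fcase() requires
DT[, exposure := fcase(
  fini <  as.Date("2006-01-01"), 1,
  fini <= as.Date("2006-06-30"), as.numeric(as.Date("2007-01-01") - fini) / 365.25,
  fini <= as.Date("2006-12-31"), 0.5,
  default = NA_real_
)]
print(DT)
```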
> Best,
> Ista
>
> On Mon, Sep 26, 2016 at 11:28 AM, Frank S. <f_j_rod at hotmail.com> wrote:
>> Dear all,
>>
>> I have an R data.table like this:
>>
>> library(data.table)
>> DT <- data.table(
>> id = rep(c(2, 5, 7), c(3, 2, 2)),
>> fini = rep(as.Date(c('2005-04-20', '2006-02-19', '2006-10-08')), c(3, 2, 2)),
>> group = rep(c("A", "B", "A"), c(3, 2, 2)) )
>>
>>
>> I want to construct a new variable "exposure" defined as follows:
>>
>> 1) If "fini" is earlier than 2006-01-01 --> "exposure" = 1
>> 2) If "fini" is in [2006-01-01, 2006-06-30] --> "exposure" = "2007-01-01" - "fini", in years (days/365.25)
>> 3) If "fini" is in [2006-07-01, 2006-12-31] --> "exposure" = 0.5
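The 0.87 in the desired output below comes from rule 2 once the day difference is converted to years, which is what the division by 365.25 in the attempted code does. A quick base-R check for fini = 2006-02-19 (the variable name start is illustrative):

```r
start <- as.Date("2006-02-19")
days  <- as.numeric(as.Date("2007-01-01") - start)  # days until 2007-01-01
years <- days / 365.25                              # convert days to years
print(days)             # 316
print(round(years, 2))  # 0.87
```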
>>
>>
>> So the desired output would be the following data table:
>>
>>    id       fini exposure group
>> 1:  2 2005-04-20     1.00     A
>> 2:  2 2005-04-20     1.00     A
>> 3:  2 2005-04-20     1.00     A
>> 4:  5 2006-02-19     0.87     B
>> 5:  5 2006-02-19     0.87     B
>> 6:  7 2006-10-08     0.50     A
>> 7:  7 2006-10-08     0.50     A
>>
>>
>> I have tried:
>>
>> DT <- DT[ , list(id, fini, exposure = 0, group)]
>> DT.new <- lapply(DT, function(exposure){
>> exposure[fini < as.Date("2006-01-01")] <- 1 # 1st case
>> exposure[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30")] <- difftime(as.Date("2007-01-01"), fini, units="days")/365.25 # 2nd case
>> exposure[fini >= as.Date("2006-07-01") & fini <= as.Date("2006-12-31")] <- 0.5 # 3rd case
>> exposure # return value
>> })
>>
>>
>> But I get an error message.
>>
>> Thanks for any help!!
>>
>>
>> Frank S.
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.