[R] Using lapply in R data table
Ista Zahn
istazahn at gmail.com
Mon Sep 26 21:07:11 CEST 2016
On Mon, Sep 26, 2016 at 2:48 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> I thought that that was a typo from the OP, as it disagrees with his
> example. But the labels are arbitrary, so in fact cut() will do it
> whichever way he meant.
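I don't see how cut() will do it, at least not conveniently. cut() gives every row in an interval the same label, but in the middle interval the exposure depends on each row's own date. Consider this slightly altered example, in which the two middle-interval rows have different dates: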
library(data.table)
DT <- data.table(
  id    = rep(c(2, 5, 7), c(3, 2, 2)),
  fini  = rep(as.Date(c('2005-04-20',
                        '2006-02-19',
                        '2006-06-29',
                        '2006-10-08')),
              c(3, 1, 1, 2)),
  group = rep(c("A", "B", "A"), c(3, 2, 2)) )
DT[, exposure := vector(mode = "numeric", length = .N)]
DT[fini < as.Date("2006-01-01"), exposure := 1]
DT[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30"),
   exposure := difftime(as.Date("2007-01-01"), fini, units="days")/365.25]
DT[fini >= as.Date("2006-07-01"), exposure := 0.5]
DT
##    id       fini group  exposure
## 1:  2 2005-04-20     A 1.0000000
## 2:  2 2005-04-20     A 1.0000000
## 3:  2 2005-04-20     A 1.0000000
## 4:  5 2006-02-19     B 0.8651608
## 5:  5 2006-06-29     B 0.5092402
## 6:  7 2006-10-08     A 0.5000000
## 7:  7 2006-10-08     A 0.5000000
Best,
Ista
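The middle-interval values in the output above are day counts converted to years. A quick base R check of that arithmetic (an editorial addition, not part of the thread; it assumes only the 365.25-day year already used in the code above):

```r
# Middle-interval rule: days from fini to 2007-01-01, expressed in years
fini <- as.Date(c("2006-02-19", "2006-06-29"))
days <- as.numeric(difftime(as.Date("2007-01-01"), fini, units = "days"))
days            # 316 186
days / 365.25   # 0.8651608 0.5092402
```

The two rows get different exposures because their dates differ, which is exactly what a single cut() label per interval cannot express.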
>
> -- Bert
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Mon, Sep 26, 2016 at 11:37 AM, Ista Zahn <istazahn at gmail.com> wrote:
>> On Mon, Sep 26, 2016 at 1:59 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>> This seems like a job for cut().
>>
>> I thought that at first too, but the middle group shouldn't be a constant .87; it should be computed for each row as
>>
>> "exposure" = "2007-01-01" - "fini"
>>
>> so I think cut() alone won't do it.
>>
>> Best,
>> Ista
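One way to act on this point, sketched here as an editorial addition rather than anything proposed in the thread, is to let an interval lookup decide which rule applies and then compute the middle case per row. The sketch swaps in base R's findInterval() for cut() so that it returns integer interval codes instead of a factor. The helper name exposure_by_interval() is hypothetical, and dates on or after 2007-01-01 are assumed to be out of scope and mapped to NA.

```r
# Hypothetical helper: classify each date into an interval, then apply that interval's rule
exposure_by_interval <- function(fini) {
  breaks <- as.Date(c("2006-01-01", "2006-07-01", "2007-01-01"))
  # findInterval() returns 0 before the first break, i when breaks[i] <= x < breaks[i + 1],
  # and 3 on or after the last break
  k <- findInterval(as.numeric(fini), as.numeric(breaks))
  out <- rep(NA_real_, length(fini))
  out[which(k == 0)] <- 1                       # before 2006
  mid <- which(k == 1)                          # 2006-01-01 .. 2006-06-30
  out[mid] <- as.numeric(difftime(as.Date("2007-01-01"), fini[mid],
                                  units = "days")) / 365.25
  out[which(k == 2)] <- 0.5                     # 2006-07-01 .. 2006-12-31
  out                                           # k == 3 (2007 or later) stays NA by assumption
}

fini <- as.Date(c("2005-04-20", "2006-02-19", "2006-06-29", "2006-10-08"))
round(exposure_by_interval(fini), 2)  # 1.00 0.87 0.51 0.50
```

The interval lookup replaces the three hand-written date comparisons, while the per-row formula is applied only to the rows that fall in the middle interval.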
>>>
>>> (I made DT a data frame to avoid loading the data.table package. But I
>>> assume it would work with a data.table too. Check this, though!)
>>>
>>>> DT <- within(DT, exposure <- cut(fini, as.Date(c("2000-01-01", "2006-01-01", "2006-07-01", "2007-01-01")), labels = c(1, .87, .5)))
>>>
>>>> DT
>>> id fini group exposure
>>> 1 2 2005-04-20 A 1
>>> 2 2 2005-04-20 A 1
>>> 3 2 2005-04-20 A 1
>>> 4 5 2006-02-19 B 0.87
>>> 5 5 2006-02-19 B 0.87
>>> 6 7 2006-10-08 A 0.5
>>> 7 7 2006-10-08 A 0.5
>>>
>>>
>>> (but note that exposure is a factor, not numeric)
>>>
>>>
>>> Cheers,
>>> Bert
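On the factor caveat above: converting a factor like this exposure column back to numbers needs care, because as.numeric() on a factor returns the internal level codes, not the labels. A minimal base R sketch (an editorial addition; the factor f is illustrative and only mirrors the labels used above):

```r
# A factor with numeric-looking labels, like the one cut() produced above
f <- factor(c("1", "1", "0.87", "0.5"), levels = c("1", "0.87", "0.5"))

as.numeric(f)                # level codes: 1 1 2 3 (usually not what you want)
as.numeric(as.character(f))  # label values: 1.00 1.00 0.87 0.50
as.numeric(levels(f))[f]     # same values, converting each level only once
```

The last form indexes the numeric levels by the factor's integer codes, so it gives the same result as going through as.character().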
>>>
>>> On Mon, Sep 26, 2016 at 10:05 AM, Ista Zahn <istazahn at gmail.com> wrote:
>>>> Hi Frank,
>>>>
>>>> lapply(DT, f) calls f once for each column of DT, with that column as the argument. That doesn't seem to be what you want.
>>>>
>>>> There are probably better ways, but here is one approach.
>>>>
>>>> DT[, exposure := vector(mode = "numeric", length = .N)]
>>>> DT[fini < as.Date("2006-01-01"), exposure := 1]
>>>> DT[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30"),
>>>>    exposure := difftime(as.Date("2007-01-01"), fini, units="days")/365.25]
>>>> DT[fini >= as.Date("2006-07-01"), exposure := 0.5]
>>>>
>>>> Best,
>>>> Ista
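To see why lapply() is the wrong tool here: a data.frame (and likewise a data.table) is a list of columns, so lapply() calls the function once per column and passes that column in as the argument. A base R sketch with a cut-down copy of Frank's data (an editorial addition; df is an illustrative name):

```r
# A cut-down copy of the example data as a plain data.frame
df <- data.frame(id    = c(2, 5, 7),
                 fini  = as.Date(c("2005-04-20", "2006-02-19", "2006-10-08")),
                 group = c("A", "B", "A"))

# lapply() calls the function once per column; the argument is that column
res <- lapply(df, function(col) class(col))
names(res)   # "id" "fini" "group"
res$fini     # "Date"
```

So in Frank's attempt, the argument named exposure is each column in turn, and fini is not an argument of the function at all. Unless a variable called fini happens to exist outside the function, R stops with an "object not found" error, which is probably the error he saw.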
>>>>
>>>> On Mon, Sep 26, 2016 at 11:28 AM, Frank S. <f_j_rod at hotmail.com> wrote:
>>>>> Dear all,
>>>>>
>>>>> I have an R data.table like this:
>>>>>
>>>>> DT <- data.table(
>>>>> id = rep(c(2, 5, 7), c(3, 2, 2)),
>>>>> fini = rep(as.Date(c('2005-04-20', '2006-02-19', '2006-10-08')), c(3, 2, 2)),
>>>>> group = rep(c("A", "B", "A"), c(3, 2, 2)) )
>>>>>
>>>>>
>>>>> I want to construct a new variable "exposure" defined as follows:
>>>>>
>>>>> 1) If "fini" is earlier than 2006-01-01 --> "exposure" = 1
>>>>> 2) If "fini" is in [2006-01-01, 2006-06-30] --> "exposure" = ("2007-01-01" - "fini") / 365.25, i.e. in years
>>>>> 3) If "fini" is in [2006-07-01, 2006-12-31] --> "exposure" = 0.5
>>>>>
>>>>>
>>>>> So the desired output would be the following data table:
>>>>>
>>>>> id fini exposure group
>>>>> 1: 2 2005-04-20 1.00 A
>>>>> 2: 2 2005-04-20 1.00 A
>>>>> 3: 2 2005-04-20 1.00 A
>>>>> 4: 5 2006-02-19 0.87 B
>>>>> 5: 5 2006-02-19 0.87 B
>>>>> 6: 7 2006-10-08 0.50 A
>>>>> 7: 7 2006-10-08 0.50 A
>>>>>
>>>>>
>>>>> I have tried:
>>>>>
>>>>> DT <- DT[ , list(id, fini, exposure = 0, group)]
>>>>> DT.new <- lapply(DT, function(exposure){
>>>>>   exposure[fini < as.Date("2006-01-01")] <- 1 # 1st case
>>>>>   exposure[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30")] <- difftime(as.Date("2007-01-01"), fini, units="days")/365.25 # 2nd case
>>>>>   exposure[fini >= as.Date("2006-07-01") & fini <= as.Date("2006-12-31")] <- 0.5 # 3rd case
>>>>>   exposure # return value
>>>>> })
>>>>>
>>>>>
>>>>> But I get an error message.
>>>>>
>>>>> Thanks for any help!!
>>>>>
>>>>>
>>>>> Frank S.
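For reference, here is a minimal corrected version of the attempt above (an editorial sketch, not a reply from the thread). It keeps Frank's logical-indexing idea but applies it once to the fini column instead of looping over all columns. It also subsets fini on the right-hand side of the 2nd case: the original assigned a full-length vector into a shorter subset, so the selected rows would receive values computed for the wrong rows. The function name exposure_from_fini() is hypothetical, and dates outside the three stated ranges are assumed to be NA.

```r
# Hypothetical helper: compute exposure from a vector of Dates using Frank's three rules
exposure_from_fini <- function(fini) {
  exposure <- rep(NA_real_, length(fini))   # dates outside all three ranges stay NA
  exposure[fini < as.Date("2006-01-01")] <- 1                                     # 1st case
  mid <- fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30")
  # 2nd case: subset fini too, so each selected row gets its own value
  exposure[mid] <- as.numeric(difftime(as.Date("2007-01-01"), fini[mid],
                                       units = "days")) / 365.25
  exposure[fini >= as.Date("2006-07-01") & fini <= as.Date("2006-12-31")] <- 0.5  # 3rd case
  exposure
}

df <- data.frame(id    = rep(c(2, 5, 7), c(3, 2, 2)),
                 fini  = rep(as.Date(c("2005-04-20", "2006-02-19", "2006-10-08")), c(3, 2, 2)),
                 group = rep(c("A", "B", "A"), c(3, 2, 2)))
df$exposure <- exposure_from_fini(df$fini)
round(df$exposure, 2)  # 1.00 1.00 1.00 0.87 0.87 0.50 0.50
```

With data.table loaded, the same function can be used on the original table as DT[, exposure := exposure_from_fini(fini)].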
>>>>>
>>>>>
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.