[R] Conditional row removing and replacing in small data.table
Frank S.
f_j_rod at hotmail.com
Tue Nov 15 19:44:04 CET 2016
Dear R list members,
I have a data.table, of which the following is an example:
library(data.table)
dt <- data.table(id = rep(1:3, c(5, 1, 2)),
                 date = as.Date(rep(c("2005-07-25", "2006-09-17", "1998-11-06", "2001-04-19"), c(3, 2, 1, 2))),
                 fam = factor(c(1, 1, 3, 1, 1, 5, 4, 2)),
                 code = factor(c(90, 91, 300, 75, 91, 500, 400, 90)))
I would like to perform three operations:
A) Remove rows whose fam is not 1, 2 or 3, unless this would make a subject disappear
(as for id = 2); in that case keep the row but set fam = 0 and code = 0.
B) If, within the same id and date, there are two rows with code = 90 and code = 91 (regardless of
their order of appearance), remove the one with code = 91.
C) If, within the same id and date, there is only one row with code = 91 (that is, no accompanying
row with code = 90), keep this row but change its value to code = 90.
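Rules B and C can be sketched on the id = 1 rows of the example. This is only a minimal sketch under two assumptions that are not in the post: code is stored as an integer rather than a factor, and "only 1 row with code=91" is read as "a 91 with no 90 in the same id/date group". The names ex, res and has90 are made up for illustration.

```r
library(data.table)

# The id = 1 rows of the example, with code as integer (assumption).
ex <- data.table(
  id   = rep(1L, 5),
  date = as.Date(rep(c("2005-07-25", "2006-09-17"), c(3, 2))),
  code = c(90L, 91L, 300L, 75L, 91L)
)

# Flag every id/date group that contains at least one row with code 90.
ex[, has90 := any(code == 90L), by = .(id, date)]

# B) Drop rows with code 91 when a 90 exists in the same group.
res <- ex[!(has90 & code == 91L)]

# C) Any 91 still present has no 90 beside it, so recode it to 90.
res[code == 91L, code := 90L]
res[, has90 := NULL]

print(res)
```

Doing B before C matters here: once the redundant 91 rows are gone, every remaining 91 is by construction one that rule C applies to.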
The right solution would be:
id  date        fam  code
1   25/07/2005  1    90
1   25/07/2005  3    300
1   17/09/2006  1    75
1   17/09/2006  1    90
2   06/11/1998  0    0
3   19/04/2001  2    90
I have tried to implement step A, but I get an error message when I run it. Moreover, I am aware
that the code I present may not be the best way to do this (it already takes too many lines):
dtcount <- dt[, count1 := .N, by = id][, count2 := .N, by = list(id, date)] # add two counts
dtA <- dtcount[, {
  if (!(fam %in% 1:3) && count1 == 1) {
    result <- list(date = date, fam = factor(0), code = factor(0))
  } else {
    if (fam %in% 1:3) {
      result <- list(date = date, fam = fam, code = code)
    }
  }
  result
}, by = id]
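A likely cause of the error is that, inside by = id, fam is a vector holding all rows of the group, whereas if() expects a single TRUE or FALSE; and when neither branch assigns anything (for example when the first fam of a group is outside 1:3 but the group has more than one row), result does not exist. A vectorised version of step A might look like the sketch below. It assumes that fam and code may be converted from factor to integer, and that when a subject would disappear its first row is the one kept; the names dtA and lost are made up for illustration.

```r
library(data.table)

dt <- data.table(
  id   = rep(1:3, c(5, 1, 2)),
  date = as.Date(rep(c("2005-07-25", "2006-09-17", "1998-11-06", "2001-04-19"),
                     c(3, 2, 1, 2))),
  fam  = factor(c(1, 1, 3, 1, 1, 5, 4, 2)),
  code = factor(c(90, 91, 300, 75, 91, 500, 400, 90))
)

# Work on a copy with integer columns (assumption): assigning 0 to a factor
# that has no level "0" would otherwise produce NA.
dtA <- copy(dt)[, `:=`(fam  = as.integer(as.character(fam)),
                       code = as.integer(as.character(code)))]

# lost is TRUE for all rows of an id that has no row with fam in 1:3.
dtA[, lost := !any(fam %in% 1:3), by = id]

# Keep rows with fam in 1:3, plus the first row of each id that would vanish.
dtA <- dtA[fam %in% 1:3 | (lost & rowid(id) == 1L)]

# The kept placeholder rows get fam = 0 and code = 0.
dtA[lost == TRUE, `:=`(fam = 0L, code = 0L)]
dtA[, lost := NULL]

print(dtA)
```

No per-row counts are needed in this form, since any() already answers the question "does this id keep at least one row?" for the whole group.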
Any help would be appreciated!
Frank S.