[R] Conditional row removal and replacement in a small data.table

Frank S. f_j_rod at hotmail.com
Tue Nov 15 19:44:04 CET 2016


Dear R list members,


I have a data.table; here is a small example:

library(data.table)

dt <- data.table(id   = rep(1:3, c(5, 1, 2)),
                 date = as.Date(rep(c("2005-07-25", "2006-09-17", "1998-11-06", "2001-04-19"), c(3, 2, 1, 2))),
                 fam  = factor(c(1, 1, 3, 1, 1, 5, 4, 2)),
                 code = factor(c(90, 91, 300, 75, 91, 500, 400, 90)))


I would like to perform three operations:

A) Remove rows whose fam is not in {1, 2, 3}, except where this would make a subject disappear
   entirely (the case of id = 2); there, keep the row but set fam = 0 and code = 0.

B) If, within the same id and date, there are two rows with code = 90 and code = 91 (in either
   order), remove the one with code = 91.

C) If, within the same id and date, there is only one row with code = 91, keep it but change its
   code to 90.
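A minimal data.table sketch of operations A to C, offered as one possible approach rather than the only one. It assumes that fam and code can be handled as character vectors while filtering (so the new value "0" does not clash with the existing factor levels) and, for step A, that when an id has no row with fam in {1, 2, 3} only its first row is kept. The helper columns ok, none_ok, has90 and n91 are hypothetical names introduced for illustration.

```r
library(data.table)

# Example data from the question
dt <- data.table(id   = rep(1:3, c(5, 1, 2)),
                 date = as.Date(rep(c("2005-07-25", "2006-09-17", "1998-11-06", "2001-04-19"),
                                    c(3, 2, 1, 2))),
                 fam  = factor(c(1, 1, 3, 1, 1, 5, 4, 2)),
                 code = factor(c(90, 91, 300, 75, 91, 500, 400, 90)))

# Working copy with fam/code as character (assumption: avoids factor-level
# issues when the new value "0" is assigned)
res <- dt[, .(id, date, fam = as.character(fam), code = as.character(code))]

# Step A: keep rows with fam in 1:3; for an id with no such row, keep only
# its first row (assumption) and set fam = code = "0"
res[, ok := fam %in% c("1", "2", "3")]
res[, none_ok := !any(ok), by = id]
res <- res[ok | (none_ok & !duplicated(id))]
res[(none_ok), `:=`(fam = "0", code = "0")]
res[, c("ok", "none_ok") := NULL]

# Step B: in an id/date group that also contains code 90, drop the code 91 rows
res[, has90 := any(code == "90"), by = .(id, date)]
res <- res[!(code == "91" & has90)]

# Step C: if an id/date group has exactly one code 91 row, recode it to 90
res[, n91 := sum(code == "91"), by = .(id, date)]
res[code == "91" & n91 == 1L, code := "90"]
res[, c("has90", "n91") := NULL]

# Optionally turn fam and code back into factors
res[, `:=`(fam = factor(fam), code = factor(code))]
res
```

Printing res should give the six rows of the expected result, with dates in R's default YYYY-MM-DD format. The parentheses in res[(none_ok), ...] make data.table treat the logical column as a row filter rather than a join.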


The expected result is:

id        date  fam  code
 1  25/07/2005    1    90
 1  25/07/2005    3   300
 1  17/09/2006    1    75
 1  17/09/2006    1    90
 2  06/11/1998    0     0
 3  19/04/2001    2    90


I have tried to implement step A, but I get an error when running it. I am also aware that my
code is probably not the best way to do this, since it needs so many lines:


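# Add two counts: rows per id (count1) and rows per id/date (count2).
# Note: := adds the columns to dt by reference, so dt itself is modified too.
dtcount <- dt[, count1 := .N, by = id][, count2 := .N, by = list(id, date)]

dtA <- dtcount[, {
  # fam and count1 are vectors with one value per row of the current id,
  # but if() and && are meant for a single TRUE/FALSE value
  if (!(fam %in% 1:3) && count1 == 1) {
    result <- list(date = date, fam = factor(0), code = factor(0))
  } else if (fam %in% 1:3) {
    result <- list(date = date, fam = fam, code = code)
  }
  # if neither branch runs, result is not set for this group
  result
}, by = id]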
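The attempt above most likely fails because if() and && expect a single TRUE/FALSE value, while fam and count1 hold one value per row of the current id; in addition, when neither branch runs, result is not set for that group. A sketch of step A alone that keeps the same grouped structure but collapses the test with any() and returns a value from every branch might look like this. The character conversion and the names dc and ok are assumptions for illustration, and no count columns are needed.

```r
library(data.table)

dt <- data.table(id   = rep(1:3, c(5, 1, 2)),
                 date = as.Date(rep(c("2005-07-25", "2006-09-17", "1998-11-06", "2001-04-19"),
                                    c(3, 2, 1, 2))),
                 fam  = factor(c(1, 1, 3, 1, 1, 5, 4, 2)),
                 code = factor(c(90, 91, 300, 75, 91, 500, 400, 90)))

# Character copy (assumption): every group then returns the same column types
dc <- dt[, .(id, date, fam = as.character(fam), code = as.character(code))]

dtA <- dc[, {
  ok <- fam %in% c("1", "2", "3")   # one TRUE/FALSE per row of this id
  if (any(ok)) {                    # any() gives if() a single value
    list(date = date[ok], fam = fam[ok], code = code[ok])
  } else {                          # no valid row: keep one placeholder row
    list(date = date[1L], fam = "0", code = "0")
  }
}, by = id]
dtA
```

For this example dtA has seven rows: the five rows of id 1, a single fam = 0, code = 0 row for id 2, and the fam = 2 row for id 3. Steps B and C can then be applied to dtA.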


Any help would be appreciated!


Frank S.





More information about the R-help mailing list