[R] how to subset based on other row values and multiplicity

Wed Jul 16 16:09:02 CEST 2014

On Wed, Jul 16, 2014 at 8:51 AM, jim holtman <jholtman at gmail.com> wrote:
> I can reproduce what you requested, but there was the question about
> what happens with the multiple 'c-y' values.
>
> ====================
>
>> require(data.table)
>> x <- read.table(text = 'id   date value
> + a    2000-01-01 x
> + a    2000-03-01 x
> + b    2000-11-11 w
> + c    2000-11-11 y
> + c    2000-10-01 y
> + c    2000-09-10 y
> + c    2000-12-12 z
> + c    2000-10-11 z
> + d    2000-11-11 w
> + d    2000-11-10 w', as.is = TRUE, header = TRUE)
>> setDT(x)
>> x[, date := as.Date(date)]
>> setkey(x, id, value, date)
>>
>> y <- x[
> +     , {
> +         if (.N == 1) val <- NULL  # only one -- delete
> +         else {
> +             dif <- difftime(tail(date, -1), head(date, -1), units = 'days')
> +             # return first date if any gap is >= 31 days
> +             if (any(dif >= 31)) val <- list(date = date[1L])
> +             else val <- NULL
> +         }
> +         val
> +       }
> +     , keyby = 'id,value'
> +     ]
>> y
>    id value       date
> 1:  a     x 2000-01-01
> 2:  c     y 2000-09-10
> 3:  c     z 2000-10-11
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
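
The gap test in the quoted code can be looked at in isolation. What follows is only a minimal sketch, assuming the three 'c-y' dates in sorted order (the order that setkey() leaves them in); the names `dates` and `gaps` are made up for illustration and are not in the original post.

```r
# The three 'c-y' dates from the sample data, already sorted ascending
dates <- as.Date(c("2000-09-10", "2000-10-01", "2000-11-11"))

# tail(dates, -1) drops the first element and head(dates, -1) drops the last,
# so subtracting them gives the gap between each pair of consecutive dates
gaps <- difftime(tail(dates, -1), head(dates, -1), units = "days")
print(gaps)

# The group qualifies when at least one gap is 31 days or more
has_long_gap <- any(gaps >= 31)
print(has_long_gap)

# Base R's diff() computes the same consecutive differences for Date vectors
same_as_diff <- identical(as.numeric(gaps), as.numeric(diff(dates)))
print(same_as_diff)
```

Here the gaps are 21 and 41 days, so the group passes the test and its first date, 2000-09-10, is the one returned in the result shown above.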

Wow, I picked up a couple of _nice_ techniques from that one post!
Looks like "data.table" will let me do SQL-like things in R. I have a
warped brain: I think in "result sets" and "matrix operations".
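
To make the SQL comparison concrete, here is a minimal sketch, assuming the same sample data as in the quoted post. The filter on id, the column names `n` and `first_date`, and the variable `res` are made up for illustration. Roughly speaking, in x[i, j, by] the i argument plays the role of WHERE, j of the SELECT list, and by/keyby of GROUP BY (keyby also sorts the result by the grouping columns).

```r
library(data.table)

# Same sample data as in the quoted post, built directly as a data.table
x <- data.table(
  id    = c("a", "a", "b", "c", "c", "c", "c", "c", "d", "d"),
  date  = as.Date(c("2000-01-01", "2000-03-01", "2000-11-11", "2000-11-11",
                    "2000-10-01", "2000-09-10", "2000-12-12", "2000-10-11",
                    "2000-11-11", "2000-11-10")),
  value = c("x", "x", "w", "y", "y", "y", "z", "z", "w", "w")
)

# Approximate SQL equivalent (illustrative only):
#   SELECT id, value, COUNT(*) AS n, MIN(date) AS first_date
#   FROM x WHERE id <> 'b'
#   GROUP BY id, value ORDER BY id, value
res <- x[id != "b",                            # i: row filter (WHERE)
         .(n = .N, first_date = min(date)),    # j: computed columns (SELECT)
         keyby = .(id, value)]                 # group and sort (GROUP BY / ORDER BY)
print(res)
```

This is not the same computation as the quoted answer; it only shows the general shape of a filtered, grouped query.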

Many thanks.

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown