[R] Keep only first date from consecutive dates

Sat Dec 5 01:34:38 CET 2015

> On Dec 4, 2015, at 1:10 PM, William Dunlap <wdunlap at tibco.com> wrote:
> 
> With a data.frame sorted by id, with ties broken by date, as in
> your example, you can select rows that are either the start
> of a new id group or the start of run of consecutive dates with:
> 
>> w <- c(TRUE, diff(uci$date)>1) | c(TRUE, diff(uci$id)!=0)
>> which(w)
> [1] 1 4 5 7
>> uci[w,]
>  id       date value
> 1  1 2005-10-28     1
> 4  1 2005-11-07     3
> 5  1 2007-03-19     1
> 7  2 2004-06-02     2
> 
> I'll leave it to you to translate that R syntax into data.table syntax -
> it just involves comparing the current row with the previous row.
> 
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
> 
> 
> On Fri, Dec 4, 2015 at 12:53 PM, Frank S. <f_j_rod at hotmail.com> wrote:
>> Dear R users,
>> 
>> I usually work with the data.table package, but I'm sure that my question can also be answered working with an R data frame.
>> Working with grouped data (by "id"), I wonder if it is possible to keep in an R data.frame (or R data.table):
>> a) Only the first row if there is a row which belongs to a group of rows (from the same "id") that have consecutive dates.
>> b) All the rows which do not belong to the above groups.
>> 
>> As an example, I have "uci" data.frame:
>> 
>> uci <- data.table(id=c(rep(1,6),2),
>>                date = as.Date(c("2005-10-28","2005-10-29","2005-10-30","2005-11-07","2007-03-19","2007-03-20","2004-06-02")),
>>                value = c(1, 2, 1, 3, 1, 2, 2))
>> 
>>   id              date   value
>>    1  2005-10-28        1
>>    1  2005-10-29        2
>>    1  2005-10-30        1
>>    1  2005-11-07        3
>>    1  2007-03-19        1
>>    1  2007-03-20        2
>>    2  2004-06-02        2
>> 
>> And the desired output would be:
>> 
>>   id              date   value
>>    1  2005-10-28        1
>>    1  2005-11-07        3
>>    1  2007-03-19        1
>>    2  2004-06-02        2

The syntax of `[.data.table` is a bit odd: inside the brackets you can refer to columns by name. I never trust my intuition, though.

Selection is usually done with a logical vector in the ‘i’ position. The diff function does work in the ‘i’ position, but it returns one fewer element than it is given, so a starting value has to be prepended:

> uci[ c(0,diff(date))!=1, ]
   id       date value
1:  1 2005-10-28     1
2:  1 2005-11-07     3
3:  1 2007-03-19     1
4:  2 2004-06-02     2

The other cases are handled with the converse expression:

> uci[c(0,diff(date)) == 1, ]
   id       date value
1:  1 2005-10-29     2
2:  1 2005-10-30     1
3:  1 2007-03-20     2
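
Note that the two expressions above compare each row with the previous row regardless of id. That gives the desired rows for this data, but it could drop the first row of an id whose first date happens to fall one day after the last date of the previous id. A minimal sketch of a grouped variant follows, assuming the table is already sorted by id and then by date (as in the example); the name first_of_run is only illustrative:

```r
library(data.table)

uci <- data.table(
  id    = c(rep(1, 6), 2),
  date  = as.Date(c("2005-10-28", "2005-10-29", "2005-10-30", "2005-11-07",
                    "2007-03-19", "2007-03-20", "2004-06-02")),
  value = c(1, 2, 1, 3, 1, 2, 2)
)

# Within each id, keep a row when it is the first row of the group or when
# its date is not exactly one day after the previous row's date.
# as.integer() on a Date gives whole days since the epoch, so the
# comparison does not depend on difftime units.
first_of_run <- uci[, .SD[c(TRUE, diff(as.integer(date)) != 1L)], by = id]
first_of_run
```

Prepending TRUE rather than 0 states the intent directly: the first row of every id is always kept, including ids that have only one row.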

>> 
>> # From the following link, I have tried:
>> http://stackoverflow.com/questions/32308636/r-how-to-sum-values-from-rows-only-if-the-key-value-is-the-same-and-also-if-the
>> 
>> setDT(uci)[ ,list(date=date[1L], value = value[1L]),  by = .(ind=rleid(date), id)][, ind:=NULL][]
>> 
>> But I get the same data frame, and I do not know the reason.
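
A likely reason: rleid() starts a new run whenever the value changes, and every date in uci is distinct, so each row ends up in its own group and the first row of each group is simply every row. One way to adapt the idea, sketched here under the assumption that the data are sorted by id and date, is to build the run index from gaps between dates instead. The helper column name run and the result name res are hypothetical:

```r
library(data.table)

uci <- data.table(
  id    = c(rep(1, 6), 2),
  date  = as.Date(c("2005-10-28", "2005-10-29", "2005-10-30", "2005-11-07",
                    "2007-03-19", "2007-03-20", "2004-06-02")),
  value = c(1, 2, 1, 3, 1, 2, 2)
)

# All dates differ, so rleid() assigns a new run id to every row.
rleid(uci$date)

# Run index per id: increases whenever the gap to the previous date
# within the same id is not exactly one day.
uci[, run := cumsum(c(TRUE, diff(as.integer(date)) != 1L)), by = id]

# First row of each (id, run) pair, then drop the helper column.
res <- uci[, .SD[1L], by = .(id, run)][, run := NULL][]
res
```
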
>> 
>> Thank you very much for any help!!
>> 
>> Frank S.
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 

David Winsemius
Alameda, CA, USA