[R] Keep only first date from consecutive dates
David Winsemius
dwinsemius at comcast.net
Sat Dec 5 01:34:38 CET 2015
> On Dec 4, 2015, at 1:10 PM, William Dunlap <wdunlap at tibco.com> wrote:
>
> With a data.frame sorted by id, with ties broken by date, as in
> your example, you can select rows that are either the start
> of a new id group or the start of run of consecutive dates with:
>
>> w <- c(TRUE, diff(uci$date)>1) | c(TRUE, diff(uci$id)!=0)
>> which(w)
> [1] 1 4 5 7
>> uci[w,]
> id date value
> 1 1 2005-10-28 1
> 4 1 2005-11-07 3
> 5 1 2007-03-19 1
> 7 2 2004-06-02 2
>
> I'll leave it to you to translate that R syntax into data.table syntax -
> it just involves comparing the current row with the previous row.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
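Bill's comparison can be expressed in data.table syntax by grouping on id and keeping, within each group, the first row plus every row whose date is not exactly one day after the previous row's date. This is a minimal sketch, not Bill's own code. It assumes the table is first put in (id, date) order with setorder(). The name res is only for illustration.

```r
library(data.table)

uci <- data.table(id = c(rep(1, 6), 2),
                  date = as.Date(c("2005-10-28", "2005-10-29", "2005-10-30",
                                   "2005-11-07", "2007-03-19", "2007-03-20",
                                   "2004-06-02")),
                  value = c(1, 2, 1, 3, 1, 2, 2))

# Sort by id, then date, so "previous row" means the previous date of the same id.
setorder(uci, id, date)

# Within each id: TRUE for the group's first row, and TRUE wherever the gap
# to the previous date is not exactly 1 day (i.e. a new run of dates starts).
res <- uci[, .SD[c(TRUE, diff(date) != 1)], by = id]
res
```

Because the comparison runs inside by = id, the first row of each id is always kept. That is the job done by the c(TRUE, diff(uci$id)!=0) term in Bill's version.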
>
>
> On Fri, Dec 4, 2015 at 12:53 PM, Frank S. <f_j_rod at hotmail.com> wrote:
>> Dear R users,
>>
>> I usually work with the data.table package, but I'm sure that my question can also be answered working with an R data frame.
>> Working with grouped data (by "id"), I wonder if it is possible to keep in an R data.frame (or R data.table):
>> a) Only the first row of each run of rows (from the same "id") that have consecutive dates.
>> b) All the rows which do not belong to such runs.
>>
>> As an example, I have the "uci" data.table:
>>
>> library(data.table)
>> uci <- data.table(id=c(rep(1,6),2),
>> date = as.Date(c("2005-10-28","2005-10-29","2005-10-30","2005-11-07","2007-03-19","2007-03-20","2004-06-02")),
>> value = c(1, 2, 1, 3, 1, 2, 2))
>>
>> id date value
>> 1 2005-10-28 1
>> 1 2005-10-29 2
>> 1 2005-10-30 1
>> 1 2005-11-07 3
>> 1 2007-03-19 1
>> 1 2007-03-20 2
>> 2 2004-06-02 2
>>
>> And the desired output would be:
>>
>> id date value
>> 1 2005-10-28 1
>> 1 2005-11-07 3
>> 1 2007-03-19 1
>> 2 2004-06-02 2
The syntax of `[.data.table` is a bit odd: you can refer to columns by name, without the uci$ prefix. I never trust my intuition, though.
Selection is usually done with a logical vector in the ‘i’ position. The diff function does work in the ‘i’ position, with the obvious need to prepend a starting value, because diff() returns one element fewer than its input:
> uci[ c(0,diff(date))!=1, ]
id date value
1: 1 2005-10-28 1
2: 1 2005-11-07 3
3: 1 2007-03-19 1
4: 2 2004-06-02 2
The other cases are handled with the converse expression:
> uci[c(0,diff(date)) == 1, ]
id date value
1: 1 2005-10-29 2
2: 1 2005-10-30 1
3: 1 2007-03-20 2
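The date-only expression above does not look at id. It returns the right rows for this example because the id 2 date (2004-06-02) is earlier than the last id 1 date, so that diff is negative and the row is kept. The sketch below uses a hypothetical variant of the data, uci2, in which the id 2 row falls on 2007-03-21, one day after the last id 1 row. It shows the difference and adds Bill's id term to the same ‘i’-position selection. The names uci2, date_only and with_id are illustrative only.

```r
library(data.table)

# Hypothetical variant of the example data: the single id 2 row falls on the
# day after the last id 1 row, so a date-only test would wrongly drop it.
uci2 <- data.table(id = c(rep(1, 6), 2),
                   date = as.Date(c("2005-10-28", "2005-10-29", "2005-10-30",
                                    "2005-11-07", "2007-03-19", "2007-03-20",
                                    "2007-03-21")),
                   value = c(1, 2, 1, 3, 1, 2, 2))
setorder(uci2, id, date)

# Date-only test in the i position (ignores id): the id 2 row is lost.
date_only <- uci2[c(0, diff(date)) != 1]

# Also keep a row whenever id changes from the previous row (Bill's second term).
with_id <- uci2[c(TRUE, diff(date) != 1) | c(TRUE, diff(id) != 0)]

date_only
with_id
```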
>>
>> # From the following link, I have tried:
>> http://stackoverflow.com/questions/32308636/r-how-to-sum-values-from-rows-only-if-the-key-value-is-the-same-and-also-if-the
>>
>> setDT(uci)[ ,list(date=date[1L], value = value[1L]), by = .(ind=rleid(date), id)][, ind:=NULL][]
>>
>> But I get the same data frame, and I do not know the reason.
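The attempt above returns the same table because rleid(date) starts a new run id whenever the date value changes. Every date in uci is distinct, so each row becomes its own group. A minimal sketch of one way to repair that approach keeps the grouping idea but builds the run counter from the gaps between dates. It increments only when a date is not exactly one day after the previous date of the same id. The helper column run and the name res are hypothetical, chosen for illustration.

```r
library(data.table)

uci <- data.table(id = c(rep(1, 6), 2),
                  date = as.Date(c("2005-10-28", "2005-10-29", "2005-10-30",
                                   "2005-11-07", "2007-03-19", "2007-03-20",
                                   "2004-06-02")),
                  value = c(1, 2, 1, 3, 1, 2, 2))
setorder(uci, id, date)

# Run counter per id: it goes up by 1 at the first row and at every gap that
# is not exactly 1 day, so consecutive dates share the same run number.
uci[, run := cumsum(c(TRUE, diff(date) != 1)), by = id]

# Keep the first row of each (id, run) group, then drop the helper column.
res <- uci[, .SD[1L], by = .(id, run)][, run := NULL][]
res
```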
>>
>> Thank you very much for any help!!
>>
>> Frank S.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA