[R] flag records

Rui Barradas
Wed Apr 27 07:53:04 CEST 2022


Hello,

Maybe something like the following will do it.
In the ave function, don't forget that diff returns a vector of a
different length, one element shorter. So pad it with an initial value.
Inf works here because Inf < 50 is FALSE, so the first row of each group
is never flagged (an initial zero would flag it as "X").
Then 1 + FALSE/TRUE equals 1/2; use these indices to subset the target
vector c("Y", "X").


i_ddiff <- with(DF3, ave(as.numeric(ddate), State, name, day,
                         FUN = \(x) c(Inf, diff(x))) < 50)
DF3$ddiff <- c("Y", "X")[1L + i_ddiff]
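
To make the two building blocks concrete, here is a small sketch on a
made-up vector. The names x, d, near and idx are illustrative only and
are not part of DF3.

```r
x <- c(100, 120, 190)        # made-up day counts for one group
d <- diff(x)                 # differences between consecutive elements
length(x)                    # 3
length(d)                    # 2, one element shorter than x
d                            # 20 70

near <- d < 50               # TRUE FALSE
idx <- 1L + near             # TRUE counts as 1, FALSE as 0, so this is 2 1
c("Y", "X")[idx]             # "X" "Y"
```

Because d is shorter than x, it has to be padded with one leading value
before it can be stored alongside the rows of a group.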


An alternative is to assign a default "Y" to the new column and then 
assign "X" where the condition is TRUE. This is easier to read.


DF3$ddiff <- "Y"
DF3$ddiff[i_ddiff] <- "X"
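
The question also asks that the text in the previous and current rows be
the same, which the date difference alone does not check. Below is a
minimal sketch of one way to combine both conditions, assuming the sample
data from the question. The names gap, same_text and flag are
illustrative, and padding with Inf rather than zero is a choice made here
so that the first row of each group is never flagged.

```r
DF <- read.table(text = "State name day text ddate
CA A 1 xch 2014/09/16
CA A 2 xck 2015/5/29
CA A 2 xck 2015/6/18
CA A 2 xcm 2015/8/3
CA A 2 xcj 2015/8/26
FL B 3 xcu 2017/7/23
FL B 3 xcl 2017/7/03
FL B 3 xmc 2017/7/26
FL B 3 xca 2017/3/17
FL B 3 xcb 2017/4/8
FL B 4 xhh 2017/3/17
FL B 4 xhh 2017/1/29", header = TRUE)

DF$ddate <- as.Date(DF$ddate, format = "%Y/%m/%d")
DF3 <- DF[order(DF$State, DF$name, DF$day, DF$ddate), ]

# Days since the previous row of the same group; Inf for the first row
gap <- with(DF3, ave(as.numeric(ddate), State, name, day,
                     FUN = function(x) c(Inf, diff(x))))

# Encode text as integer codes; a zero difference between consecutive
# codes means the text is unchanged. The leading 1L marks the first row
# of each group as "not the same as the previous row".
same_text <- with(DF3, ave(as.integer(factor(text)), State, name, day,
                           FUN = function(x) c(1L, diff(x)))) == 0L

DF3$flag <- ifelse(gap < 50 & same_text, "X", "Y")
DF3[, c("State", "name", "day", "text", "ddate", "flag")]
```

On the sample data this flags only the second xck row (CA, A, day 2) and
the second xhh row (FL, B, day 4) as "X".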


Hope this helps,

Rui Barradas

At 23:17 on 26/04/2022, Val wrote:
> Hi All,
> 
> I want to flag a record based on the following condition.
> The variables in the sample data are
> State, name, day, text, ddate
> 
> Sort the data by State, name, day, ddate.
> 
> Within State, name, day:
>      assign a consecutive number to each row,
>      find the date difference between consecutive rows,
>      if the difference is less than 50 days and the text string in the
> previous and current rows is the same, then flag the record as X,
> otherwise Y.
> 
> Here is sample data and my attempt,
> 
> DF<-read.table(text="State name day text ddate
>    CA A 1 xch 2014/09/16
>    CA A 2 xck 2015/5/29
>    CA A 2 xck 2015/6/18
>    CA A 2 xcm 2015/8/3
>    CA A 2 xcj 2015/8/26
>    FL B 3 xcu  2017/7/23
>    FL B 3 xcl  2017/7/03
>    FL B 3 xmc  2017/7/26
>    FL B 3 xca  2017/3/17
>    FL B 3 xcb  2017/4/8
>    FL B 4 xhh  2017/3/17
>    FL B 4 xhh  2017/1/29",header=TRUE)
> 
>    DF$ddate   <- as.Date(DF$ddate, format = "%Y/%m/%d")
>    DF3        <- DF[order(DF$State, DF$name, DF$day, DF$ddate), ]
>    # row counter within each State/name/day group (integer, not character)
>    DF3$C      <- with(DF3, ave(seq_along(State), State, name, day,
>                                FUN = seq_along))
>    # days elapsed since the first row of the group
>    DF3$diff   <- with(DF3, ave(as.integer(ddate), State, name, day,
>                                FUN = function(x) x - x[1]))
> 
> I stopped here, how do I evaluate the previous and the current rows
> text string and date difference?
> 
> Desired result,
> 
> 
>    State name day text      ddate C diff flag
> 1     CA    A   1  xch 2014-09-16 1    0    y
> 2     CA    A   2  xck 2015-05-29 1    0    y
> 3     CA    A   2  xck 2015-06-18 2   20    x
> 4     CA    A   2  xcm 2015-08-03 3   66    y
> 5     CA    A   2  xcj 2015-08-26 4   89    y
> 9     FL    B   3  xca 2017-03-17 1    0    y
> 10    FL    B   3  xcb 2017-04-08 2   22    y
> 7     FL    B   3  xcl 2017-07-03 3  108    y
> 6     FL    B   3  xcu 2017-07-23 4  128    y
> 8     FL    B   3  xmc 2017-07-26 5  131    y
> 12    FL    B   4  xhh 2017-01-29 1    0    y
> 11    FL    B   4  xhh 2017-03-17 2   47    x
> 
> 
> 
> Thank you,
> 