[R] flag records

Bert Gunter bgunter.4567 using gmail.com
Wed Apr 27 20:44:31 CEST 2022


... and of course it should be:
flag <- by(DF3, fac, function(x)yourfun(x$text,x$day))

('foo' was my earlier name when I was foo-ling around with this)
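
For anyone who wants to run the whole thing end to end, here is a minimal
sketch under a few assumptions. One thing to watch: within each level of fac
the day column is constant, so diff(x$day) is always 0 and the "less than 50"
test is always TRUE. The sketch therefore passes ddate instead, which appears
to be what the date condition in the original post asks for. The argument name
ddate in yourfun and the new column name flag are choices made here for
illustration only.

```r
# Sample data from the original post.
DF <- read.table(text = "State name day text ddate
CA A 1 xch 2014/09/16
CA A 2 xck 2015/5/29
CA A 2 xck 2015/6/18
CA A 2 xcm 2015/8/3
CA A 2 xcj 2015/8/26
FL B 3 xcu 2017/7/23
FL B 3 xcl 2017/7/03
FL B 3 xmc 2017/7/26
FL B 3 xca 2017/3/17
FL B 3 xcb 2017/4/8
FL B 4 xhh 2017/3/17
FL B 4 xhh 2017/1/29", header = TRUE)
DF$ddate <- as.Date(DF$ddate, format = "%Y/%m/%d")
DF3 <- DF[order(DF$State, DF$name, DF$day, DF$ddate), ]

# One factor holding only the State/name/day combinations that occur.
fac <- with(DF3, factor(paste(State, name, day, sep = ".")))

# TRUE when the gap to the previous row is under 50 days AND the text matches.
# Assumption: the second argument is the date column, not the day column.
yourfun <- function(text, ddate) {
  len <- length(text)
  if (len == 1) return(FALSE)  # only one record in the group
  c(FALSE, as.numeric(diff(ddate)) < 50 & text[-1] == text[-len])
}

flag <- by(DF3, fac, function(x) yourfun(x$text, x$ddate))

# FALSE/TRUE + 1 gives index 1/2 into c("y", "x").
DF3$flag <- c("y", "x")[unname(do.call(c, flag)) + 1]
DF3
```

On the sample data the flags come out the same as with x$day, because the only
rows with matching text also happen to be fewer than 50 days apart.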

Bert



On Wed, Apr 27, 2022 at 11:12 AM Bert Gunter <bgunter.4567 using gmail.com> wrote:

> ... and also, the with() is unnecessary:
> flag <- by(DF3, fac, function(x)foo(x$text,x$day))
> ## will do.
>
> Bert
>
> On Wed, Apr 27, 2022 at 11:06 AM Bert Gunter <bgunter.4567 using gmail.com>
> wrote:
>
>> OK. I may completely misunderstand. If you are happy with what Rui and/or
>> others have given you, **read no further**, as it will just be noise.
>>
>> Otherwise, I don't think tapply()/ave() etc. will do quite what you want
>> splitting by the list of separate factors -- or at least not easily. I
>> think it's simplest just to start by creating a single factor with just the
>> combinations of levels you have. It turns out here that because you are
>> ordering lexicographically -- which is what the factor() function will do
>> by default -- this will make it easy to get back your results in exactly
>> the form you want. It could be a bit of a hassle if this were not the case.
>>
>> So first, starting from your already sorted DF3
>> > fac <- with(DF3, factor(paste(State, name, day, sep = '.')))
>>
>> ## which gives:
>> > fac
>>  [1] CA.A.1 CA.A.2 CA.A.2 CA.A.2 CA.A.2 FL.B.3 FL.B.3 FL.B.3 FL.B.3 FL.B.3
>> [11] FL.B.4 FL.B.4
>> Levels: CA.A.1 CA.A.2 FL.B.3 FL.B.4  ## ordering the same as in DF3
>>
>> Next I wrote a little function that I think applies the logic you
>> specified, yielding TRUE for when you want to raise the flag and FALSE if
>> not:
>>
>> yourfun <- function(text, day){
>>    len <- length(text)
>>    if(len == 1) FALSE ## only 1 record in the group
>>    else c(FALSE, (diff(day) < 50 & text[-1] == text[-len]))
>>    ## first record gets FALSE as there are none previous
>> }
>>
>> Then I use the by() function to apply this groupwise as you specified:
>>
>> > flag <-with(DF3,
>> +       by(DF3, fac, function(x)foo(x$text,x$day))
>> + )
>> ## Here's what you get
>> > flag
>> fac: CA.A.1
>> [1] FALSE
>> -------------------------------------------------------
>> fac: CA.A.2
>> [1] FALSE  TRUE FALSE FALSE
>> -------------------------------------------------------
>> fac: FL.B.3
>> [1] FALSE FALSE FALSE FALSE FALSE
>> -------------------------------------------------------
>> fac: FL.B.4
>> [1] FALSE  TRUE
>>
>> This is class "by", essentially a list. So one can use do.call() and an
>> implicit cast to numeric to get the x,y  flags that you specified:
>>
>> > flag <- c("y", "x")[do.call(c, flag) + 1]
>>
>> ## yielding
>> > flag
>>  [1] "y" "y" "x" "y" "y" "y" "y" "y" "y" "y" "y" "x"
>>
>> (you could also use the within() function to do this within DF3 and
>> return the modified DF)
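
The within() idea mentioned above could look roughly like the sketch below. It
is only one possible way to do it, and it makes some assumptions that are not
in the thread: it uses split()/unsplit() rather than by() so that everything
can be evaluated inside within(), it passes ddate (not day) to yourfun because
day is constant within a group, and grp and hit are throwaway names invented
for the example.

```r
# Sample data from the original post, sorted as in the thread.
DF <- read.table(text = "State name day text ddate
CA A 1 xch 2014/09/16
CA A 2 xck 2015/5/29
CA A 2 xck 2015/6/18
CA A 2 xcm 2015/8/3
CA A 2 xcj 2015/8/26
FL B 3 xcu 2017/7/23
FL B 3 xcl 2017/7/03
FL B 3 xmc 2017/7/26
FL B 3 xca 2017/3/17
FL B 3 xcb 2017/4/8
FL B 4 xhh 2017/3/17
FL B 4 xhh 2017/1/29", header = TRUE)
DF$ddate <- as.Date(DF$ddate, format = "%Y/%m/%d")
DF3 <- DF[order(DF$State, DF$name, DF$day, DF$ddate), ]

# Same logic as yourfun in the thread, but taking the date column.
yourfun <- function(text, ddate) {
  len <- length(text)
  if (len == 1) return(FALSE)
  c(FALSE, as.numeric(diff(ddate)) < 50 & text[-1] == text[-len])
}

DF3 <- within(DF3, {
  # drop = TRUE keeps only the combinations that actually occur
  grp  <- interaction(State, name, day, drop = TRUE)
  # apply yourfun per group, then put the results back in row order
  hit  <- unsplit(Map(yourfun, split(text, grp), split(ddate, grp)), grp)
  flag <- c("y", "x")[hit + 1]
  rm(grp, hit)  # do not keep the helper columns
})
DF3
```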
>>
>> HTH,
>>
>> Bert
>>
>>
>>
>>
>> On Tue, Apr 26, 2022 at 10:53 PM Rui Barradas <ruipbarradas using sapo.pt>
>> wrote:
>>
>>> Hello,
>>>
>>> Maybe something like the following will do it.
>>> In the ave function, don't forget that diff returns a vector that is one
>>> element shorter than its input, so prepend an initial zero.
>>> Then 1 + FALSE/TRUE equals 1/2; use these indices to subset the target
>>> vector c("Y", "X").
>>>
>>>
>>> i_ddiff <- with(DF3, ave(as.numeric(ddate), State, name, day, FUN = \(x)
>>> c(0L, diff(x))) < 50)
>>> DF3$ddiff <- c("Y", "X")[1L + i_ddiff]
>>>
>>>
>>> An alternative is to assign a default "Y" to the new column and then
>>> assign "X" where the condition is TRUE. This is easier to read.
>>>
>>>
>>> DF3$ddiff <- "Y"
>>> DF3$ddiff[i_ddiff] <- "X"
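
A caveat on the code above: with c(0L, diff(x)) the first row of every group
gets a difference of 0, which is less than 50, so it is marked "X", and the
text column is never compared. The following is a minimal sketch of an
ave()-based variant that handles both points. It is an assumption-laden
illustration, not the code posted in the thread: lag1 is a hypothetical helper
defined here, and gap and prev_text are names chosen for the example.

```r
# Sample data from the original post, sorted as in the thread.
DF <- read.table(text = "State name day text ddate
CA A 1 xch 2014/09/16
CA A 2 xck 2015/5/29
CA A 2 xck 2015/6/18
CA A 2 xcm 2015/8/3
CA A 2 xcj 2015/8/26
FL B 3 xcu 2017/7/23
FL B 3 xcl 2017/7/03
FL B 3 xmc 2017/7/26
FL B 3 xca 2017/3/17
FL B 3 xcb 2017/4/8
FL B 4 xhh 2017/3/17
FL B 4 xhh 2017/1/29", header = TRUE)
DF$ddate <- as.Date(DF$ddate, format = "%Y/%m/%d")
DF3 <- DF[order(DF$State, DF$name, DF$day, DF$ddate), ]

# Hypothetical helper: the previous element of x, NA for the first one.
# Returns a vector of the same length as x (also for empty groups).
lag1 <- function(x) c(NA, x)[seq_along(x)]

# Days since the previous row of the same group (NA for the first row).
gap <- with(DF3, ave(as.numeric(ddate), State, name, day,
                     FUN = function(x) x - lag1(x)))
# Text of the previous row of the same group (NA for the first row).
prev_text <- with(DF3, ave(as.character(text), State, name, day, FUN = lag1))

# First rows have NA gap, so !is.na(gap) makes the whole condition FALSE.
i_flag <- !is.na(gap) & gap < 50 & DF3$text == prev_text

DF3$ddiff <- "Y"
DF3$ddiff[i_flag] <- "X"
DF3
```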
>>>
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>> At 23:17 on 26/04/2022, Val wrote:
>>> > Hi All,
>>> >
>>> > I want to flag a record based on the following condition.
>>> > The variables  in the sample data are
>>> > State, name, day, text, ddate
>>> >
>>> > Sort the data by State, name, day, ddate.
>>> >
>>> > Within  State, name, day
>>> >      assign consecutive number for each row
>>> >      find the date difference between consecutive rows,
>>> >      if the difference is less than 50 days and the text string in
>>> > previous and current rows  are the same then flag the record as X,
>>> > otherwise Y.
>>> >
>>> > Here is  sample data and my attempt,
>>> >
>>> > DF<-read.table(text="State name day text ddate
>>> >    CA A 1 xch 2014/09/16
>>> >    CA A 2 xck 2015/5/29
>>> >    CA A 2 xck 2015/6/18
>>> >    CA A 2 xcm 2015/8/3
>>> >    CA A 2 xcj 2015/8/26
>>> >    FL B 3 xcu  2017/7/23
>>> >    FL B 3 xcl  2017/7/03
>>> >    FL B 3 xmc  2017/7/26
>>> >    FL B 3 xca  2017/3/17
>>> >    FL B 3 xcb  2017/4/8
>>> >    FL B 4 xhh  2017/3/17
>>> >    FL B 4 xhh  2017/1/29",header=TRUE)
>>> >
>>> >    DF$ddate   <- as.Date(DF$ddate, format = "%Y/%m/%d")
>>> >    DF3         <- DF[order(DF$State,DF$name,DF$day,xtfrm(DF$ddate)), ]
>>> >    DF3$C       <- with(DF3, ave(State, name, day, FUN = seq_along))
>>> >    DF3$diff    <- with(DF3, ave(as.integer(ddate), State, name, day,
>>> > FUN = function(x) x - x[1]))
>>> >
>>> > I stopped here, how do I evaluate the previous and the current rows
>>> > text string and date difference?
>>> >
>>> > Desired result,
>>> >
>>> >
>>> >    State name day text      ddate C diff flag
>>> > 1     CA    A   1  xch 2014-09-16 1    0    y
>>> > 2     CA    A   2  xck 2015-05-29 1    0    y
>>> > 3     CA    A   2  xck 2015-06-18 2   20    x
>>> > 4     CA    A   2  xcm 2015-08-03 3   66    y
>>> > 5     CA    A   2  xcj 2015-08-26 4   89    y
>>> > 9     FL    B   3  xca 2017-03-17 1    0    y
>>> > 10    FL    B   3  xcb 2017-04-08 2   22    y
>>> > 7     FL    B   3  xcl 2017-07-03 3  108    y
>>> > 6     FL    B   3  xcu 2017-07-23 4  128    y
>>> > 8     FL    B   3  xmc 2017-07-26 5  131    y
>>> > 12    FL    B   4  xhh 2017-01-29 1    0    y
>>> > 11    FL    B   4  xhh 2017-03-17 2   47    x
>>> >
>>> >
>>> >
>>> > Thank you,
>>> >
>>> > ______________________________________________
>>> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> > and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
