[R] subset only if f.e a column is successive for more than 3 values

Jim Lemon
Fri Sep 28 00:35:37 CEST 2018


Hi Knut,
As Bert said, you can start with diff and work from there. I can
easily get the text for the subset, but despite fooling around with
"parse", "eval" and "expression", I couldn't get it to work:

# use a bigger subset to test whether multiple runs can be extracted
kkdf<-subset(airquality,Temp > 77,select=c("Ozone","Temp"))
kkdf$index<-as.numeric(rownames(kkdf))
# get the run length encoding
seqindx<-rle(diff(kkdf$index)==1)
# get a logical vector of the starts of the runs
runsel<-seqindx$lengths >= 3 & seqindx$values
# get the indices for the starts of the runs
# (one past the end of the preceding run; 1 if the first run qualifies)
starts<-c(0,cumsum(seqindx$lengths))[which(runsel)]+1
# and the ends
ends<-cumsum(seqindx$lengths)[runsel]+1
# the character representation of the subset as indices is
paste0("c(",paste(starts,ends,sep=":",collapse=","),")")

I expect there will be a lightning response from someone who knows
about converting the resulting string into whatever is needed.
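
One way to sidestep the string conversion altogether, sketched here on the assumption that the goal is simply the matching rows of kkdf: build the index vector directly from the start and end positions with Map and seq. The names run_end, keep and result are illustrative only. The starts are taken from the end of the preceding run, so a run that begins at the first row is also picked up.

```r
# Sketch: keep only rows whose original row numbers run without a gap
kkdf <- subset(airquality, Temp > 77, select = c("Ozone", "Temp"))
kkdf$index <- as.numeric(rownames(kkdf))

# TRUE wherever a row number directly follows its predecessor
seqindx <- rle(diff(kkdf$index) == 1)
# runs of at least 3 successive steps, i.e. at least 4 consecutive rows
runsel <- seqindx$values & seqindx$lengths >= 3

# positions in kkdf (not airquality row numbers) where each run starts and ends
run_end <- cumsum(seqindx$lengths)
starts <- c(0, run_end)[which(runsel)] + 1
ends <- run_end[runsel] + 1

# expand each start:end pair into positions, with no paste/parse round trip
keep <- unlist(Map(seq, starts, ends))
result <- kkdf[keep, ]
```

Evaluating the pasted string with eval(parse(text = ...)) yields the same positions, but indexing directly avoids building and parsing text.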

Jim
On Fri, Sep 28, 2018 at 1:13 AM Bert Gunter <bgunter.4567 using gmail.com> wrote:
>
> 1. I assume the values are integers, not floats/numerics (which would make
> it more complicated).
>
> 2. Strategy: Take differences (e.g. see ?diff) and look for >3 1's in a
> row.
>
> I don't have time to work out details, but perhaps that helps.
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Thu, Sep 27, 2018 at 7:49 AM Knut Krueger <rhelp using krueger-family.de>
> wrote:
>
> > Hi to all
> >
> > I need a subset that keeps rows only where, for example, 3 successive
> > values occur in a column of a data frame:
> > Example from the subset help page:
> >
> > subset(airquality, Temp > 80, select = c(Ozone, Temp))
> >     Ozone Temp
> > 29     45   81
> > 35     NA   84
> > 36     NA   85
> > 38     29   82
> > 39     NA   87
> > 40     71   90
> > 41     39   87
> > 42     NA   93
> > 43     NA   92
> > 44     23   82
> > .....
> >
> > I would like to get only
> >
> > ...
> > 40     71   90
> > 41     39   87
> > 42     NA   93
> > 43     NA   92
> > 44     23   82
> > ....
> >
> > because the left column ascends more than, for example, three times without a gap
> >
> > Any hints for a package, or do I need to build my own function?
> >
> > Kind Regards Knut
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >