[R] arithmetic problem
William Dunlap
wdunlap at tibco.com
Sat May 30 18:49:49 CEST 2009
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Gabor Grothendieck
> Sent: Saturday, May 30, 2009 9:11 AM
> To: Iain Gallagher
> Cc: r-help at r-project.org
> Subject: Re: [R] arithmetic problem
>
> Here we are assuming
>
> 1. a row should be extracted if its value differs by between 200 and
> 300 from the prior or next value with the same ind.
> 2. the input is sorted by values within ind
> If that's not the intention then modify the code accordingly.
>
> First we read the data into data frame DF.
>
> Then we define between(x, min, max), a function that returns a vector
> whose ith component is TRUE if x[i] is strictly between min and max.
>
> Then we use ave() to get a selection vector. In this case ave() returns
> a vector of zeros and ones, which we convert to the logical vector sel
> that defines the selection.
>
> # read the data
> Lines <- "values ind
> 1 2655 7A5
> 2 3028 7A5
> 3 689 ABBA-1
> 4 1336 ABBA-1
> 5 1560 ABBA-1
> 6 2820 ABLIM1
> 7 3339 ABLIM1
> 8 171 ACSM5
> 9 195 ACSM5
> 10 43 ADAMDEC1
> 11 129 ADAMDEC1
> 12 1105 AFF1
> 13 3202 AFF1
> 14 852 AFF3
> 15 2461 AFF3
> 16 45 AKT1
> 17 397 AKT1
> 18 1430 AQP2
> 19 2402 AQP2
> 20 2551 ARHGAP19"
> DF <- read.table(textConnection(Lines), header = TRUE)
>
> between <- function(x, min, max) x > min & max > x
>
> # pad with 0 so each row gets its gap to the previous / next row in its group
> sel <- ave(DF$values, DF$ind, FUN = function(v)
>     between(c(0, diff(v)), 200, 300) |
>     between(c(diff(v), 0), 200, 300)
> ) > 0
>
> DF[sel, ]
Since DF is sorted appropriately, we could speed that up by avoiding
the repeated function calls done by ave(): take the differences over
the whole values vector and and-ing into your between() clauses the
clause
  ind[-1] == ind[-length(ind)]
which is TRUE only where a row and the next row share the same ind,
as in
  sel1 <- with(DF, c({
      dv <- values[-1] - values[-length(values)]
      dv > 200 & dv < 300
  } & ind[-1] == ind[-length(ind)], FALSE))
(This one just gives the lower row of each pair.)
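If both rows of each pair are wanted, one possible sketch (not from the messages above) is to shift the lower-row index down by one position and or it with itself. The name sel2 and the compact re-creation of the toy data are assumptions made only so that the block runs on its own.

```r
# Re-create the toy data from the thread so the sketch is self-contained.
DF <- data.frame(
  values = c(2655, 3028, 689, 1336, 1560, 2820, 3339, 171, 195, 43,
             129, 1105, 3202, 852, 2461, 45, 397, 1430, 2402, 2551),
  ind = c("7A5", "7A5", "ABBA-1", "ABBA-1", "ABBA-1", "ABLIM1", "ABLIM1",
          "ACSM5", "ACSM5", "ADAMDEC1", "ADAMDEC1", "AFF1", "AFF1",
          "AFF3", "AFF3", "AKT1", "AKT1", "AQP2", "AQP2", "ARHGAP19"),
  stringsAsFactors = FALSE
)

# Lower row of each qualifying pair: the gap to the next row is in
# (200, 300) and the next row has the same ind.
sel1 <- with(DF, c({
  dv <- values[-1] - values[-length(values)]
  dv > 200 & dv < 300
} & ind[-1] == ind[-length(ind)], FALSE))

# Shift sel1 down by one to flag the upper row of each pair, then combine.
# (sel2 is a hypothetical name, not from the thread.)
sel2 <- sel1 | c(FALSE, sel1[-length(sel1)])

DF[sel2, ]
```

On the toy data this selects rows 4 and 5 (the ABBA-1 values 1336 and 1560), which are the rows the original question asks for.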
Someone recently proposed making a function like diff in which you
could insert the operator of your choice, like "==" here, instead of
the usual "-". That might make code like this easier to understand.
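A rough sketch of what such a function might look like follows. The name diffWith and its arguments are hypothetical; this is not an existing base R function, nor necessarily what was proposed, just one way the idea could be written.

```r
# diffWith() is a hypothetical helper, not part of base R.
# It applies the binary operator FUN to each element and the element
# `lag` positions before it, the way diff() applies "-".
diffWith <- function(x, FUN = "-", lag = 1L) {
  FUN <- match.fun(FUN)
  n <- length(x)
  if (lag >= n) return(FUN(x[0L], x[0L]))  # nothing to compare
  FUN(x[(1L + lag):n], x[seq_len(n - lag)])
}

# A small slice of the toy data: three ABBA-1 rows and one ABLIM1 row.
values <- c(689, 1336, 1560, 2820)
ind <- c("ABBA-1", "ABBA-1", "ABBA-1", "ABLIM1")

dv <- diffWith(values)          # same result as diff(values)
sameInd <- diffWith(ind, "==")  # TRUE where neighbouring rows share an ind

# Lower row of each qualifying pair, padded to the full length.
c(dv > 200 & dv < 300 & sameInd, FALSE)
```

With a helper like this, the same-ind test reads as "diff with ==" rather than as a pair of negative-index subscripts, which is the readability gain being suggested.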
>
>
> On Sat, May 30, 2009 at 10:13 AM, Iain Gallagher
> <iaingallagher at btopenworld.com> wrote:
> >
> > Hello list
> >
> > I have a problem with a dataset (see toy example below) where I am
> > trying to find the difference between two (or more) numbers and
> > discard those observations which fall outside a set interval.
> >
> > An example and further explanation:
> >
> > values ind
> > 1 2655 7A5
> > 2 3028 7A5
> > 3 689 ABBA-1
> > 4 1336 ABBA-1
> > 5 1560 ABBA-1
> > 6 2820 ABLIM1
> > 7 3339 ABLIM1
> > 8 171 ACSM5
> > 9 195 ACSM5
> > 10 43 ADAMDEC1
> > 11 129 ADAMDEC1
> > 12 1105 AFF1
> > 13 3202 AFF1
> > 14 852 AFF3
> > 15 2461 AFF3
> > 16 45 AKT1
> > 17 397 AKT1
> > 18 1430 AQP2
> > 19 2402 AQP2
> > 20 2551 ARHGAP19
> >
> > Each number in the values column above is associated with a label
> > (in the ind column). For some inds there will be only 2 values, but
> > as can be seen from the data, other inds have many values.
> >
> > Here's what I want to do, using the ABBA-1 data from above as an
> > example:
> >
> > calculate the differences between each value:
> >
> > 1560-1336 = 224
> > 1336-689 = 647
> >
> > then use these values to create an index that will allow me to pull
> > out values between set limits. If I set the limits to between 200
> > and 300 then the index will reference rows 4 & 5 in the above data
> > set.
> >
> > I hope this is reasonably clear and I appreciate any suggestions.
> >
> > Thanks
> >
> > Iain
> >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>