[R] arithmetic problem

William Dunlap wdunlap at tibco.com
Sat May 30 18:49:49 CEST 2009


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Gabor Grothendieck
> Sent: Saturday, May 30, 2009 9:11 AM
> To: Iain Gallagher
> Cc: r-help at r-project.org
> Subject: Re: [R] arithmetic problem
> 
> Here we are assuming
> 
> 1. a row should be extracted if its value differs by between 200 and
> 300 from the prior or next value with the same ind.
> 2. the input is sorted by values within ind.
> If that's not the intention then modify the code accordingly.
> 
> First we read in the data into data frame DF.
> 
> Then we define between(x, min, max), a function that returns a logical
> vector whose ith component is TRUE if x[i] is strictly between min and
> max.
> 
> Then we use ave() to get a selection vector.  In this case ave() returns
> a vector of zeros and ones, and we convert that to the logical vector
> sel which defines the selection.
> 
> # read the data
> Lines <- "values      ind
> 1    2655      7A5
> 2    3028      7A5
> 3     689   ABBA-1
> 4    1336   ABBA-1
> 5    1560   ABBA-1
> 6    2820   ABLIM1
> 7    3339   ABLIM1
> 8     171    ACSM5
> 9     195    ACSM5
> 10     43 ADAMDEC1
> 11    129 ADAMDEC1
> 12   1105     AFF1
> 13   3202     AFF1
> 14    852     AFF3
> 15   2461     AFF3
> 16     45     AKT1
> 17    397     AKT1
> 18   1430     AQP2
> 19   2402     AQP2
> 20   2551 ARHGAP19"
> DF <- read.table(textConnection(Lines), header = TRUE)
> 
> between <- function(x, min, max) x > min & max > x
> 
> sel <- ave(DF$values, DF$ind, FUN = function(v)
> 	between(c(FALSE, diff(v)), 200, 300) |
> 	between(c(diff(v), FALSE), 200, 300)
> ) > 0
> 
> DF[sel, ]

Since DF is sorted appropriately we could speed that up by avoiding
the repeated function calls done by ave(): take the differences over
the whole column and combine (with &) your between() clauses with the
clause
    ind[-1]==ind[-length(ind)]
as in
    sel1 <- with(DF, {
        dv <- values[-1] - values[-length(values)]
        c(dv > 200 & dv < 300 & ind[-1] == ind[-length(ind)], FALSE)
    })
(This one just flags the lower row of each pair, i.e. row 4 but not
row 5 for ABBA-1.)
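
If both rows of each pair are wanted, one possible approach is to or
the selection with a copy of itself shifted down by one row.  The
sketch below rebuilds the toy data so it runs on its own; the name
sel_both is made up for illustration, and the code assumes (as above)
that DF is sorted by values within ind.

```r
# Toy data from the original post.
DF <- data.frame(
    values = c(2655, 3028, 689, 1336, 1560, 2820, 3339, 171, 195, 43,
               129, 1105, 3202, 852, 2461, 45, 397, 1430, 2402, 2551),
    ind = c("7A5", "7A5", "ABBA-1", "ABBA-1", "ABBA-1", "ABLIM1",
            "ABLIM1", "ACSM5", "ACSM5", "ADAMDEC1", "ADAMDEC1", "AFF1",
            "AFF1", "AFF3", "AFF3", "AKT1", "AKT1", "AQP2", "AQP2",
            "ARHGAP19"),
    stringsAsFactors = FALSE
)

# Lower row of each pair: difference to the next row is in (200, 300)
# and the next row carries the same label.
sel1 <- with(DF, {
    dv <- values[-1] - values[-length(values)]
    c(dv > 200 & dv < 300 & ind[-1] == ind[-length(ind)], FALSE)
})

# Shift sel1 down one position to mark the upper row of each pair too.
sel_both <- sel1 | c(FALSE, sel1[-length(sel1)])

DF[sel_both, ]
```

On the toy data this should pick out rows 4 and 5, the same rows the
ave() version selects.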

Someone recently proposed making a function like diff in which you
could insert the operator of your choice, like "==" here, instead of
the usual "-".  That might make code like this easier to understand.
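
A rough sketch of what such a function might look like follows.  The
name diff_op and its arguments are hypothetical, not an existing base R
function; it only handles the lag-1 case on a plain vector.

```r
# Hypothetical helper (not part of base R): apply a binary operator to
# each element and its predecessor, the way diff() applies "-".
diff_op <- function(x, FUN = "-") {
    FUN <- match.fun(FUN)
    FUN(x[-1], x[-length(x)])
}

values <- c(689, 1336, 1560, 2820)
ind    <- c("ABBA-1", "ABBA-1", "ABBA-1", "ABLIM1")

dv       <- diff_op(values)       # same result as diff(values)
same_ind <- diff_op(ind, "==")    # TRUE where neighbours share a label

# Lower row of each qualifying pair, padded to the original length.
sel_lower <- c(dv > 200 & dv < 300 & same_ind, FALSE)
```

With a helper like this, the label comparison reads the same way as
the value difference, which is the gain in clarity hinted at above.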


> 
> 
> On Sat, May 30, 2009 at 10:13 AM, Iain Gallagher
> <iaingallagher at btopenworld.com> wrote:
> >
> > Hello list
> >
> > I have a problem with a dataset (see toy example below) where I am
> > trying to find the difference between two (or more) numbers and
> > discard those observations which fall outside a set interval.
> >
> > An example and further explanation:
> >
> >   values      ind
> > 1    2655      7A5
> > 2    3028      7A5
> > 3     689   ABBA-1
> > 4    1336   ABBA-1
> > 5    1560   ABBA-1
> > 6    2820   ABLIM1
> > 7    3339   ABLIM1
> > 8     171    ACSM5
> > 9     195    ACSM5
> > 10     43 ADAMDEC1
> > 11    129 ADAMDEC1
> > 12   1105     AFF1
> > 13   3202     AFF1
> > 14    852     AFF3
> > 15   2461     AFF3
> > 16     45     AKT1
> > 17    397     AKT1
> > 18   1430     AQP2
> > 19   2402     AQP2
> > 20   2551 ARHGAP19
> >
> > Each number in the values column above is associated with a label
> > (in the ind column). For some inds there will be only 2 values but
> > as can be seen from the data other inds have many values.
> >
> > Here's what I want to do using the ABBA-1 data from above as an
> > example:
> >
> > calculate the differences between each value:
> >
> > 1560-1336 = 224
> > 1336-689 = 647
> >
> > then use these values to create an index that will allow me to pull
> > out values between set limits. If I set the limits to between 200
> > and 300 then the index will reference rows 4 & 5 in the above data
> > set.
> >
> > I hope this is reasonably clear and I appreciate any suggestions.
> >
> > Thanks
> >
> > Iain
> >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >