[R] arithmetic problem

Gabor Grothendieck ggrothendieck at gmail.com
Sat May 30 18:11:10 CEST 2009


Here we are assuming

1. a row should be extracted if its value differs by between 200 and 300
from the prior or next value with the same ind.
2. the input is sorted by values within ind.

If that's not the intention then modify the code accordingly.

First we read the data into the data frame DF.

Then we define between(x, min, max), a function that returns a logical
vector whose ith component is TRUE if x[i] is strictly between min and max.
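
As a quick illustration of between(), here is a small sketch that uses the
same one-line definition as the code below. The input numbers are chosen
only for this example. Note that both bounds are exclusive, so 300 itself
does not count as being between 200 and 300:

```r
# Same definition as in the main code; both comparisons are strict.
between <- function(x, min, max) x > min & max > x

# Example inputs (illustrative only): below, inside, on the upper bound, above.
res <- between(c(150, 224, 300, 647), 200, 300)
res
# [1] FALSE  TRUE FALSE FALSE
```

If the limits are meant to be inclusive, >= could be used in place of >.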

Then we use ave() to get a selection vector.  In this case ave() returns a
vector of zeros and ones, which we convert to the logical vector sel that
defines the selection.
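
To see what ave() is doing here, the following sketch applies it to two of
the groups from the data. The names values, ind, prev_gap and flag are used
only for this illustration. ave() applies FUN to each group separately and
puts the results back in the original row positions. Because the first
argument is numeric, a logical result from FUN comes back as zeros and ones:

```r
values <- c(689, 1336, 1560, 2820, 3339)
ind    <- c("ABBA-1", "ABBA-1", "ABBA-1", "ABLIM1", "ABLIM1")

# Gap to the prior value within each group; 0 pads the first row of a group.
prev_gap <- ave(values, ind, FUN = function(v) c(0, diff(v)))
prev_gap
# 0 647 224 0 519

# A logical result from FUN is stored back into the numeric vector as 0/1.
flag <- ave(values, ind, FUN = function(v) {
  d <- c(0, diff(v))
  d > 200 & d < 300
})
flag
# 0 0 1 0 0

# Comparing with 0 turns it back into a logical selection vector.
flag > 0
```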

# read the data
Lines <- "values      ind
1    2655      7A5
2    3028      7A5
3     689   ABBA-1
4    1336   ABBA-1
5    1560   ABBA-1
6    2820   ABLIM1
7    3339   ABLIM1
8     171    ACSM5
9     195    ACSM5
10     43 ADAMDEC1
11    129 ADAMDEC1
12   1105     AFF1
13   3202     AFF1
14    852     AFF3
15   2461     AFF3
16     45     AKT1
17    397     AKT1
18   1430     AQP2
19   2402     AQP2
20   2551 ARHGAP19"
DF <- read.table(textConnection(Lines), header = TRUE)

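# TRUE where x lies strictly between min and max
between <- function(x, min, max) x > min & max > x

# Within each ind, flag rows whose gap to the prior or next value is in range.
# The padding 0 gives a group's first and last row a gap that never matches.
sel <- ave(DF$values, DF$ind, FUN = function(v)
  between(c(0, diff(v)), 200, 300) | between(c(diff(v), 0), 200, 300)
) > 0

# rows 4 and 5 (ABBA-1: 1336 and 1560)
DF[sel, ]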
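
If the input is not already sorted by values within ind (assumption 2
above), one possible way to sort it first is with order(). This is only a
sketch: toy and toy_sorted are made-up names and the unsorted rows are
invented for the example.

```r
# Hypothetical unsorted input, for illustration only.
toy <- data.frame(
  values = c(1560, 689, 1336, 3339, 2820),
  ind    = c("ABBA-1", "ABBA-1", "ABBA-1", "ABLIM1", "ABLIM1")
)

# Order the rows by ind, then by values within ind.
toy_sorted <- toy[order(toy$ind, toy$values), ]
toy_sorted$values
# 689 1336 1560 2820 3339
```

The sorted data frame keeps its original row names, so the rows picked out
later can still be traced back to the unsorted input.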



On Sat, May 30, 2009 at 10:13 AM, Iain Gallagher
<iaingallagher at btopenworld.com> wrote:
>
> Hello list
>
> I have a problem with a dataset (see toy example below) where I am trying to find the difference between two (or more) numbers and discard those observations which fall outside a set interval.
>
> An example and further explanation:
>
>   values      ind
> 1    2655      7A5
> 2    3028      7A5
> 3     689   ABBA-1
> 4    1336   ABBA-1
> 5    1560   ABBA-1
> 6    2820   ABLIM1
> 7    3339   ABLIM1
> 8     171    ACSM5
> 9     195    ACSM5
> 10     43 ADAMDEC1
> 11    129 ADAMDEC1
> 12   1105     AFF1
> 13   3202     AFF1
> 14    852     AFF3
> 15   2461     AFF3
> 16     45     AKT1
> 17    397     AKT1
> 18   1430     AQP2
> 19   2402     AQP2
> 20   2551 ARHGAP19
>
> Each number in the values column above is associated with a label (in the ind column). For some inds there will be only 2 values but as can be seen from the data other inds have many values.
>
> Here's what I want to do using the ABBA-1 data from above as an example:
>
> calculate the differences between each value:
>
> 1560-1336 = 224
> 1336-689 = 647
>
> then use these values to create an index that will allow me to pull out values between set limits. If I set the limits to between 200 and 300 then the index will reference rows 4 & 5 in the above data set.
>
> I hope this is reasonably clear and I appreciate any suggestions.
>
> Thanks
>
> Iain