[R] help with column subtraction with a twist

Greg Snow 538280 at gmail.com
Mon Jul 21 20:58:00 CEST 2014


Here is another approach in R (blatantly stealing Jim Holtman's code
to generate sample data):

> set.seed(1)
> n <- 100
> test <- data.frame(p = sample(10, n, TRUE)
+                 , b = sample(10, n, TRUE)
+                 )
> test$e <- sample(5, n, TRUE) + test$b  # make sure e > b
>
> tmp1 <- test$b - test$p
> tmp2 <- test$p - test$e
>
> test$dist <- pmax( tmp1, tmp2, 0 ) * sign( -tmp1 )
>
> head(test, 10)
    p  b  e dist
1   3  7  9   -4
2   4  4  6    0
3   6  3  6    0
4  10 10 12    0
5   3  7  8   -4
6   9  3  6    3
7  10  2  5    5
8   7  5  6    1
9   7 10 12   -3
10  1  6 10   -5
>

You could also skip the two temporary variables and code the
differences directly inside the pmax and sign calls, though keeping
tmp1 avoids doing the same subtraction twice.  If you are happy with
the absolute distance then you can drop the "* sign( -tmp1 )" part.
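
As a rough sketch of those two variations, assuming the same kind of
sample data as above (the column names dist2 and absdist are made up
here purely for illustration):

```r
# Sample data built the same way as in the transcript above
set.seed(1)
n <- 100
test <- data.frame(p = sample(10, n, TRUE),
                   b = sample(10, n, TRUE))
test$e <- sample(5, n, TRUE) + test$b  # make sure e > b

# Signed distance without temporary variables; note that the
# b/p difference is now computed twice (sign(p - b) == sign(-tmp1))
test$dist2 <- with(test, pmax(b - p, p - e, 0) * sign(p - b))

# Unsigned distance: the same expression with the sign() factor dropped
test$absdist <- with(test, pmax(b - p, p - e, 0))
```

Whether the temporaries are worth keeping is mostly a matter of
readability; on data of this size the repeated subtraction costs
essentially nothing.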

This works because if p is less than b then tmp1 will be positive and
tmp2 will be negative.  If p is between b and e (inclusive) then
neither will be positive (and therefore 0 will be the maximum).  If p
is greater than e then tmp2 will be positive and tmp1 negative.  So
the maximum value (pmax computes this elementwise, i.e. for each row)
will be the size of the distance of interest, and multiplying by
sign( -tmp1 ) makes it negative when p is to the left of b.
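
To make the three cases concrete, here is a small sketch with one
hand-picked row per case (the data frame ex and the names dist_pmax
and dist_ifelse are hypothetical, chosen only for this illustration).
It also spells out the same rule with nested ifelse calls, following
the wording of the original question, as a cross-check:

```r
# One row per case: left of the gene, inside it, right of it
ex <- data.frame(p = c(3, 5, 9), b = c(4, 4, 4), e = c(7, 7, 7))

tmp1 <- ex$b - ex$p   # positive only when p is left of b
tmp2 <- ex$p - ex$e   # positive only when p is right of e

# pmax picks the single positive difference (or 0), sign restores direction
dist_pmax <- pmax(tmp1, tmp2, 0) * sign(-tmp1)

# The same rule written case by case
dist_ifelse <- ifelse(ex$p < ex$b, ex$p - ex$b,
                      ifelse(ex$p > ex$e, ex$p - ex$e, 0))

print(cbind(ex, dist_pmax, dist_ifelse))
```

Both columns should agree row by row; the pmax form just avoids the
nested conditionals.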


On Sun, Jul 20, 2014 at 5:33 AM, Ubernerdy <ubernerdy at gmail.com> wrote:
> Hello guys!
>
> I have been messing around with R for a while now, but this situation has
> me a bit stumped. I was unable to solve it by reading documentation.
>
> So I have this table (currently in Excel - could export it as csv) with
> values in 3 columns. Let's call them value P (for position), value B
> (beginning) and E (end). P represents the position of a mutation in the
> genome, B and E are the beginnings and ends of a nearby gene that either
> contains the mutation or not.
>
> I am trying to compute the distance between the mutation and the gene.
>
> If the mutation is contained in the gene, that is value P is greater than B
> and less than E, the result is 0.
>
> If the mutation is "left" of the gene, the distance is negative and is
> equal to P-B.
>
> If the mutation is "right" of the gene, the distance is positive and is
> equal to P-E.
>
> How would I achieve this in R?
>
> Regards and thanks, S.



-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com


