[R] left end or right end

Thu Jul 1 18:51:13 CEST 2010

Hi,

On Thu, Jul 1, 2010 at 10:24 AM, ravikumar sukumar
<ravikumarsukumar at gmail.com> wrote:
> There are three possibilities:
>
> Case1: Left end
>
> P--------------
> Q--------------------------------------
>
> Case2: Right end
>
> P                        --------------
> Q--------------------------------------
>
>
> Case3: At mid position
>
> P        -------------
> Q--------------------------------------
>
>
> My question is how my data is distributed across the three cases: is it
> biased towards case1, case2 or case3? I have to consider the length of Q
> in the data. Example: start2-start1 = 2 and end2-end1 = 3 does not make much
> difference if the length of Q is 150000.
>
> I do not hypothesize; I want to know how my data behaves.

Please note that the suggestions I give below don't give you a means
of doing statistical testing of any sort, I'm just giving you ideas to
help you figure out what's going on in your data.

So:

Why not just do some simple manipulations[*] and then plot the
distribution of where each of your P's lands within its paired Q?

[*] Simple Manipulations

Maybe you can ask:
How far "in" (as a percentage of Q's length) does P start?

I think you previously said that you know that P is always contained
in its paired Q, so I'm going to assume this is true for simplicity:

Let's assume that you have two matrices P and Q. The rows are the
"paired" p and q elements, the columns are their start,end positions.
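To make that layout concrete, here's a minimal sketch of what such P and
Q matrices might look like. The coordinates are hypothetical, read off
the three ASCII sketches in your question (1-based, inclusive), so swap
in your real data:

```r
# Hypothetical coordinates read off the three ASCII sketches
# (1-based, inclusive start/end); replace with your real data.
P <- rbind(c(1, 14),    # Case 1: P flush with Q's left end
           c(25, 38),   # Case 2: P flush with Q's right end
           c(9, 21))    # Case 3: P somewhere in the middle
Q <- rbind(c(1, 38),
           c(1, 38),
           c(1, 38))
colnames(P) <- c("start", "end")
colnames(Q) <- c("start", "end")

# Row i of P is paired with row i of Q
stopifnot(nrow(P) == nrow(Q))
# The containment assumption: every P lies inside its paired Q
stopifnot(all(P[, "start"] >= Q[, "start"]), all(P[, "end"] <= Q[, "end"]))
```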

R> P.width <- P[,2] - P[,1] + 1
R> Q.width <- Q[,2] - Q[,1] + 1

How far INTO Q does its paired P value start?

## P[,1] is always >= Q[,1], so P.start is never negative
R> P.start <- P[,1] - Q[,1]

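Now let's adjust Q's width, so we can ask something like "How far
(%-wise) into Q does P land?" Subtracting P's width leaves the room P
actually has to slide around in, so a P that sits flush against Q's
right end comes out at 100%: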

R> Q.width.adjust <- Q.width - P.width

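And get the "fraction into Q that P starts at" (multiply by 100 for a percent):

## NaN if P is exactly as wide as Q, since there's no room to move
R> how.far <- P.start / Q.width.adjust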

This is untested code. I'm not promising that it works; I'm just
trying to put my idea into words. You'll likely have to debug as
appropriate.

What I'm imagining should give you (for your examples):

Case1 : 0%
Case2 : 100%
Case3 : 30% (?)
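As a sanity check on those numbers, here's a sketch that wraps the steps
into one function (`how_far` is just a name I made up) and runs it on
hypothetical coordinates read off your ASCII sketches. It divides by the
adjusted width `Q.width.adjust`, and returns NA when P covers all of Q
rather than producing NaN:

```r
# Sketch of the how.far steps, assuming P and Q are two-column
# (start, end) matrices with paired rows and each P inside its Q.
how_far <- function(P, Q) {
  P.width <- P[, 2] - P[, 1] + 1
  Q.width <- Q[, 2] - Q[, 1] + 1
  P.start <- P[, 1] - Q[, 1]           # offset of P's start inside Q
  Q.width.adjust <- Q.width - P.width  # room P has to slide within Q
  # When P covers all of Q there is no room; return NA instead of NaN
  ifelse(Q.width.adjust > 0, P.start / Q.width.adjust, NA_real_)
}

# Hypothetical toy data: Case 1, Case 2, Case 3 from the sketches
P <- rbind(c(1, 14), c(25, 38), c(9, 21))
Q <- rbind(c(1, 38), c(1, 38), c(1, 38))
round(100 * how_far(P, Q))   # 0, 100, 32
```

So the "30% (?)" for Case 3 comes out at about 32% with these made-up
coordinates.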

Then you can plot the density of how.far to see what's happening.
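Something like the following, where random numbers stand in for your
real how.far values (that part is purely hypothetical). A pile-up near 0
would point to Case 1, near 1 to Case 2, and mass in between to Case 3:

```r
# Hypothetical stand-in for your real how.far values; in practice
# use the vector computed from your own P and Q matrices.
set.seed(1)
how.far <- runif(500)

# Restrict the density estimate to [0, 1], the range how.far can take
d <- density(how.far, from = 0, to = 1, na.rm = TRUE)
plot(d, main = "Where does P start inside Q?",
     xlab = "fraction of available room (0 = left end, 1 = right end)")
rug(how.far)   # mark the individual observations along the x axis
```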

++++++++++++++++++++++++++++++++++++++++++++++++++++

Another thing you can do is to use your P to split your Q into two
segments, then plot the ratio of the length of the left segment vs.
the length of the right.

In order for this to work, I'm guessing you have to pad Q with 1
basepair (or whatever) on each side, so that neither segment ends up
with zero width (otherwise Case 2 divides by zero), i.e.:

Case 1:
Originally:
  P--------------
  Q--------------------------------------

Transform by padding Q with +1 on either side:
  P --------------
  Q----------------------------------------

Split Q with P:

  Q1: -
  Q2:             --------------------------

Now take the ratio:
  width(Q1) / width(Q2)

Case 2:
Mirror Case 1
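To see why the padding matters, here's the Case 2 arithmetic written out,
using hypothetical coordinates read off the sketches (Q = [1, 38],
P = [25, 38], 1-based and inclusive):

```r
# Hypothetical Case 2 coordinates: P ends exactly where Q ends
q.start <- 1;  q.end <- 38
p.start <- 25; p.end <- 38

left  <- p.start - q.start   # bases of Q to the left of P: 24
right <- q.end - p.end       # bases of Q to the right of P: 0
left / right                 # Inf: the unpadded ratio blows up

# Padding Q by 1 base on each side grows both segments by 1
(left + 1) / (right + 1)     # 25; Case 1 gives the mirror value 1/25
```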

Case 3:
Originally:
  P        -------------
  Q--------------------------------------

Transform by padding Q:
  P        -------------
  Q----------------------------------------

Split Q with P:
  Q1: --------
  Q2: -------------------

Take the ratio:
  width(Q1) / width(Q2)

Plot the distribution of these ratios to see what's up. (Note that the
"width" function is something you have to define.)
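Here's one possible sketch of that, with a homemade `width` for
inclusive coordinates. `split_ratio` and its `pad` argument are made-up
names, and the P/Q values are hypothetical toy data from your sketches.
Since Case 1 and Case 2 give reciprocal ratios (1/25 vs. 25), plotting
on a log scale puts them symmetrically around 0:

```r
# Hypothetical helper: ratio of the left piece to the right piece of Q
# after splitting it with P, with Q padded by `pad` bases on each side.
split_ratio <- function(P, Q, pad = 1) {
  width <- function(start, end) end - start + 1   # inclusive coordinates
  Q.start <- Q[, 1] - pad
  Q.end   <- Q[, 2] + pad
  Q1 <- width(Q.start, P[, 1] - 1)   # padded Q strictly left of P
  Q2 <- width(P[, 2] + 1, Q.end)     # padded Q strictly right of P
  Q1 / Q2
}

# Hypothetical toy data: Case 1, Case 2, Case 3 from the sketches
P <- rbind(c(1, 14), c(25, 38), c(9, 21))
Q <- rbind(c(1, 38), c(1, 38), c(1, 38))
ratios <- split_ratio(P, Q)   # 0.04, 25, 0.5

plot(density(log(ratios)), main = "log(width(Q1) / width(Q2))")
```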

If you're dealing with this type of data and taking these types of
approaches, I'd suggest looking into the IRanges package from
Bioconductor, which will make working with these quite simple (after
you read through its extensive documentation, of course; this
package *does* provide a "width" function, though ;-)

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact