[Bioc-sig-seq] coordinates 0-indexed or 1-indexed in IRanges?

Tue Apr 13 15:02:23 CEST 2010

On 04/13/2010 12:35 PM, margherita mutarelli wrote:
> Dear all,
>
> apologies if I missed this, but I have looked throughout the
> documentation and vignettes of the IRanges package and I could not find
> this information:
>
> are the coordinates in IRanges objects considered as "0-indexed" or
> "1-indexed"?
>
> I.e. when importing the refGene.txt table (or any other) from UCSC, we know
> that the coordinates are 0-indexed, meaning that the first base is not part
> of the gene/transcript/object.
> If IRanges are 1-indexed, this means we have to subtract 1 from the start
> coordinate given in the table when creating an IRanges object from them.
>
> Is it correct?

Hi Margherita

This topic always causes problems. As far as I understand the situation, 
you have to add 1 to the start of the coordinates you have downloaded 
from UCSC (I assume as a BED file).

Let me try and explain with a simple example:

Say we have two features ranging from 1 to 5 and from 5 to 10. We can 
create simple IRanges objects:

 > f1 <- IRanges(c(1), c(5))
 > f2 <- IRanges(c(5), c(10))
 >
 > f1
IRanges of length 1
     start end width
[1]     1   5     5
 > f2
IRanges of length 1
     start end width
[1]     5  10     6
 >

and of course, they do overlap:

 > findOverlaps(f1,f2)
An object of class “RangesMatching”
Slot "matchMatrix":
      query subject
[1,]     1       1

Slot "DIM":
[1] 1 1

 >

Now let's assume we got these numbers from UCSC as part of a BED file 
for S. cerevisiae, chromosome 11:

chrXI	1	5
chrXI	5	10

BED files are '0-based' and 'end exclusive' (see
http://genome.ucsc.edu/FAQ/FAQformat.html#format1).

On the chromosome (with a '0-based' notation) this would look like

     0 1 2 3 4 5 6 7 8 9 10
     C A C C A C A C C C A
f1     * * * *
f2             * * * * *

   => they don't overlap!
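
As a quick check of the diagram in plain R (no IRanges needed): two 
'0-based', 'end exclusive' intervals share a base only if each one starts 
before the other one ends. This is just a sketch; the function name 
overlaps_half_open is made up for this illustration.

```r
# Hypothetical helper: TRUE if two 0-based, end-exclusive intervals
# have at least one base in common.
overlaps_half_open <- function(start1, end1, start2, end2) {
  start1 < end2 && start2 < end1
}

# The two BED features from above: chrXI 1 5 and chrXI 5 10
bed_overlap <- overlaps_half_open(1, 5, 5, 10)

# The covered 0-based positions, spelled out: the end itself is excluded
f1_pos <- seq(1, 5 - 1)    # positions 1 to 4
f2_pos <- seq(5, 10 - 1)   # positions 5 to 9
shared <- intersect(f1_pos, f2_pos)
```

Here bed_overlap is FALSE and shared is empty, in agreement with the 
diagram.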

Play with the 'upload custom track' tool on the UCSC genome browser 
(using the small BED file from above) in case this is still confusing.

Now back to IRanges (which are '1-based' and 'end inclusive'):

     1 2 3 4 5 6 7 8 9 10 11
     C A C C A C A C C C A
f1     * * * *
f2             * * * * *

Our new numbers are 2 to 5 and 6 to 10 (which corresponds to adding 1 
to the start before we create the IRanges object):

 > ff1 <- IRanges(c(2), c(5))
 > ff2 <- IRanges(c(6), c(10))
 > findOverlaps(ff1,ff2)
An object of class “RangesMatching”
Slot "matchMatrix":
      query subject

Slot "DIM":
[1] 1 1

 >

=> they don't overlap.
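
For a whole table, the same shift can be applied to every row at once. 
A minimal sketch, assuming the BED lines were read into a data frame with 
columns chrom, start and end (these column names and the helper 
bed_to_one_based are made up for this illustration, they are not part of 
IRanges):

```r
# Hypothetical helper: convert 0-based, end-exclusive BED coordinates
# to 1-based, end-inclusive coordinates (the convention IRanges uses).
bed_to_one_based <- function(bed) {
  data.frame(
    chrom = bed$chrom,
    start = bed$start + 1L,  # shift the start by one
    end   = bed$end,         # the end stays as it is
    stringsAsFactors = FALSE
  )
}

# The two BED lines from the example above
bed <- data.frame(
  chrom = c("chrXI", "chrXI"),
  start = c(1L, 5L),
  end   = c(5L, 10L),
  stringsAsFactors = FALSE
)

one_based <- bed_to_one_based(bed)

# Widths agree in both conventions:
# end - start (BED) equals end - start + 1 (1-based, end inclusive)
widths <- one_based$end - one_based$start + 1L

# With the IRanges package loaded, the ranges could then be built as
# IRanges(start = one_based$start, end = one_based$end)
```

The converted starts are 2 and 6, the ends stay at 5 and 10, and the 
widths (4 and 5) match the number of stars in the diagrams.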

I hope this helps

Hans

> This can be important to clarify, both when considering overlap of features
> and in junctions, since it can shift the correct exon boundaries.
>
> Cheers,
>
> Margherita