[BioC] rtracklayer and UCSC

Keith Satterley keith at wehi.EDU.AU
Fri May 15 02:23:42 CEST 2009


My understanding of UCSC co-ordinates is, as Sean says, that starts are zero
based and ends are one based. However, I have stopped using the words "start"
and "end" with UCSC co-ordinates; I believe it is better to use "left" and
"right".

The UCSC table definitions for their annotation files, for example:

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/refGene.sql

use the column names txStart/txEnd, cdsStart/cdsEnd and exonStarts/exonEnds.
However, these columns hold start and end co-ordinates only for positive-strand
genes. For negative-strand genes they hold end and start co-ordinates
respectively, assuming that "start" means the 5' end of a gene.
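To make the strand point concrete, here is a minimal sketch (Python only for
illustration; the function name five_and_three_prime is hypothetical). It
assumes txStart and txEnd are taken unchanged from a UCSC table such as
refGene.txt, i.e. txStart is the zero-based left end and txEnd the one-based
right end, so txStart < txEnd on both strands.

```python
from typing import Tuple


def five_and_three_prime(strand: str, tx_start: int, tx_end: int) -> Tuple[int, int]:
    """Return one-based (5' end, 3' end) positions for a UCSC transcript.

    Assumes tx_start/tx_end come unchanged from a UCSC table such as
    refGene.txt: tx_start is the zero-based LEFT end and tx_end the
    one-based RIGHT end, so tx_start < tx_end on both strands.
    """
    if tx_start >= tx_end:
        raise ValueError("expected txStart < txEnd")
    left = tx_start + 1  # convert the zero-based left end to one-based
    right = tx_end       # the right end is already one-based
    if strand == "+":
        return left, right  # 5' end is the left end
    if strand == "-":
        return right, left  # 5' end is the right end
    raise ValueError(f"unexpected strand: {strand!r}")


# A minus-strand transcript stored as txStart=1000, txEnd=2000:
print(five_and_three_prime("-", 1000, 2000))  # (2000, 1001)
```

The column names stay "left" and "right" in spirit: only the strand decides
which of them is the 5' end.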

I think it is more accurate to say that LEFT end UCSC co-ordinates are zero
based and RIGHT end UCSC co-ordinates are one based, whatever the strand.
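A minimal sketch of the left/right convention and the conversion to one-based,
inclusive positions (as used in R), again in Python purely for illustration;
the function names are hypothetical. A useful consequence is that the number
of bases covered is simply right - left, with no +1.

```python
from typing import Tuple


def ucsc_to_one_based(left: int, right: int) -> Tuple[int, int]:
    """UCSC (zero-based left, one-based right) -> one-based inclusive (first, last)."""
    return left + 1, right


def one_based_to_ucsc(first: int, last: int) -> Tuple[int, int]:
    """One-based inclusive (first, last) -> UCSC (zero-based left, one-based right)."""
    return first - 1, last


def ucsc_width(left: int, right: int) -> int:
    """Number of bases covered; no +1 is needed with the UCSC convention."""
    return right - left


# The first 100 bases of a chromosome:
print(ucsc_to_one_based(0, 100))  # (1, 100)
print(one_based_to_ucsc(1, 100))  # (0, 100)
print(ucsc_width(0, 100))         # 100
```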

Note, however, that whenever UCSC displays co-ordinates to users of the browser
GUI, it adjusts left-end co-ordinates back to being one based. If I remember
correctly, the co-ordinates are also all one based when you use the DNA option
in the UCSC browser to retrieve DNA bases. But, as stated above, if you download
annotation files such as refGene.txt from the link above, the left co-ordinates
are zero based.
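Since Kasper's question below is about splitting the comma-separated
exonStarts column and adding 1, here is a minimal Python sketch (parse_exons is
a hypothetical name). It assumes the exonStarts/exonEnds values are still in the
raw UCSC convention from refGene.txt (whether rtracklayer already converts them
is not established here) and that each string may end with a trailing comma,
as in "1000,2500,".

```python
from typing import List, Tuple


def parse_exons(exon_starts: str, exon_ends: str) -> List[Tuple[int, int]]:
    """Split UCSC exonStarts/exonEnds strings into one-based (first, last) pairs.

    Assumes the strings are comma-separated and may end with a trailing
    comma, e.g. "1000,2500,"; empty pieces are skipped.
    """
    starts = [int(s) for s in exon_starts.split(",") if s]
    ends = [int(e) for e in exon_ends.split(",") if e]
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds differ in length")
    # Add 1 to each zero-based left end; right ends are already one-based.
    return [(s + 1, e) for s, e in zip(starts, ends)]


print(parse_exons("1000,2500,", "1200,2700,"))  # [(1001, 1200), (2501, 2700)]
```

Only the left ends are shifted; adding 1 to exonEnds as well would push every
exon one base too far to the right.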

I don't know how rtracklayer handles this issue.

cheers,

Keith

Sean Davis wrote:
> On Thu, May 14, 2009 at 7:29 PM, Kasper Daniel Hansen <khansen at stat.berkeley.edu> wrote:
> 
>> As far as I know UCSC uses zero-based indexing of their genomes, while R
>> uses 1-based. What kind of conversion is being used by rtracklayer? I
>> suspect none at all. It might be worthwhile to add a discussion about this
>> somewhere in the vignette?
> 
> 
> It is even slightly more complicated than that.  They use zero-based starts
> and 1-based ends, except for graphical display:
> 
> http://genome.ucsc.edu/FAQ/FAQtracks#tracks1
> 
> Sean
> 
> 
>>
>> More specifically, I have downloaded a couple of tables from UCSC using
>> rtracklayer and I wanted to know if I need to add 1 to the column named
>> exonStarts (after suitable splitting, since it is a comma-separated
>> character string).
>>
>> Kasper
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>