[BioC] ensembl annotation coordinate did not match that from UCSC genome browser using ucscTableQuery
Steve Lianoglou
mailinglist.honeypot at gmail.com
Thu Mar 4 17:07:28 CET 2010
Hi,
On Thu, Mar 4, 2010 at 10:41 AM, sabrina s <sabrina.shao at gmail.com> wrote:
> Hi, all
> I don't know if it is just by chance, but I was retrieving the sequence for
> ENSMUST00000027587 <http://www.ensembl.org/Mus_musculus/Transcript/Exons?g=ENSMUSG00000026349;t=ENSMUST00000027587>
> using BSgenome.
> The coordinates I used were retrieved from UCSC with the following code:
>
> library(rtracklayer)
> session <- browserSession()
> genome(session) <- "mm9"
>
> q2 <- ucscTableQuery(session, "ensGene")
> ensGene <- getTable(q2)
>
> the result is:
> name name2 chrom strand txStart txEnd
> 980 NM_028399 Ccnt2 chr1 + 129670740 129701414
>
> exonStarts
> 980
> 129670740,129671677,129688181,129689934,129691831,129694417,129695966,129698182,129698738,
>
> exonEnds exonCount
> 980
> 129670962,129671759,129688310,129689995,129691894,129694463,129696130,129698253,129701414,
> 9
>
>
> But from Ensembl or even UCSC genome browser, the first exon coordinate
> starts at 129670741, so there is 1 bp shift.
Look at the description of how the "coordinates" work as supplied by UCSC:
http://genome.ucsc.edu/FAQ/FAQtracks#tracks1
> Because of that, I can't get
> the right sequence that I need. So is there any way to correct that, or am I
> missing some steps? Thanks!
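You can get what you need; you just have to know when you need to add
or subtract 1 from the start position. The UCSC tables store start
positions 0-based and end positions 1-based (half-open intervals), while
Ensembl and the browser display are 1-based at both ends. So add 1 to
each start (txStart, exonStarts) and leave the ends as they are.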
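As a minimal sketch of that adjustment in base R, using the exonStarts and exonEnds values from the row you quoted: the helper name parse_ucsc_field is made up for illustration, and I am assuming the two columns come back from getTable() as comma-separated text (wrap them in as.character() if they are factors).

```r
# exonStarts / exonEnds copied from the table row quoted above.
# UCSC tables: starts are 0-based, ends are 1-based.
exon_starts_ucsc <- "129670740,129671677,129688181,129689934,129691831,129694417,129695966,129698182,129698738,"
exon_ends_ucsc   <- "129670962,129671759,129688310,129689995,129691894,129694463,129696130,129698253,129701414,"

# Hypothetical helper: split a comma-separated UCSC field into integers.
# strsplit() drops the empty piece left by the trailing comma.
parse_ucsc_field <- function(x) {
  as.integer(strsplit(x, ",", fixed = TRUE)[[1]])
}

# Shift only the starts to get 1-based, closed intervals (Ensembl style).
exon_starts <- parse_ucsc_field(exon_starts_ucsc) + 1L
exon_ends   <- parse_ucsc_field(exon_ends_ucsc)

# Exon lengths in 1-based, closed coordinates.
exon_widths <- exon_ends - exon_starts + 1L

exons <- data.frame(start = exon_starts, end = exon_ends, width = exon_widths)
print(exons)
```

The start/end pairs in exons are then the 1-based ranges to use when pulling the sequence out of the BSgenome object.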
Hope that helps,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact