[BioC] Mapping genomic coordinates to transcript coordinates? (revived)
Chris Fields
cjfields at illinois.edu
Thu Mar 3 15:45:30 CET 2011
On Mar 3, 2011, at 1:58 AM, Pages, Herve wrote:
> Hi Chris, Malcolm,
>
> There is the transcriptLocs2refLocs() function in Biostrings that
> does the reverse mapping i.e. it maps transcript coordinates to
> genomic coordinates. There is no doubt that the GenomicFeatures
> package would be a better place for this function so we should move
> it there.
... <apologies, excised the very useful code for easier reading> ...
> It's vectorized and fast (implemented in C).
Nice!
> Unfortunately we don't have a refLocs2transcriptLocs() function at
> the moment for going the other way around but, yes, that's something
> we should definitely have. When called on the previous result and with
> the same 'exonStarts', 'exonEnds' and 'strand' values, it should return
> the original 'tlocs'.
>
> There would be 2 complications for such a refLocs2transcriptLocs() function, though:
>
> 1. The genomic location might not hit the transcript. Not a big deal:
> NA could be used for this.
Agreed.
> 2. Sometimes (very rarely) the genomic location hits an ambiguous
> location on the transcript (e.g. for a small number of transcripts
> in UCSC knownGene track, some exons overlap). What to do then?
I suppose we would need examples of this, at least for future documentation. As for what to do, I'm not sure myself beyond issuing a warning about the ambiguity and returning the first or last value (or adding an argument that specifies what to do in such cases, e.g. letting a user-defined function pick the value).
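To make both complications concrete, here is a minimal Python sketch of the two directions. It is not the Biostrings/GenomicFeatures API and not the C implementation mentioned above; the names tx_to_genome, genome_to_tx and the `pick` argument are hypothetical. It assumes 1-based inclusive coordinates, exons listed in ascending genomic order, and a strand of "+" or "-". A genomic position that hits no exon maps to None (standing in for NA), and a position covered by overlapping exons triggers a warning and is resolved by `pick` ("first", "last", or a user-supplied function), along the lines suggested above.

```python
import warnings

# Hypothetical sketch; NOT the Biostrings/GenomicFeatures API.
# Assumptions: 1-based inclusive coordinates, exons listed in ascending
# genomic order, strand is "+" or "-".


def _in_transcript_order(starts, ends, strand):
    """Return (start, end) pairs in the order they are transcribed."""
    exons = list(zip(starts, ends))
    return exons if strand == "+" else exons[::-1]


def tx_to_genome(tlocs, starts, ends, strand):
    """Map 1-based transcript positions to genomic positions (None if out of range)."""
    exons = _in_transcript_order(starts, ends, strand)
    out = []
    for t in tlocs:
        if t < 1:
            out.append(None)
            continue
        offset = 0  # transcript length covered by the exons already walked
        for s, e in exons:
            length = e - s + 1
            if t <= offset + length:
                k = t - offset - 1  # 0-based offset inside this exon
                out.append(s + k if strand == "+" else e - k)
                break
            offset += length
        else:
            out.append(None)  # past the 3' end of the transcript
    return out


def genome_to_tx(glocs, starts, ends, strand, pick="first"):
    """Map genomic positions back to transcript positions.

    None (the NA analogue) when a position hits no exon. When overlapping
    exons yield several candidates (listed 5' to 3'), warn and resolve
    with `pick`: "first", "last", or a callable given the candidate list.
    """
    exons = _in_transcript_order(starts, ends, strand)
    out = []
    for g in glocs:
        hits = []
        offset = 0
        for s, e in exons:
            if s <= g <= e:
                k = g - s if strand == "+" else e - g
                hits.append(offset + k + 1)
            offset += e - s + 1
        if not hits:
            out.append(None)
        elif len(hits) == 1:
            out.append(hits[0])
        else:
            warnings.warn(f"position {g} is ambiguous on the transcript: {hits}")
            if callable(pick):
                out.append(pick(hits))
            elif pick in ("first", "last"):
                out.append(hits[0] if pick == "first" else hits[-1])
            else:
                raise ValueError(f"unknown pick: {pick!r}")
    return out


# Three exons of lengths 10, 20 and 5 (35 nt in total).
starts, ends = [100, 200, 300], [109, 219, 304]
tlocs = [1, 10, 11, 30, 31, 35]
for strand in "+-":
    glocs = tx_to_genome(tlocs, starts, ends, strand)
    assert genome_to_tx(glocs, starts, ends, strand) == tlocs
print(genome_to_tx([150], starts, ends, "+"))  # [None]: falls in an intron
```

The loop at the end checks the round-trip property described earlier: mapping transcript positions to the genome and back, with the same exons and strand, returns the original positions on either strand.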
> Also those 2 functions should really be in GenomicFeatures, not
> in Biostrings, and their interface should be modernized to accept
> a GRangesList object instead of exonStarts, exonEnds and strand
> (the transcriptLocs2refLocs() function predates the GenomicRanges
> era).
I agree. I wouldn't have thought to look for this in Biostrings.
> Here in Seattle we haven't worked on this yet because of a lack of time
> and also because there was apparently no demand for it so far. For
> now, I'm just going to move transcriptLocs2refLocs() to GenomicFeatures
> so it's more visible and it will also make it easier for someone
> interested to contribute.
>
> H.
That seems to be how things get implemented in any open-source project: someone has an itch to scratch.
chris