[BioC] using a gtf file to map reads
Valerie Obenchain
vobencha at fhcrc.org
Mon Jul 1 03:07:03 CEST 2013
Mike and Alejandro,
disjointExons() is now in GenomicFeatures 1.13.18. It's documented on
the same man page as ?transcripts, ?exons, ?cds ,etc.
The function is the same as DEXSeq::prepareAnnotationForDEXSeq with a
few performance modifications. Primary changes were a helper function
with a more vectorized approach to create 'geneNames', 'transcripts' and
reducing the number of times metadata columns were added to the GRanges.
> txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
> system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE))
user system elapsed
8.205 0.028 8.397
> system.time(disjointExons(txdb, TRUE, TRUE))
user system elapsed
5.596 0.032 5.764
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE))
user system elapsed
37.286 0.228 37.954
> system.time(disjointExons(txdb, TRUE, TRUE))
user system elapsed
28.130 0.228 28.822
The multiple requests on the mailing list for a function of this type
made me think it should be included in one of the infrastructure
packages. The goal of adding this to GenomicFeatures was to increase
visibility and make it more readily available to packages that don't
depend on DEXSeq. I hope having the function in two places doesn't add a
layer of confusion. I have a suite of unit tests to ensure disjointExons
and prepareAnnotationForDEXSeq are in sync.
Let me know if you have any questions or would like any changes made to
the man page.
Thanks,
Valerie
On 06/11/13 12:30, Valerie Obenchain wrote:
> Great. I'll go ahead and put this in GenomicFeatures with credit to you
> and Alejandro.
>
> Thanks.
> Valerie
>
> On 06/11/2013 09:39 AM, Michael Love wrote:
>> hi Valerie,
>>
>> On Tue, Jun 11, 2013 at 6:13 PM, Valerie Obenchain
>> <vobencha at fhcrc.org> wrote:
>>> Hi Mike (Love),
>>>
>>> Would you be interested in contributing a disjointExons() to
>>> GenomicFeatures? I think this extraction would be useful to many.
>>> Maybe we
>>> also want a disjointExonsBy() but the only 'by' would be genes ...?
>>>
>>> Valerie
>>>
>>
>> I'd be happy to contribute any of this code in parathyroidSE.
>> Alejandro Reyes has formalized this part of the code into the
>> following function in DEXSeq >= 1.6.0:
>>
>>> prepareAnnotationForDEXSeq
>> function (transcriptDb, aggregateGenes = FALSE, includeTranscripts =
>> TRUE)
>> {
>> stopifnot(is(transcriptDb, "TranscriptDb"))
>> exonsByGene <- exonsBy(transcriptDb, by = "gene")
>> exonicParts <- disjoin(unlist(exonsByGene))
>> if (!aggregateGenes) {
>> overlaps <- findOverlaps(exonicParts, exonsByGene)
>> geneNames <- names(exonsByGene)[subjectHits(overlaps)]
>> .....
>>
>> best,
>>
>> Mike
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list