[BioC] using a gtf file to map reads

Valerie Obenchain vobencha at fhcrc.org
Mon Jul 1 03:07:03 CEST 2013


Mike and Alejandro,

disjointExons() is now in GenomicFeatures 1.13.18. It's documented on 
the same man page as ?transcripts, ?exons, ?cds ,etc.

The function is the same as DEXSeq::prepareAnnotationForDEXSeq with a 
few performance modifications. Primary changes were a helper function 
with a more vectorized approach to create 'geneNames', 'transcripts' and 
reducing the number of times metadata columns were added to the GRanges.

 > txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
 > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE))
    user  system elapsed
   8.205   0.028   8.397
 > system.time(disjointExons(txdb, TRUE, TRUE))
    user  system elapsed
   5.596   0.032   5.764

 > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
 > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE))
    user  system elapsed
  37.286   0.228  37.954
 > system.time(disjointExons(txdb, TRUE, TRUE))
    user  system elapsed
  28.130   0.228  28.822

The multiple requests on the mailing list for a function of this type 
made me think it should be included in one of the infrastructure 
packages. The goal of adding this to GenomicFeatures was to increase 
visibility and make it more readily available to packages that don't 
depend on DEXSeq. I hope having the function in two places doesn't add a 
layer of confusion. I have a suite of unit tests to ensure disjointExons 
and prepareAnnotationForDEXSeq are in sync.

Let me know if you have any questions or would like any changes made to 
the man page.

Thanks,
Valerie



On 06/11/13 12:30, Valerie Obenchain wrote:
> Great. I'll go ahead and put this in GenomicFeatures with credit to you
> and Alejandro.
>
> Thanks.
> Valerie
>
> On 06/11/2013 09:39 AM, Michael Love wrote:
>> hi Valerie,
>>
>> On Tue, Jun 11, 2013 at 6:13 PM, Valerie Obenchain
>> <vobencha at fhcrc.org> wrote:
>>> Hi Mike (Love),
>>>
>>> Would you be interested in contributing a disjointExons() to
>>> GenomicFeatures? I think this extraction would be useful to many.
>>> Maybe we
>>> also want a disjointExonsBy() but the only 'by' would be genes ...?
>>>
>>> Valerie
>>>
>>
>> I'd be happy to contribute any of this code in parathyroidSE.
>> Alejandro Reyes has formalized this part of the code into the
>> following function in DEXSeq >= 1.6.0:
>>
>>> prepareAnnotationForDEXSeq
>> function (transcriptDb, aggregateGenes = FALSE, includeTranscripts =
>> TRUE)
>> {
>>      stopifnot(is(transcriptDb, "TranscriptDb"))
>>      exonsByGene <- exonsBy(transcriptDb, by = "gene")
>>      exonicParts <- disjoin(unlist(exonsByGene))
>>      if (!aggregateGenes) {
>>          overlaps <- findOverlaps(exonicParts, exonsByGene)
>>          geneNames <- names(exonsByGene)[subjectHits(overlaps)]
>> .....
>>
>> best,
>>
>> Mike
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list