[BioC] reducing hits from countGenomicOverlaps()

Martin Morgan mtmorgan at fhcrc.org
Thu Oct 27 00:07:01 CEST 2011


On 10/25/2011 05:35 PM, Robert Castelo wrote:
> dear list,
>
> the following three lines allow one to count overlaps of aligned
> short-reads with annotations:
>
> aln <- readGappedAlignments("somebamfile.bam")
> txdb <- makeTranscriptFromUCSC(genome="hg19", tablename="ensGene")
> ensGenes <- exonsBy(txdb, by="gene")
> ov <- countGenomicOverlaps(aln, ensGenes)
>
> then i want to get read-counts per gene and the first thing that comes
> to my head is doing:
>
> counts <- sapply(ov, function(x) sum(values(x)[["hits"]]))
>
> which goes through every gene and adds up the "hits" of its exons.
> however, this latter step of "just adding" takes longer than the actual
> calculation of the hits with countGenomicOverlaps() and i guess that
> there are more efficient ways to approach this, probably something
> around "reducing the hits value column". i've been looking at rdapply()
> and reduce() and googled too, but couldn't find anything, so i look
> forward to your suggestions.

Hi Robert -- A strategy here is along the lines (untested!) of

   hits <- values(unlist(ov)))[["hits"]]
   genes <- rep(names(ov), elementLengths(ov))
   counts <- sapply(split(hits, genes), sum)

but you'll want to make sure that this is a conceptually sensible way of 
counting hits per gene.

The countGenomicOverlaps function has been heavily updated to 
summarizeOverlaps in 'devel'; you might want to check that out, or at 
least the vignette for summarizeOverlaps at

http://bioconductor.org/packages/devel/bioc/html/GenomicRanges.html

Martin

>
> thanks!!
> robert.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the Bioconductor mailing list