[BioC] Getting Introns Expression at a Per Gene Level
Valerie Obenchain
vobencha at fhcrc.org
Mon Feb 4 21:46:50 CET 2013
The error is saying the 'reads' argument cannot be a GRanges object. The
reason for this is because GRanges does not keep track of read gaps.
When counting with summarizeOverlaps we want to consider the read gaps
so we use the GappedAlignments and GappedAlignmentPairs objects which
are 'gap-aware' and store the CIGARs. The summarizeOverlaps man page
indicates what are valid objects for the 'reads' argument,
?summarizeOverlaps
The reads can also be in a Bam file. Read more about that method at
library(Rsamtools)
library(GenomicRanges)
?'summarizeOverlaps,GRanges,BamFileList-method'
The countOverlaps (and findOverlaps) functions will double count reads,
summarizeOverlaps does not. See the example on the
?summarizeOverlaps
man page for a comparison between findOverlaps and summarizeOverlaps.
The man page also describes the different 'modes' in summarizeOverlaps
that resolve reads that do hit multiple annotation features.
Valerie
On 02/04/2013 10:35 AM, Fong Chun Chan wrote:
> Hi Valarie,
>
> Thanks for the reply. I was testing this and it doesn't seem to work
> with the summarizeOverlaps() function. When I try it, I get this error:
>
> countsForIntrons <- summarizeOverlaps(intronsByGene, reads);
> Error in function (classes, fdef, mtable) :
> unable to find an inherited method for function ‘summarizeOverlaps’
> for signature ‘"GRangesList", "GRanges"’
>
> Is it supposed to be countOverlaps() instead of summarizeOverlap()?
>
> Thanks,
>
> Fong
>
>
> On Wed, Jan 30, 2013 at 12:23 PM, Valerie Obenchain <vobencha at fhcrc.org
> <mailto:vobencha at fhcrc.org>> wrote:
>
> Hi Fong,
>
> We don't have a single function call to get introns by gene but you
> can put one together in a few steps.
>
> library(TxDb.Hsapiens.UCSC.__hg19.knownGene)
> txdb <- TxDb.Hsapiens.UCSC.hg19.__knownGene
>
> Specifying use.names = TRUE puts the transcript names on the
> GRangesList.
>
> introns <- intronsByTranscript(txdb, use.names=TRUE)
>
> Unlist and remove duplicate introns.
>
> ulst <- unlist(introns)
> intronsNoDups <- ulst[!duplicated(ulst)]
>
> The select() function can map txid to geneid (and others). See ?select.
>
> txnames <- names(intronsNoDups)
> map <- select(txdb, keys=txnames, keytype='TXNAME', cols='GENEID')
>
> Not all transcripts are associated with a gene id and some are
> associated with more than one.
>
> > head(map, 10)
> TXNAME GENEID
> 1 uc001aaa.3 <NA>
> 2 uc001aaa.3 <NA>
> 3 uc010nxq.1 <NA>
> 4 uc010nxq.1 <NA>
> 5 uc010nxr.1 <NA>
> 6 uc010nxr.1 <NA>
> 7 uc009vjk.2 100133331
> 8 uc009vjk.2 100133331
> 9 uc001aau.3 100132287
> 10 uc021oeh.1 100133331
>
>
> To get a GRangesList of introns by gene, split the introns on the
> associated gene id.
>
> idx <- map$GENEID[!is.na <http://is.na>(map$GENEID)]
> intronsByGene <- split(intronsNoDups[!is.na
> <http://is.na>(__map$GENEID)], idx)
> names(intronsByGene) <- unique(idx)
>
>
> The list names are geneid and the element rownames are txid.
>
> > intronsByGene
> GRangesList of length 19784:
> $100133331
> GRanges with 13 ranges and 0 metadata columns:
> seqnames ranges strand
> <Rle> <IRanges> <Rle>
> uc002qsd.4 chr19 [58858396, 58858718] -
> uc002qsd.4 chr19 [58859007, 58861735] -
> uc002qsd.4 chr19 [58862018, 58862756] -
> ...
>
> $100132287
> GRanges with 1 range and 0 metadata columns:
> seqnames ranges strand
> uc003wyw.1 chr8 [18248856, 18257507] +
>
>
> To count reads against this annotation you can use
> summarizeOverlaps(). The 'reads' can be Bam files, GappedAlignments
> or GappedAlignmentPairs. The 'features' argument will be the
> GRangesList of introns by gene. The result will be counts per gene.
>
> To count just introns with no grouping you can still use
> summarizeOverlaps() but use the 'intronsNoDups' GRanges object as
> 'features'.
>
>
> Valerie
>
>
>
> On 01/29/2013 11:38 AM, Fong Chun Chan wrote:
>
> Hi,
>
> I am using the R package, Genomic Features, to get Intron
> Expression and
> the function that I am using is the intronsByTranscript(). While
> this
> function is useful, it returns the number of raw reads that
> align to each
> intron grouped at the transcript level. Is there an easy way to
> get it to
> group it by a gene such that I am grabbing all the introns that
> fall in all
> the transcripts belong to a gene (and without double counting
> the intronic
> space).
>
> Similar, is there a way to easily get the raw read count at an
> individual
> intron level rather than having it grouped? The introns()
> function seems to
> be defunct now as it recommends that we use the
> intronsByTranscript()
> function.
>
> Thanks,
>
> Fong
>
> [[alternative HTML version deleted]]
>
> _________________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
> https://stat.ethz.ch/mailman/__listinfo/bioconductor
> <https://stat.ethz.ch/mailman/listinfo/bioconductor>
> Search the archives:
> http://news.gmane.org/gmane.__science.biology.informatics.__conductor
> <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
>
More information about the Bioconductor
mailing list