[Bioc-sig-seq] countOverlaps throws error on GRangesList and GappedAlignments
Valerie Obenchain
vobencha at fhcrc.org
Mon Apr 11 15:53:38 CEST 2011
Hi Lax,
On 04/10/11 08:13, Lakshmanan Iyer wrote:
> Hello,
> I created a GRangesList of 3'UTRs and GappedAlignments of a BAM file using
> the following code:
> _____________________
> library(GenomicRanges);
> library(Rsamtools);
> library(leeBamViews)
> library ("GenomicFeatures")
> txdb<- makeTranscriptDbFromUCSC(genome="mm9", tablename="knownGene")
> #
> # It remembers the transcript ids. so can be used later
> p3UTRs<- threeUTRsByTranscript(txdb, use.names=TRUE)
> # read the bam alignment
> aligns<- readGappedAlignments ("Dend_accepted_hits_sorted.bam")
>
> --------------------------------------
> However, when I try to get the countOverlaps, I get the following error:
>
>> c<- countOverlaps (p3UTRs, aligns)
>>
> Error in queryHits(findOverlaps(query, subject, maxgap = maxgap, type =
> type)) :
> error in evaluating the argument 'x' in selecting a method for function
> 'queryHits'
>
> I suspect that this happens as the sequence names, "chrUn_random chrX_random
> chrY_random" etc which are present in the GRangeslist, p3UTRs are not
> present in the GappedAlignement objects.
>
> My questions is:
> 1. Can I make countOverlaps to choose only those intervals in which there
> are entries in both the objects?
>
No. The general approach is to clean the subject and query before
calling countOverlaps.
> 2. How do I subjset GRangeList (p3UTRs) to include entries from specific
> list of chromosomes of interest only?
>
To retain specific chromosomes, you can subset with seqnames then reset
the seqlevels.
In R-2.13/BioC 2.8, using the grl object from the GRangesList man page :
> grl
GRangesList of length 3
$gr1
GRanges with 1 range and 2 elementMetadata values
seqnames ranges strand | score GC
<Rle> <IRanges> <Rle> | <integer> <numeric>
[1] chr2 [3, 6] + | 5 0.45
$gr2
GRanges with 2 ranges and 2 elementMetadata values
seqnames ranges strand | score GC
<Rle> <IRanges> <Rle> | <integer> <numeric>
[1] chr1 [ 7, 9] + | 3 0.3
[2] chr1 [13, 15] - | 4 0.5
$gr3
GRanges with 2 ranges and 2 elementMetadata values
seqnames ranges strand | score GC
<Rle> <IRanges> <Rle> | <integer> <numeric>
[1] chr1 [1, 3] - | 6 0.4
[2] chr2 [4, 9] - | 2 0.1
seqlengths
chr2 chr1
NA NA
grlsub <- grl[seqnames(grl) == "chr1", ]
seqlevels(grlsub) <- seqlevels(grlsub)[seqlevels(grlsub) == "chr1"]
You can remove the empty list elements with,
grlsub <- grlsub[elementLengths(grlsub) >0]
> grlsub
GRangesList of length 2
$gr2
GRanges with 2 ranges and 2 elementMetadata values
seqnames ranges strand | score GC
<Rle> <IRanges> <Rle> | <integer> <numeric>
[1] chr1 [ 7, 9] + | 3 0.3
[2] chr1 [13, 15] - | 4 0.5
$gr3
GRanges with 1 range and 2 elementMetadata values
seqnames ranges strand | score GC
<Rle> <IRanges> <Rle> | <integer> <numeric>
[1] chr1 [1, 3] - | 6 0.4
seqlengths
chr1
NA
Valerie
> sessionInfo()
R version 2.13.0 beta (2011-03-30 r55199)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicRanges_1.3.37 IRanges_1.9.29
loaded via a namespace (and not attached):
[1] tools_2.13.0
> -best
> -Lax
>
>> sessionInfo()
>>
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] GenomicFeatures_1.2.1 leeBamViews_0.99.11 BSgenome_1.18.2
> [4] Biobase_2.10.0 Rsamtools_1.2.0 Biostrings_2.18.0
> [7] GenomicRanges_1.2.3 IRanges_1.8.2
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.6.0 DBI_0.2-5 RCurl_1.4-3
> RSQLite_0.9-2
> [5] rtracklayer_1.10.5 tools_2.12.0 XML_3.2-0
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
More information about the Bioc-sig-sequencing
mailing list