[Bioc-sig-seq] countOverlaps throws error on GRangesList and GappedAlignments
Valerie Obenchain
vobencha at fhcrc.org
Mon Apr 11 17:42:24 CEST 2011
Hi Lax,
Please reply to the list when answering/asking questions. The
conversation might be helpful to someone else in the future.
The seqlevels function is not available in R-2.12/BioC 2.7 but is with
R-2.13/BioC 2.8. Version R-2.13/BioC 2.8 are being released this week
so you might want to update.
Take care,
Valerie
Hi Valerie
Thanks. Subsetting works fine. The only thing I found is that there is
NO function named "seqlevels",
I could get to the levels using the command at the individual list entries:
levels (seqnames (p3UTRsSub)[[1]])
This solves my problems, now I can iterative over the chromosomes and
find overlapping counts.
-thanks
-Lax
On 04/11/2011 06:53 AM, Valerie Obenchain wrote:
> Hi Lax,
>
> On 04/10/11 08:13, Lakshmanan Iyer wrote:
>> Hello,
>> I created a GRangesList of 3'UTRs and GappedAlignments of a BAM file
>> using
>> the following code:
>> _____________________
>> library(GenomicRanges);
>> library(Rsamtools);
>> library(leeBamViews)
>> library ("GenomicFeatures")
>> txdb<- makeTranscriptDbFromUCSC(genome="mm9", tablename="knownGene")
>> #
>> # It remembers the transcript ids. so can be used later
>> p3UTRs<- threeUTRsByTranscript(txdb, use.names=TRUE)
>> # read the bam alignment
>> aligns<- readGappedAlignments ("Dend_accepted_hits_sorted.bam")
>>
>> --------------------------------------
>> However, when I try to get the countOverlaps, I get the following error:
>>> c<- countOverlaps (p3UTRs, aligns)
>> Error in queryHits(findOverlaps(query, subject, maxgap = maxgap, type =
>> type)) :
>> error in evaluating the argument 'x' in selecting a method for
>> function
>> 'queryHits'
>>
>> I suspect that this happens as the sequence names, "chrUn_random
>> chrX_random
>> chrY_random" etc which are present in the GRangeslist, p3UTRs are not
>> present in the GappedAlignement objects.
>>
>> My questions is:
>> 1. Can I make countOverlaps to choose only those intervals in which
>> there
>> are entries in both the objects?
> No. The general approach is to clean the subject and query before
> calling countOverlaps.
>> 2. How do I subjset GRangeList (p3UTRs) to include entries from
>> specific
>> list of chromosomes of interest only?
> To retain specific chromosomes, you can subset with seqnames then
> reset the seqlevels.
>
> In R-2.13/BioC 2.8, using the grl object from the GRangesList man page :
>
> > grl
> GRangesList of length 3
> $gr1
> GRanges with 1 range and 2 elementMetadata values
> seqnames ranges strand | score GC
> <Rle> <IRanges> <Rle> | <integer> <numeric>
> [1] chr2 [3, 6] + | 5 0.45
>
> $gr2
> GRanges with 2 ranges and 2 elementMetadata values
> seqnames ranges strand | score GC
> <Rle> <IRanges> <Rle> | <integer> <numeric>
> [1] chr1 [ 7, 9] + | 3 0.3
> [2] chr1 [13, 15] - | 4 0.5
>
> $gr3
> GRanges with 2 ranges and 2 elementMetadata values
> seqnames ranges strand | score GC
> <Rle> <IRanges> <Rle> | <integer> <numeric>
> [1] chr1 [1, 3] - | 6 0.4
> [2] chr2 [4, 9] - | 2 0.1
>
>
> seqlengths
> chr2 chr1
> NA NA
>
> grlsub <- grl[seqnames(grl) == "chr1", ]
> seqlevels(grlsub) <- seqlevels(grlsub)[seqlevels(grlsub) == "chr1"]
>
> You can remove the empty list elements with,
>
> grlsub <- grlsub[elementLengths(grlsub) >0]
>
>
> > grlsub
> GRangesList of length 2
> $gr2
> GRanges with 2 ranges and 2 elementMetadata values
> seqnames ranges strand | score GC
> <Rle> <IRanges> <Rle> | <integer> <numeric>
> [1] chr1 [ 7, 9] + | 3 0.3
> [2] chr1 [13, 15] - | 4 0.5
>
> $gr3
> GRanges with 1 range and 2 elementMetadata values
> seqnames ranges strand | score GC
> <Rle> <IRanges> <Rle> | <integer> <numeric>
> [1] chr1 [1, 3] - | 6 0.4
>
>
> seqlengths
> chr1
> NA
>
>
> Valerie
>
>
> > sessionInfo()
> R version 2.13.0 beta (2011-03-30 r55199)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] GenomicRanges_1.3.37 IRanges_1.9.29
>
> loaded via a namespace (and not attached):
> [1] tools_2.13.0
>
>
>
>
>> -best
>> -Lax
>>> sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] GenomicFeatures_1.2.1 leeBamViews_0.99.11 BSgenome_1.18.2
>> [4] Biobase_2.10.0 Rsamtools_1.2.0 Biostrings_2.18.0
>> [7] GenomicRanges_1.2.3 IRanges_1.8.2
>>
>> loaded via a namespace (and not attached):
>> [1] biomaRt_2.6.0 DBI_0.2-5 RCurl_1.4-3
>> RSQLite_0.9-2
>> [5] rtracklayer_1.10.5 tools_2.12.0 XML_3.2-0
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
More information about the Bioc-sig-sequencing
mailing list