[Bioc-sig-seq] countOverlaps throws error on GRangesList and GappedAlignments

Valerie Obenchain vobencha at fhcrc.org
Mon Apr 11 17:42:24 CEST 2011


Hi Lax,

Please reply to the list when answering/asking questions. The 
conversation might be helpful to someone else in the future.

The seqlevels function is not available in  R-2.12/BioC 2.7 but is with 
R-2.13/BioC 2.8.  Version R-2.13/BioC 2.8 are being released this week 
so you might want to update.

Take care,
Valerie



Hi Valerie
Thanks. Subsetting works fine. The only thing I found is that there is 
NO function named "seqlevels",

I could get to the levels using the command at the individual list entries:
levels (seqnames (p3UTRsSub)[[1]])

This solves my problems, now I can iterative over the chromosomes and 
find overlapping counts.
-thanks
-Lax


On 04/11/2011 06:53 AM, Valerie Obenchain wrote:
> Hi Lax,
>
> On 04/10/11 08:13, Lakshmanan Iyer wrote:
>> Hello,
>> I created a GRangesList of 3'UTRs and GappedAlignments of a BAM file 
>> using
>> the following code:
>> _____________________
>> library(GenomicRanges);
>>   library(Rsamtools);
>>   library(leeBamViews)
>> library ("GenomicFeatures")
>> txdb<- makeTranscriptDbFromUCSC(genome="mm9", tablename="knownGene")
>> #
>> # It remembers the transcript ids. so can be used later
>> p3UTRs<- threeUTRsByTranscript(txdb, use.names=TRUE)
>> # read the bam alignment
>> aligns<- readGappedAlignments ("Dend_accepted_hits_sorted.bam")
>>
>> --------------------------------------
>> However, when I try to get the countOverlaps, I get the following error:
>>> c<- countOverlaps (p3UTRs, aligns)
>> Error in queryHits(findOverlaps(query, subject, maxgap = maxgap, type =
>> type)) :
>>    error in evaluating the argument 'x' in selecting a method for 
>> function
>> 'queryHits'
>>
>> I suspect that this happens as the sequence names, "chrUn_random 
>> chrX_random
>> chrY_random" etc which are present in the GRangeslist, p3UTRs are not
>> present in the GappedAlignement objects.
>>
>> My questions is:
>> 1. Can I make countOverlaps to choose only those intervals in which 
>> there
>> are entries in both the objects?
> No.  The general approach is to clean the subject and query before 
> calling countOverlaps.
>>   2. How do I subjset GRangeList (p3UTRs) to include entries from 
>> specific
>> list of chromosomes of interest only?
> To retain specific chromosomes, you can subset with seqnames then 
> reset the seqlevels.
>
> In R-2.13/BioC 2.8, using the grl object from the GRangesList man page :
>
> > grl
> GRangesList of length 3
> $gr1
> GRanges with 1 range and 2 elementMetadata values
>     seqnames    ranges strand |     score        GC
> <Rle> <IRanges> <Rle> | <integer> <numeric>
> [1]     chr2    [3, 6]      + |         5      0.45
>
> $gr2
> GRanges with 2 ranges and 2 elementMetadata values
>     seqnames    ranges strand |     score        GC
> <Rle> <IRanges> <Rle> | <integer> <numeric>
> [1]     chr1  [ 7,  9]      + |         3       0.3
> [2]     chr1  [13, 15]      - |         4       0.5
>
> $gr3
> GRanges with 2 ranges and 2 elementMetadata values
>     seqnames    ranges strand |     score        GC
> <Rle> <IRanges> <Rle> | <integer> <numeric>
> [1]     chr1    [1, 3]      - |         6       0.4
> [2]     chr2    [4, 9]      - |         2       0.1
>
>
> seqlengths
>  chr2 chr1
>    NA   NA
>
>     grlsub <- grl[seqnames(grl) == "chr1", ]
>     seqlevels(grlsub) <- seqlevels(grlsub)[seqlevels(grlsub) == "chr1"]
>
> You can remove the empty list elements with,
>
>     grlsub <- grlsub[elementLengths(grlsub) >0]
>
>
> > grlsub
> GRangesList of length 2
> $gr2
> GRanges with 2 ranges and 2 elementMetadata values
>     seqnames    ranges strand |     score        GC
> <Rle> <IRanges> <Rle> | <integer> <numeric>
> [1]     chr1  [ 7,  9]      + |         3       0.3
> [2]     chr1  [13, 15]      - |         4       0.5
>
> $gr3
> GRanges with 1 range and 2 elementMetadata values
>     seqnames    ranges strand |     score        GC
> <Rle> <IRanges> <Rle> | <integer> <numeric>
> [1]     chr1    [1, 3]      - |         6       0.4
>
>
> seqlengths
>  chr1
>    NA
>
>
> Valerie
>
>
> > sessionInfo()
> R version 2.13.0 beta (2011-03-30 r55199)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] GenomicRanges_1.3.37 IRanges_1.9.29
>
> loaded via a namespace (and not attached):
> [1] tools_2.13.0
>
>
>
>
>> -best
>> -Lax
>>> sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] GenomicFeatures_1.2.1 leeBamViews_0.99.11   BSgenome_1.18.2
>> [4] Biobase_2.10.0        Rsamtools_1.2.0       Biostrings_2.18.0
>> [7] GenomicRanges_1.2.3   IRanges_1.8.2
>>
>> loaded via a namespace (and not attached):
>> [1] biomaRt_2.6.0      DBI_0.2-5          RCurl_1.4-3
>> RSQLite_0.9-2
>> [5] rtracklayer_1.10.5 tools_2.12.0       XML_3.2-0
>>
>>     [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing



More information about the Bioc-sig-sequencing mailing list