[Bioc-sig-seq] countOverlaps throws error on GRangesList and GappedAlignments

Valerie Obenchain vobencha at fhcrc.org
Mon Apr 11 15:53:38 CEST 2011


Hi Lax,

On 04/10/11 08:13, Lakshmanan Iyer wrote:
> Hello,
> I created a GRangesList of 3'UTRs and GappedAlignments of a BAM file using
> the following code:
> _____________________
> library(GenomicRanges);
>   library(Rsamtools);
>   library(leeBamViews)
> library ("GenomicFeatures")
> txdb<- makeTranscriptDbFromUCSC(genome="mm9", tablename="knownGene")
> #
> # It remembers the transcript ids. so can be used later
> p3UTRs<- threeUTRsByTranscript(txdb, use.names=TRUE)
> # read the bam alignment
> aligns<- readGappedAlignments ("Dend_accepted_hits_sorted.bam")
>
> --------------------------------------
> However, when I try to get the countOverlaps, I get the following error:
>    
>> c<- countOverlaps (p3UTRs, aligns)
>>      
> Error in queryHits(findOverlaps(query, subject, maxgap = maxgap, type =
> type)) :
>    error in evaluating the argument 'x' in selecting a method for function
> 'queryHits'
>
> I suspect that this happens as the sequence names, "chrUn_random chrX_random
> chrY_random" etc which are present in the GRangeslist, p3UTRs are not
> present in the GappedAlignement objects.
>
> My questions is:
> 1. Can I make countOverlaps to choose only those intervals in which there
> are entries in both the objects?
>    
No.  The general approach is to clean the subject and query before 
calling countOverlaps.
>   2. How do I subjset GRangeList (p3UTRs) to include entries from specific
> list of chromosomes of interest only?
>    
To retain specific chromosomes, you can subset with seqnames then reset 
the seqlevels.

In R-2.13/BioC 2.8, using the grl object from the GRangesList man page :

 > grl
GRangesList of length 3
$gr1
GRanges with 1 range and 2 elementMetadata values
     seqnames    ranges strand |     score        GC
<Rle> <IRanges> <Rle> | <integer> <numeric>
[1]     chr2    [3, 6]      + |         5      0.45

$gr2
GRanges with 2 ranges and 2 elementMetadata values
     seqnames    ranges strand |     score        GC
<Rle> <IRanges> <Rle> | <integer> <numeric>
[1]     chr1  [ 7,  9]      + |         3       0.3
[2]     chr1  [13, 15]      - |         4       0.5

$gr3
GRanges with 2 ranges and 2 elementMetadata values
     seqnames    ranges strand |     score        GC
<Rle> <IRanges> <Rle> | <integer> <numeric>
[1]     chr1    [1, 3]      - |         6       0.4
[2]     chr2    [4, 9]      - |         2       0.1


seqlengths
  chr2 chr1
    NA   NA

     grlsub <- grl[seqnames(grl) == "chr1", ]
     seqlevels(grlsub) <- seqlevels(grlsub)[seqlevels(grlsub) == "chr1"]

You can remove the empty list elements with,

     grlsub <- grlsub[elementLengths(grlsub) >0]


 > grlsub
GRangesList of length 2
$gr2
GRanges with 2 ranges and 2 elementMetadata values
     seqnames    ranges strand |     score        GC
<Rle> <IRanges> <Rle> | <integer> <numeric>
[1]     chr1  [ 7,  9]      + |         3       0.3
[2]     chr1  [13, 15]      - |         4       0.5

$gr3
GRanges with 1 range and 2 elementMetadata values
     seqnames    ranges strand |     score        GC
<Rle> <IRanges> <Rle> | <integer> <numeric>
[1]     chr1    [1, 3]      - |         6       0.4


seqlengths
  chr1
    NA


Valerie


 > sessionInfo()
R version 2.13.0 beta (2011-03-30 r55199)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicRanges_1.3.37 IRanges_1.9.29

loaded via a namespace (and not attached):
[1] tools_2.13.0




> -best
> -Lax
>    
>> sessionInfo()
>>      
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] GenomicFeatures_1.2.1 leeBamViews_0.99.11   BSgenome_1.18.2
> [4] Biobase_2.10.0        Rsamtools_1.2.0       Biostrings_2.18.0
> [7] GenomicRanges_1.2.3   IRanges_1.8.2
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.6.0      DBI_0.2-5          RCurl_1.4-3
> RSQLite_0.9-2
> [5] rtracklayer_1.10.5 tools_2.12.0       XML_3.2-0
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>



More information about the Bioc-sig-sequencing mailing list