[Bioc-sig-seq] countOverlaps throws error on GRangesList and GappedAlignments

Hervé Pagès hpages at fhcrc.org
Mon Apr 11 21:42:31 CEST 2011


Hi Michael,

On 11-04-11 09:31 AM, Michael Lawrence wrote:
> I've noticed some functions just issuing a warning when the SeqInfo objects
> have a different set of chromosomes. In the OP's code, this is issuing an
> error. Has this changed in devel? Is everything now a warning? I would
> recommend consistency here, and a warning seems best, stating the
> assumptions that were made.

The way this is handled has been completely revisited in BioC 2.8.

Now checking whether 2 objects with sequence information (like GRanges,
GRangesList, GappedAlignments, BSgenome) are compatible or not is the
responsibility of the "merge" method for Seqinfo objects (you can use
the seqinfo() getter to extract the Seqinfo slot from a GRanges,
GRangesList, GappedAlignments or BSgenome object).

Here first I show how merging 2 Seqinfo objects works (the end user
never does this directly but this is what happens behind the scene
when s/he tries to compare or combine 2 objects that have a Seqinfo
slot):

   library(GenomicRanges)

   x <- Seqinfo(seqnames=c("chr1", "chr2", "chr3", "chrM"),
                seqlengths=c(100, 200, NA, 15),
                isCircular=c(NA, FALSE, FALSE, TRUE))

   y <- Seqinfo(seqnames=c("chr3", "chr4", "chrM"),
                seqlengths=c(300, NA, 15))

Then:

   > x
   Seqinfo of length 4
   seqnames seqlengths isCircular
   chr1            100         NA
   chr2            200      FALSE
   chr3             NA      FALSE
   chrM             15       TRUE

   > y
   Seqinfo of length 3
   seqnames seqlengths isCircular
   chr3            300         NA
   chr4             NA         NA
   chrM             15         NA

   > merge(x, y)
   Seqinfo of length 5
   seqnames seqlengths isCircular
   chr1            100         NA
   chr2            200      FALSE
   chr3            300      FALSE
   chrM             15       TRUE
   chr4             NA         NA
   Warning message:
   In .Seqinfo.mergexy(x, y) :
     Each of the 2 combined objects has sequence levels not in the other:
     - in 'x': chr1, chr2
     - in 'y': chr4
     Make sure to always combine/compare objects based on the same reference
     genome (use suppressWarnings() to suppress this warning).

Now if I modify 'y' in a way that *contradicts* what 'x' says about
circularity of chr3 and chrM:

   > isCircular(y)[c("chr3", "chrM")] <- c(TRUE, FALSE)
   > y
   Seqinfo of length 3
   seqnames seqlengths isCircular
   chr3            300       TRUE
   chr4             NA         NA
   chrM             15      FALSE

Then I get an error:

   > merge(x, y)
   Error in mergeNamedAtomicVectors(isCircular(x), isCircular(y), what = 
c("sequence",  :
     sequences chr3, chrM have incompatible circularity flags:
     - in 'x': FALSE, TRUE
     - in 'y': TRUE, FALSE
   In addition: Warning message:
   In .Seqinfo.mergexy(x, y) :
     Each of the 2 combined objects has sequence levels not in the other:
     - in 'x': chr1, chr2
     - in 'y': chr4
     Make sure to always combine/compare objects based on the same reference
     genome (use suppressWarnings() to suppress this warning).

This is because merge() has no way to reconcile the information stored
in 'x' and 'y'.

Now from an end user perspective, findOverlaps(), countOverlaps(), c(),
union(), intersect(), punion(), pintersect(), etc... on 2 GRanges
objects behave consistently with the above (they just try to merge
seqinfo(gr1) and seqinfo(gr2) before they do anything else):

   gr1 <- GRanges(Rle(factor(levels=seqlevels(x))))
   seqinfo(gr1) <- x
   gr2 <- GRanges(Rle(factor(levels=seqlevels(y))))
   seqinfo(gr2) <- y

Then:

   > gr1
   GRanges with 0 ranges and 0 elementMetadata values
        seqnames ranges strand |

   seqlengths
    chr1 chr2 chr3 chrM
     100  200   NA   15

   > gr2
   GRanges with 0 ranges and 0 elementMetadata values
         seqnames ranges strand |

   seqlengths
    chr3 chr4 chrM
     300   NA   15

   > countOverlaps(gr1, gr2)
   Warning message:
   In .Seqinfo.mergexy(x, y) :
     Each of the 2 combined objects has sequence levels not in the other:
     - in 'x': chr1, chr2
     - in 'y': chr4
     Make sure to always combine/compare objects based on the same reference
     genome (use suppressWarnings() to suppress this warning).
   Error in queryHits(findOverlaps(query, subject, maxgap = maxgap, 
minoverlap = minoverlap,  :
     error in evaluating the argument 'x' in selecting a method for 
function 'queryHits': Error in mergeNamedAtomicVectors(isCircular(x), 
isCircular(y), what = c("sequence",  :
     sequences chr3, chrM have incompatible circularity flags:
     - in 'x': FALSE, TRUE
     - in 'y': TRUE, FALSE

Hope this clarifies.

H.

 > sessionInfo()
R version 2.13.0 RC (2011-04-05 r55310)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_CA.UTF-8
  [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicRanges_1.3.38 IRanges_1.9.31


>
> Thanks,
> Michael
>
> On Mon, Apr 11, 2011 at 6:53 AM, Valerie Obenchain<vobencha at fhcrc.org>wrote:
>
>> Hi Lax,
>>
>>
>> On 04/10/11 08:13, Lakshmanan Iyer wrote:
>>
>>> Hello,
>>> I created a GRangesList of 3'UTRs and GappedAlignments of a BAM file using
>>> the following code:
>>> _____________________
>>> library(GenomicRanges);
>>>   library(Rsamtools);
>>>   library(leeBamViews)
>>> library ("GenomicFeatures")
>>> txdb<- makeTranscriptDbFromUCSC(genome="mm9", tablename="knownGene")
>>> #
>>> # It remembers the transcript ids. so can be used later
>>> p3UTRs<- threeUTRsByTranscript(txdb, use.names=TRUE)
>>> # read the bam alignment
>>> aligns<- readGappedAlignments ("Dend_accepted_hits_sorted.bam")
>>>
>>> --------------------------------------
>>> However, when I try to get the countOverlaps, I get the following error:
>>>
>>>
>>>> c<- countOverlaps (p3UTRs, aligns)
>>>>
>>>>
>>> Error in queryHits(findOverlaps(query, subject, maxgap = maxgap, type =
>>> type)) :
>>>    error in evaluating the argument 'x' in selecting a method for function
>>> 'queryHits'
>>>
>>> I suspect that this happens as the sequence names, "chrUn_random
>>> chrX_random
>>> chrY_random" etc which are present in the GRangeslist, p3UTRs are not
>>> present in the GappedAlignement objects.
>>>
>>> My questions is:
>>> 1. Can I make countOverlaps to choose only those intervals in which there
>>> are entries in both the objects?
>>>
>>>
>> No.  The general approach is to clean the subject and query before calling
>> countOverlaps.
>>
>>    2. How do I subjset GRangeList (p3UTRs) to include entries from specific
>>> list of chromosomes of interest only?
>>>
>>>
>> To retain specific chromosomes, you can subset with seqnames then reset the
>> seqlevels.
>>
>> In R-2.13/BioC 2.8, using the grl object from the GRangesList man page :
>>
>>> grl
>> GRangesList of length 3
>> $gr1
>> GRanges with 1 range and 2 elementMetadata values
>>     seqnames    ranges strand |     score        GC
>> <Rle>  <IRanges>  <Rle>  |<integer>  <numeric>
>> [1]     chr2    [3, 6]      + |         5      0.45
>>
>> $gr2
>> GRanges with 2 ranges and 2 elementMetadata values
>>     seqnames    ranges strand |     score        GC
>> <Rle>  <IRanges>  <Rle>  |<integer>  <numeric>
>> [1]     chr1  [ 7,  9]      + |         3       0.3
>> [2]     chr1  [13, 15]      - |         4       0.5
>>
>> $gr3
>> GRanges with 2 ranges and 2 elementMetadata values
>>     seqnames    ranges strand |     score        GC
>> <Rle>  <IRanges>  <Rle>  |<integer>  <numeric>
>> [1]     chr1    [1, 3]      - |         6       0.4
>> [2]     chr2    [4, 9]      - |         2       0.1
>>
>>
>> seqlengths
>>   chr2 chr1
>>    NA   NA
>>
>>     grlsub<- grl[seqnames(grl) == "chr1", ]
>>     seqlevels(grlsub)<- seqlevels(grlsub)[seqlevels(grlsub) == "chr1"]
>>
>> You can remove the empty list elements with,
>>
>>     grlsub<- grlsub[elementLengths(grlsub)>0]
>>
>>
>>> grlsub
>> GRangesList of length 2
>> $gr2
>> GRanges with 2 ranges and 2 elementMetadata values
>>     seqnames    ranges strand |     score        GC
>> <Rle>  <IRanges>  <Rle>  |<integer>  <numeric>
>> [1]     chr1  [ 7,  9]      + |         3       0.3
>> [2]     chr1  [13, 15]      - |         4       0.5
>>
>> $gr3
>> GRanges with 1 range and 2 elementMetadata values
>>     seqnames    ranges strand |     score        GC
>> <Rle>  <IRanges>  <Rle>  |<integer>  <numeric>
>> [1]     chr1    [1, 3]      - |         6       0.4
>>
>>
>> seqlengths
>>   chr1
>>    NA
>>
>>
>> Valerie
>>
>>
>>> sessionInfo()
>> R version 2.13.0 beta (2011-03-30 r55199)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] GenomicRanges_1.3.37 IRanges_1.9.29
>>
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.13.0
>>
>>
>>
>>
>>
>>   -best
>>> -Lax
>>>
>>>
>>>> sessionInfo()
>>>>
>>>>
>>> R version 2.12.0 (2010-10-15)
>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] GenomicFeatures_1.2.1 leeBamViews_0.99.11   BSgenome_1.18.2
>>> [4] Biobase_2.10.0        Rsamtools_1.2.0       Biostrings_2.18.0
>>> [7] GenomicRanges_1.2.3   IRanges_1.8.2
>>>
>>> loaded via a namespace (and not attached):
>>> [1] biomaRt_2.6.0      DBI_0.2-5          RCurl_1.4-3
>>> RSQLite_0.9-2
>>> [5] rtracklayer_1.10.5 tools_2.12.0       XML_3.2-0
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioc-sig-sequencing mailing list