[BioC] GRanges, strange behavior when chromosomes don't match
Patrick Aboyoun
paboyoun at fhcrc.org
Fri Jul 9 22:57:34 CEST 2010
Cei,
I checked in a patch to BioC 2.6 (GenomicRanges 1.0.6) and BioC 2.7
(GenomicRanges 1.1.16) that fixes the issue you found in the intersect
and setdiff methods for GRanges objects when the two objects don't have
the same sequence names. These new packages will be available on
bioconductor.org within the next 36 hours.
The intersect and setdiff method for GRanges now form the union of
sequence names for both objects before performing their operations. The
output for the examples you sent are now
> library(GenomicRanges)
> gr1 <- GRanges(seqnames = c("chr1", "chr2", "chr3", "chr4"),
+ ranges = IRanges(1:4, width = 10),
+ strand = c("-", "+", "+", "+"))
> gr2 <- GRanges(seqnames = c("chr1", "chr2", "chr5", "chr6"),
+ ranges = IRanges(1:4, width = 10),
+ strand = c("-", "+", "+", "+"))
> intersect(gr1,gr2)
GRanges with 2 ranges and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] chr1 [1, 10] - |
[2] chr2 [2, 11] + |
seqlengths
chr1 chr2 chr3 chr4 chr5 chr6
NA NA NA NA NA NA
> setdiff(gr1,gr2)
GRanges with 2 ranges and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] chr3 [3, 12] + |
[2] chr4 [4, 13] + |
seqlengths
chr1 chr2 chr3 chr4 chr5 chr6
NA NA NA NA NA NA
> sessionInfo()
R version 2.12.0 Under development (unstable) (2010-07-01 r52425)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicRanges_1.1.16 IRanges_1.7.10
Cheers,
Patrick
On 7/8/10 4:55 PM, Patrick Aboyoun wrote:
> Cei,
> Thanks for the bug report. I am looking into the issue now and will
> have a patch out shortly.
>
>
> Cheers,
> Patrick
>
>
> On 7/8/10 9:22 AM, Cei Abreu-Goodger wrote:
>> Hello all,
>>
>> When you have two GRanges objects which don't have the same seqnames
>> factor, you can get very unexpected behavior.
>>
>> Would it be possible to get an error/warning when this is attempted?
>> I don't know how many functions are affected, perhaps the functions
>> mentioned below can be modified to produce a more obvious result...
>>
>> Many thanks,
>>
>> Cei
>>
>>
>> Code example / sessionInfo:
>>
>> library(GenomicRanges)
>> gr1 <- GRanges(seqnames = c("chr1", "chr2", "chr3", "chr4"),
>> ranges = IRanges(1:4, width = 10),
>> strand = c("-", "+", "+", "+"))
>> gr2 <- GRanges(seqnames = c("chr1", "chr2", "chr5", "chr6"),
>> ranges = IRanges(1:4, width = 10),
>> strand = c("-", "+", "+", "+"))
>>
>> union(gr1,gr2)
>> # gives expected result, all 6 ranges
>>
>> intersect(gr1,gr2)
>> # includes ranges from the missing gr2 chromosomes
>>
>> setdiff(gr1,gr2)
>> # includes 3 copies of the ranges from the missing gr2 chromosomes
>>
>>
>> sessionInfo()
>> R version 2.11.0 (2010-04-22)
>> i386-apple-darwin9.8.0
>>
>> locale:
>> [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices datasets utils methods base
>>
>> other attached packages:
>> [1] GenomicRanges_1.0.5 IRanges_1.6.8 Biobase_2.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.11.0
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list