[BioC] GRanges, strange behavior when chromosomes don't match

Patrick Aboyoun paboyoun at fhcrc.org
Fri Jul 9 22:57:34 CEST 2010


Cei,
I checked in a patch to BioC 2.6 (GenomicRanges 1.0.6) and BioC 2.7 
(GenomicRanges 1.1.16) that fixes the issue you found in the intersect 
and setdiff methods for GRanges objects when the two objects don't have 
the same sequence names. These new packages will be available on 
bioconductor.org within the next 36 hours.

The intersect and setdiff method for GRanges now form the union of 
sequence names for both objects before performing their operations. The 
output for the examples you sent are now

 > library(GenomicRanges)
 > gr1 <- GRanges(seqnames = c("chr1", "chr2", "chr3", "chr4"),
+           ranges = IRanges(1:4, width = 10),
+           strand = c("-", "+", "+", "+"))
 > gr2 <- GRanges(seqnames = c("chr1", "chr2", "chr5", "chr6"),
+           ranges = IRanges(1:4, width = 10),
+           strand = c("-", "+", "+", "+"))

 > intersect(gr1,gr2)
GRanges with 2 ranges and 0 elementMetadata values
     seqnames    ranges strand |
<Rle> <IRanges> <Rle> |
[1]     chr1   [1, 10]      - |
[2]     chr2   [2, 11]      + |

seqlengths
  chr1 chr2 chr3 chr4 chr5 chr6
    NA   NA   NA   NA   NA   NA

 > setdiff(gr1,gr2)
GRanges with 2 ranges and 0 elementMetadata values
     seqnames    ranges strand |
<Rle> <IRanges> <Rle> |
[1]     chr3   [3, 12]      + |
[2]     chr4   [4, 13]      + |

seqlengths
  chr1 chr2 chr3 chr4 chr5 chr6
    NA   NA   NA   NA   NA   NA

 > sessionInfo()
R version 2.12.0 Under development (unstable) (2010-07-01 r52425)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicRanges_1.1.16 IRanges_1.7.10


Cheers,
Patrick



On 7/8/10 4:55 PM, Patrick Aboyoun wrote:
> Cei,
> Thanks for the bug report. I am looking into the issue now and will 
> have a patch out shortly.
>
>
> Cheers,
> Patrick
>
>
> On 7/8/10 9:22 AM, Cei Abreu-Goodger wrote:
>> Hello all,
>>
>> When you have two GRanges objects which don't have the same seqnames 
>> factor, you can get very unexpected behavior.
>>
>> Would it be possible to get an error/warning when this is attempted? 
>> I don't know how many functions are affected, perhaps the functions 
>> mentioned below can be modified to produce a more obvious result...
>>
>> Many thanks,
>>
>> Cei
>>
>>
>> Code example / sessionInfo:
>>
>> library(GenomicRanges)
>> gr1 <- GRanges(seqnames = c("chr1", "chr2", "chr3", "chr4"),
>>           ranges = IRanges(1:4, width = 10),
>>           strand = c("-", "+", "+", "+"))
>> gr2 <- GRanges(seqnames = c("chr1", "chr2", "chr5", "chr6"),
>>           ranges = IRanges(1:4, width = 10),
>>           strand = c("-", "+", "+", "+"))
>>
>> union(gr1,gr2)
>> # gives expected result, all 6 ranges
>>
>> intersect(gr1,gr2)
>> # includes ranges from the missing gr2 chromosomes
>>
>> setdiff(gr1,gr2)
>> # includes 3 copies of the ranges from the missing gr2 chromosomes
>>
>>
>> sessionInfo()
>> R version 2.11.0 (2010-04-22)
>> i386-apple-darwin9.8.0
>>
>> locale:
>> [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices datasets  utils     methods   base
>>
>> other attached packages:
>> [1] GenomicRanges_1.0.5 IRanges_1.6.8       Biobase_2.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.11.0
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list