[BioC] duplicated on IRanges object
Manuela Hummel
manuela.hummel at crg.es
Fri Oct 22 16:44:30 CEST 2010
Hi,
There seems to be a numerical issue when applying 'duplicated' to an IRanges object.
When two ranges are almost identical, and the same IRanges object also contains some other range with a huge width, 'duplicated' reports the two "almost identical" ranges as duplicates.
Take, for example, these two ranges:
> ir <- IRanges(start=rep(1000000000, 2), width=200:201)
> ir
IRanges of length 2
         start        end width
[1] 1000000000 1000000199   200
[2] 1000000000 1000000200   201
They are obviously not the same:
> duplicated(ir)
[1] FALSE FALSE
But when we now add a third range with a huge width:
> ir2 <- c(ir, IRanges(start=5000000, end=100000000))
> ir2
IRanges of length 3
         start        end    width
[1] 1000000000 1000000199      200
[2] 1000000000 1000000200      201
[3]    5000000  100000000 95000001
... the second range is reported as a duplicate of the first:
> duplicated(ir2)
[1] FALSE TRUE FALSE
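Until this is sorted out, a possible workaround is to compare the integer start and width values directly instead of going through a single numeric key. The following is a minimal base-R sketch, not how IRanges implements 'duplicated': the vectors are typed in from the printout above (in a real session they would come from start(ir2) and width(ir2)), and it relies only on duplicated() for data frames, which compares whole rows.

```r
# Workaround sketch: detect duplicates on exact integer (start, width) pairs
# rather than on one floating-point key per range.
# Values copied by hand from the ir2 printout; with IRanges loaded one would
# use start(ir2) and width(ir2) instead.
starts <- c(1000000000L, 1000000000L, 5000000L)
widths <- c(200L, 201L, 95000001L)

# duplicated() on a data frame flags rows identical to an earlier row.
dup <- duplicated(data.frame(start = starts, width = widths))
dup    # FALSE FALSE FALSE

# Sanity check: an exact copy of the first range is still flagged.
dup2 <- duplicated(data.frame(start = c(starts, 1000000000L),
                              width = c(widths, 200L)))
dup2   # FALSE FALSE FALSE TRUE
```

Because the comparison is done on integers, the result does not depend on how wide the other ranges in the object are.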
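I guess the problem is that in .toNumericWithCompatibleOrder the variable max_width becomes so large that
start(x) + width(x)/(max_width+1.00)
evaluates to the same double for ranges like the first two in the example. Here max_width+1 is 95000002, so the two fractional parts differ by only 1/95000002 (about 1e-8), while adjacent doubles near 1e9 are about 1.2e-7 apart.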
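The rounding can be checked in base R without IRanges. This sketch assumes the key is exactly the expression quoted above (start plus width divided by max_width + 1); the actual internals of .toNumericWithCompatibleOrder may do more than that, so it only illustrates the precision loss, not the package code.

```r
# Assumed key: start + width / (max_width + 1), as quoted above.
starts <- c(1000000000, 1000000000, 5000000)
widths <- c(200, 201, 95000001)
max_width <- max(widths)

key <- starts + widths / (max_width + 1.00)

# The first two keys collapse to the same double.
identical(key[1], key[2])   # TRUE

# Gap between the two fractional parts (about 1.05e-08) ...
1 / (max_width + 1)
# ... versus the spacing of doubles near 1e9, i.e. 2^-23 (about 1.19e-07).
2^(floor(log2(1e9)) - 52)

# With only the first two ranges, max_width is 201 and the keys stay distinct.
key_small <- starts[1:2] + widths[1:2] / (max(widths[1:2]) + 1)
identical(key_small[1], key_small[2])   # FALSE
```

This matches the behaviour in the transcript: duplicated(ir) is correct, while duplicated(ir2) merges the first two ranges once the huge range inflates max_width.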
Best regards
Manuela
PS: By the way, thanks for the great IRanges package! It makes working with sequence data so much easier.
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
[5] LC_TIME=Spanish_Spain.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] IRanges_1.8.0
Manuela Hummel
Core Facilities - Microarrays Unit
Center for Genomic Regulation (CRG)
Dr. Aiguader 88, 4th floor, Office 439.01
08003 Barcelona
Phone: +34 93 316 0373
e-mail: manuela.hummel at crg.es