[BioC] coverage() in IRanges gives >0 coverage when weight == 0
Ludo Pagie
ludo.pagie at gmail.com
Mon Sep 2 16:34:33 CEST 2013
Hi all,
I don't understand what's going on here.I'm trying to compute the
coverage of a few ranges. I specify a weight for these ranges and for
some positions I get a coverage even though I know these positions are
only covered by ranges which have 0 weight.
Worse, the value I do get for these positions depends on the number of
ranges I feed into the function. Here's some code using extracts of
the data I'm working on:
################################################################
################################################################
# funny way to make the same RangedData object I have
ranges <- as(structure(list(space = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = "CH321-10J13_YELLOW", class = "factor"), start = c(13449753L,
13449972L, 13449972L, 13450037L, 13450038L, 13450039L, 13450039L,
13450042L, 13450169L, 13450193L, 13450198L, 13450253L, 13450284L,
13450296L, 13450308L, 13450308L, 13450522L, 13450690L, 13450690L,
13450870L, 13450997L, 13451104L, 13451133L, 13451133L, 13451142L,
13451147L, 13451215L, 13451216L, 13451513L, 13451788L, 13451812L,
13451812L, 13451902L, 13451961L, 13451961L, 13452337L, 13452955L,
13453049L, 13453053L), end = c(13451753L, 13451573L, 13451573L,
13451908L, 13451921L, 13451700L, 13451700L, 13451675L, 13452208L,
13451895L, 13451971L, 13451731L, 13451553L, 13452172L, 13451502L,
13451502L, 13451806L, 13452630L, 13452630L, 13452711L, 13452527L,
13452571L, 13451378L, 13451378L, 13452533L, 13453030L, 13452281L,
13452439L, 13453079L, 13453360L, 13452603L, 13452603L, 13453506L,
13452789L, 13452789L, 13453123L, 13453484L, 13453773L, 13453387L
), width = c(2001L, 1602L, 1602L, 1872L, 1884L, 1662L, 1662L,
1634L, 2040L, 1703L, 1774L, 1479L, 1270L, 1877L, 1195L, 1195L,
1285L, 1941L, 1941L, 1842L, 1531L, 1468L, 246L, 246L, 1392L,
1884L, 1067L, 1224L, 1567L, 1573L, 792L, 792L, 1605L, 829L, 829L,
787L, 530L, 725L, 335L), count = c(1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("space", "start", "end",
"width", "count"), row.names = c(NA, -39L), class = "data.frame"),'RangedData')
# make the weight vector
count <- c(3.790678721025, 3.790678721025, 0, 0, 0, 1.263559573675,
2.52711914735,
6.317797868375, 159.20850628305, 0, 156.6813871357, 1.263559573675,
162.999185004075, 2.52711914735, 1139.73073545485, 0, 0, 0, 304.517857255675,
5.0542382947, 1.263559573675, 3.790678721025, 1.263559573675,
0, 6.317797868375, 32.85254891555, 0, 451.090767801975, 0, 17.68983403145,
24.007631899825, 0, 46.751704225975, 0, 0, 0, 0, 0, 0)
# my shift argument
shift <- structure(list(`CH321-10J13_YELLOW` = -13421338), .Names =
"CH321-10J13_YELLOW")
# I know base position 13453507 is covered by a single range, which has weight 0
mypos <- 13453507
# which ranges overlap with mypos:
which(start(ranges) <= mypos & end(ranges) >= mypos)
# 38
# this range has weight 0
count[38]
# 0
# compute coverage and report the coverage at mypos:
coverage(ranges, weight=list(count))[[1]][mypos]
#numeric-Rle of length 1 with 1 run
# Lengths: 1
# Values : -5.61328761250479e-13
# if I reduce the number of ranges going into the computation I get
different results
coverage(ranges[20:39,], weight=list(count[20:39]))[[1]][mypos]
#numeric-Rle of length 1 with 1 run
# Lengths: 1
# Values : 7.105427357601e-15
coverage(ranges[35:39,], weight=list(count[35:39]))[[1]][mypos]
#numeric-Rle of length 1 with 1 run
# Lengths: 1
# Values : 0
# sessionInfo:
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] IRanges_1.18.3 BiocGenerics_0.6.0
loaded via a namespace (and not attached):
[1] stats4_3.0.1 tools_3.0.1
################################################################
################################################################
I get the same issue in another R-version:
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base
other attached packages:
[1] IRanges_1.16.6 BiocGenerics_0.4.0
[3] BiocInstaller_1.8.3 gdata_2.13.2
[5] gtools_3.0.0
loaded via a namespace (and not attached):
[1] KernSmooth_2.23-10 parallel_2.15.1 stats4_2.15.1
[4] tools_2.15.1
Is this the result of precision issues? Or is it a bug? Or am I
overlooking something else?
Thanks for your help, Ludo
More information about the Bioconductor
mailing list