[BioC] undocumented behavior in tilingArray::costMatrix [Scanned]
Du, Yang
duyang at fbn-dummerstorf.de
Thu Jul 26 12:25:38 CEST 2012
Dear Wolfgang,
I am looking at one particular function costMatrix in tilingArray, it
seems that the returning matrix is not consistent with the
documentation.
The calculation is fine for rows G[1:(maxk-1),]; but for the last row
G[maxk,] only the first column has been assigned a value outside the
loop, thus I am not sure if this is intended (as I feel this
would eliminate certain cases in the segmentation result).
Below is a small example, far from real but just a proof of concept,
1> x<-c(1,1,1,3,3,3)
## here the behavior is expected since only column 1 can be filled, from
column 2 has (2+6-1)>6 thus kept NA
> costMatrix(x, maxk=6)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.0 0.000000 0.000000 0 0 0
[2,] 0.0 0.000000 2.000000 0 0 NA
[3,] 0.0 2.666667 2.666667 0 NA NA
[4,] 3.0 4.000000 3.000000 NA NA NA
[5,] 4.8 4.800000 NA NA NA NA
[6,] 6.0 NA NA NA NA NA
# here I was expecting the first 2 rows in the previous call,
1> costMatrix(x, maxk=2)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 0
[2,] 0 NA NA NA NA NA
So the returning matrix will be ok when maxk is equal to the length/rows
of the input vector/matrix, but only having the first column filled in
the last row when maxk is less.
Subsequently this will render inaccurate estimates when using the wrong matrix,
since in the c code all elements with wrong NA will not be add to the cost function.
# only looking for one change, 2 segments, biased towards n-1
> .Call("findsegments", costMatrix(x,maxk=3) ,as.integer(2), as.integer(1), PACKAGE="tilingArray")$th
[,1] [,2]
[1,] 7 0
[2,] 6 7
> segment(x,maxk=3,2)@breakpoints[[2]]
estimate
[1,] 6
# here give a larger value for maxk then subset the first maxk-1 rows, which are accurate.
> .Call("findsegments", costMatrix(x,maxk=4)[1:3,] ,as.integer(2), as.integer(1), PACKAGE="tilingArray")$th
[,1] [,2]
[1,] 7 0
[2,] 4 7
--
Regards
Yang Du
More information about the Bioconductor
mailing list