[BioC] ctc::hc2Newick branch length errors?
Narayanan, Manikandan (NIH/NIAID) [E]
manikandan.narayanan at nih.gov
Thu Mar 14 20:50:18 CET 2013
Hi,
I may have stumbled upon an error in the branch length calculation of the ctc package's hc2Newick function, which converts an hclust object to Newick format. However I am not confident yet to report it as a bug and hence this email to see if you agree with my interpretations below.
> hc = hclust(d = as.dist(matrix(c(0, 2, 4, 4, 2, 0, 4, 4, 4, 4, 0, 2, 4, 4, 2, 0), 4, 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))))
> hc2Newick(hc)
[1] "((A:1,B:1):0,(C:1,D:1):0);"
Shouldn't the correct output be [1] "((A:1,B:1):1,(C:1,D:1):1);" so that the branch lengths and hclust heights agree??
Some more information on hclust heights (easiest to view via plot(hc)):
> hc$labels
[1] "A" "B" "C" "D"
> hc$height
[1] 2 2 4
> hc$merge
[,1] [,2]
[1,] -1 -2
[2,] -3 -4
[3,] 1 2
If my interpretations are correct, then the hc2Newick() function code can be fixed by removing all "dist" calculation statements and putting a single "dist" calculation statement like so:
j <- hc$merge[i, 1] #original code
k <- hc$merge[i, 2] #original code
+ dist <- hc$height[i] - (ifelse(j < 0, 0, hc$height[j]) + ifelse(k < 0, 0, hc$height[k]))/2 #new line
I'm using R 2.15 and packages: "[1] ctc_1.32.0 amap_0.8-7 Biobase_2.18.0 BiocGenerics_0.4.0".
Thanks,
Mani
PS: Apologies if this email was posted twice, as I wasn't sure if I posted correctly the first time I tried.
More information about the Bioconductor
mailing list