[BioC] ctc::hc2Newick branch length errors?

Narayanan, Manikandan (NIH/NIAID) [E] manikandan.narayanan at nih.gov
Thu Mar 14 20:50:18 CET 2013


Hi,
  I may have stumbled upon an error in the branch length calculation of the ctc package's hc2Newick function, which converts an hclust object to Newick format. However I am not confident yet to report it as a bug and hence this email to see if you agree with my interpretations below.

> hc = hclust(d = as.dist(matrix(c(0, 2, 4, 4, 2, 0, 4, 4, 4, 4, 0, 2, 4, 4, 2, 0), 4, 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))))
> hc2Newick(hc)
[1] "((A:1,B:1):0,(C:1,D:1):0);"

Shouldn't the correct output be [1] "((A:1,B:1):1,(C:1,D:1):1);" so that the branch lengths and hclust heights agree??


Some more information on hclust heights (easiest to view via plot(hc)):
> hc$labels
[1] "A" "B" "C" "D"
> hc$height
[1] 2 2 4
> hc$merge
     [,1] [,2]
[1,]   -1   -2
[2,]   -3   -4
[3,]    1    2
If my interpretations are correct, then the hc2Newick() function code can be fixed by removing all "dist" calculation statements and putting a single "dist" calculation statement like so:
         j <- hc$merge[i, 1] #original code
         k <- hc$merge[i, 2] #original code
+      dist <- hc$height[i] - (ifelse(j < 0, 0, hc$height[j]) + ifelse(k < 0, 0, hc$height[k]))/2 #new line

I'm using R 2.15 and packages: "[1] ctc_1.32.0         amap_0.8-7         Biobase_2.18.0     BiocGenerics_0.4.0".


Thanks,
Mani


PS: Apologies if this email was posted twice, as I wasn't sure if I posted correctly the first time I tried.



More information about the Bioconductor mailing list