Frédéric Chiroleu
frederic.chiroleu at cirad.fr
Tue Oct 16 09:47:51 CEST 2007
Hi,
I don't understand the definition of the Canberra distance in R.
On the Internet, and in the help pages of dist() from stats and
Dist() from amap, the Canberra distance between vectors x and y, d(x,y), is:
d(x,y) = sum(abs(x-y)/(x+y))
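To make the documented formula concrete, here is a minimal Python sketch of it taken literally. This is not R's implementation, and the function name is hypothetical. It assumes non-negative coordinates, as in our examples. A pair with x_i = y_i = 0 gives 0/0, which the formula leaves undefined. This sketch simply skips such terms without any rescaling, which is only one possible reading.

```python
def canberra_documented(x, y):
    """Literal reading of d(x,y) = sum(abs(x-y)/(x+y)).

    Assumes non-negative coordinates. Pairs with x_i == y_i == 0 would
    give 0/0; here they are skipped (they contribute nothing) and the
    sum is NOT rescaled.
    """
    total = 0.0
    for xi, yi in zip(x, y):
        if xi == 0 and yi == 0:
            continue  # 0/0 term: undefined, so it is left out of the sum
        total += abs(xi - yi) / (xi + yi)
    return total
```

For example, canberra_documented([1, 2, 0], [3, 2, 0]) gives 2/4 + 0/4 = 0.5.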
But in practice, on simple examples, we find that the formula actually used is:
d(x,y) = (NZ + 1)/NZ * sum(abs(x-y)/(x+y))
where NZ is the number of coordinate pairs (x[i], y[i]) that differ from (0,0)
("non-zero" pairs).
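One possible explanation, offered here as an assumption and not verified against the dist() C code: the ?dist help page in later R versions says that terms with zero numerator and denominator are treated as if the values were missing, and that when columns are excluded the sum is scaled up proportionally to the number of columns used. That rule gives a factor p/NZ, where p is the number of coordinates. It reduces to (NZ+1)/NZ only when exactly one pair is (0,0). The Python sketch below implements that rule (hypothetical function name, non-negative coordinates assumed). It does not reproduce R's actual code.

```python
def canberra_rescaled(x, y):
    """Canberra distance with 0/0 terms treated as missing, then the sum
    rescaled by p / NZ (p = number of coordinates, NZ = pairs != (0, 0)).

    Assumes non-negative coordinates. Returns NaN if every pair is (0, 0).
    """
    p = len(x)
    total = 0.0
    nz = 0
    for xi, yi in zip(x, y):
        if xi == 0 and yi == 0:
            continue  # treated like a missing coordinate
        nz += 1
        total += abs(xi - yi) / (xi + yi)
    if nz == 0:
        return float("nan")  # nothing left to compare
    return p / nz * total
```

With one (0,0) pair, canberra_rescaled([1, 2, 0], [3, 2, 0]) gives 3/2 * 0.5 = 0.75, matching (NZ+1)/NZ. With two (0,0) pairs, canberra_rescaled([1, 2, 0, 0], [3, 2, 0, 0]) gives 4/2 * 0.5 = 1.0, whereas (NZ+1)/NZ would give 0.75. An example with several (0,0) pairs is therefore a simple way to tell the two hypotheses apart.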
The functions vegdist() from vegan and gdist() from mvpart, like the
ADE4 software documentation, use (for positive variables):
d(x,y) = 1/NZ * sum(abs(x-y)/(x+y))
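For comparison, here is a Python sketch of this averaged form. The function name is hypothetical, and it is not taken from the vegan, mvpart or ade4 sources. Dividing by NZ turns the distance into a mean of the per-coordinate terms. For non-negative data each term abs(x-y)/(x+y) lies in [0, 1], so the averaged distance also lies in [0, 1].

```python
def canberra_mean(x, y):
    """Canberra distance averaged over the NZ pairs that are not (0, 0):
    d(x,y) = 1/NZ * sum(abs(x-y)/(x+y)).

    Assumes non-negative coordinates. Returns NaN if every pair is (0, 0).
    """
    terms = [abs(xi - yi) / (xi + yi)
             for xi, yi in zip(x, y)
             if not (xi == 0 and yi == 0)]  # drop 0/0 pairs
    if not terms:
        return float("nan")
    return sum(terms) / len(terms)
```

For example, canberra_mean([1, 2, 0], [3, 2, 0]) gives (0.5 + 0) / 2 = 0.25.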
Can someone help me understand why these formulas were chosen, and why,
for dist(), the computed result differs from the documented formula?
Thank you for your help.
Best regards,
Fred
PS: Be careful with the function dudi.pca() from ade4. In the returned
values, "norm" does not give what the help page says: "norm" returns the
vector of standard deviations of the initial variables when you choose a
"normed" PCA, and the vector of standard deviations of the normed variables,
i.e. 1, when you choose a non-"normed" PCA. We contacted the package authors
to have the documentation corrected, without success.
