[R] Very Slow Gower Similarity Function

Mon Apr 18 22:23:45 CEST 2005

Quoting Martin Maechler <maechler at stat.math.ethz.ch>:

> I don't know what exactly you want.

The Gower coefficient I am referring to comes from his 1971 article in
Biometrics (27(4):857-871). It differs from most commonly used measures (but
not, apparently, daisy!) in that it can combine quantitative and qualitative
(binary or unordered multistate) variables in a single coefficient, and in
that it provides a mechanism for dropping missing values from the similarity
calculation. It is also covered in Legendre and Legendre.
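
As a rough sketch of the calculation described above, in base R only: each
variable contributes a score between 0 and 1, and a variable is skipped for a
given pair when either value is missing. The function name gower_pair and the
toy data frame are made up for illustration, and the sketch leaves out the
variable weights and the special handling of asymmetric binary characters
from the 1971 paper.

```r
# Gower-style similarity between rows i and j of a data frame.
# `gower_pair` is a hypothetical helper, not part of any package.
gower_pair <- function(df, i, j) {
  num <- 0  # sum of per-variable scores
  den <- 0  # number of variables actually compared
  for (k in seq_along(df)) {
    xi <- df[i, k]
    xj <- df[j, k]
    if (is.na(xi) || is.na(xj)) next  # missing value: drop this variable
    if (is.numeric(df[[k]])) {
      # quantitative: 1 - |difference| / range of the variable ("ranging")
      rng <- diff(range(df[[k]], na.rm = TRUE))
      s <- if (rng > 0) 1 - abs(xi - xj) / rng else 1
    } else {
      # qualitative: 1 if the two states match, 0 otherwise
      s <- as.numeric(xi == xj)
    }
    num <- num + s
    den <- den + 1
  }
  if (den == 0) NA_real_ else num / den
}

# Toy data: one quantitative and one unordered multistate variable
d <- data.frame(
  height = c(10, 20, 30, NA),
  colour = factor(c("red", "red", "blue", "blue"))
)

gower_pair(d, 1, 2)  # (0.5 + 1) / 2 = 0.75
gower_pair(d, 3, 4)  # height is dropped (NA); colours match, so 1
```

Looping over every pair of rows like this is simple but slow for large data
sets; it is meant only to show the arithmetic.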

>
> The function  daisy() in the recommended package "cluster"
> has always worked with missing values and IIRC, the book
> "Kaufman & Rousseeuw" {which I have not at hand here at home},
> clearly mentions Gower's origin of their distance measure
> definition.

I was unaware of the daisy function. Looking over it now, it differs from the
Gower coefficient primarily in the method of standardization. Gower
standardized each variable by dividing it by its range ("ranging"), whereas
daisy does a more conventional standardization (subtract the mean, divide by
the SD). As I understand it, there isn't much to recommend standardizing over
ranging (or vice versa), so daisy may provide a useful alternative for my
project. I'll have to look into it!
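
To make the two scalings concrete, here is a small base R sketch on made-up
numbers (the vector x is purely illustrative). It shows only the arithmetic of
ranging versus mean/SD standardization, not what daisy does internally.

```r
x <- c(2, 4, 6, 10)  # hypothetical values of one quantitative variable

# Ranging: divide by (max - min)
ranged <- x / diff(range(x))

# Conventional standardization: subtract the mean, divide by the SD
standardized <- (x - mean(x)) / sd(x)

# After ranging, the largest absolute difference between two values is 1,
# so 1 - |difference| can be read directly as a similarity in [0, 1].
max(dist(ranged))

# After standardization the variable has mean 0 and SD 1, but pairwise
# differences are no longer bounded by 1.
max(dist(standardized))
```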

Thanks,

Tyler

>
> Martin Maechler, maintainer of cluster package,
> ETH Zurich
>