[R] Very Slow Gower Similarity Function

Anon. bob.ohara at helsinki.fi
Mon Apr 18 19:36:56 CEST 2005


Jari Oksanen wrote:

>
> On 18 Apr 2005, at 19:10, Tyler Smith wrote:
>
>> Hello,
>>
>> I am a relatively new user of R. I have written a basic function to
>> calculate the Gower similarity index. I was motivated to do so partly
>> as an exercise in learning R, and partly because the existing option
>> (vegdist in the vegan package) does not accept missing values.
>>
> Speed is the reason to use C instead of R. It should be easy, almost
> trivial, to modify vegdist.c so that it handles missing values. I guess
> this handling means ignoring the value pair if one of the values is
> missing, which is not so gentle to the metric properties so dear to
> Gower. Package vegan is designed for ecological community data, which
> generally do not have missing values (except in environmental data),
> but contributions are welcome.
>
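Here is a minimal sketch of the missing-value handling Jari describes. It is written in Python rather than R and assumes numeric variables with missing entries coded as None. Each variable is scaled by its range over its non-missing values, a pair is skipped for a variable when either value is missing, and the sum is divided only by the number of variables actually compared. The name gower_pairwise is hypothetical. Counting a constant variable as a zero difference is this sketch's own convention, not necessarily what vegdist does.

```python
import math


def gower_pairwise(rows):
    """Gower dissimilarity matrix that ignores a value pair if either value is missing.

    rows: list of equal-length lists of numbers; None marks a missing value
    (an assumption of this sketch). Returns an n x n list of lists.
    """
    n_vars = len(rows[0])

    # Range of each variable, computed from its non-missing values only.
    ranges = []
    for c in range(n_vars):
        present = [r[c] for r in rows if r[c] is not None]
        ranges.append(max(present) - min(present) if present else 0.0)

    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j + 1, n):
            total, used = 0.0, 0
            for c in range(n_vars):
                a, b = rows[j][c], rows[k][c]
                if a is None or b is None:
                    continue  # skip this variable for this pair only
                if ranges[c] > 0:
                    total += abs(a - b) / ranges[c]
                # A constant variable is counted with zero difference (sketch convention).
                used += 1
            # Divide by the number of variables actually compared, not by all of them.
            d[j][k] = d[k][j] = total / used if used else math.nan
    return d
```

Gower's similarity is 1 minus each entry. Different pairs can be compared over different subsets of variables, so the result need not satisfy the triangle inequality. This is the loss of metric properties Jari mentions.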
The only reason you never see ecological community data with missing
values is that the ecologists remove those species/sites from their
Excel sheets before they give it to you to sort out their mess.  This is 
actually one of the few things they know how to do in Excel - I'm 
dreading the day when a paper appears in JAE saying that you can use 
Excel to produce P-values.

To be slightly more serious: as an exercise, the OP could consider
writing a wrapper function in R that removes the missing data and then
calls vegdist to calculate his Gower similarity index.
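Bob's wrapper idea can be sketched the same way, again in Python rather than R. The sketch drops every variable (column) that has a missing value and then computes the ordinary Gower dissimilarity on the remaining variables. Dropping columns rather than rows (sites) is an assumption. The names gower_without_missing and drop_incomplete_columns are hypothetical, and the plain gower function only stands in for the call to vegdist.

```python
def drop_incomplete_columns(rows):
    """Keep only the variables (columns) that have no missing (None) values."""
    keep = [c for c in range(len(rows[0])) if all(r[c] is not None for r in rows)]
    return [[r[c] for c in keep] for r in rows]


def gower(rows):
    """Ordinary Gower dissimilarity for complete numeric data.

    Stands in for the call to vegdist in this sketch; a constant variable
    contributes zero difference but still counts toward the divisor.
    """
    n_vars = len(rows[0])
    if n_vars == 0:
        raise ValueError("no complete variables left after removing missing data")
    ranges = [max(r[c] for r in rows) - min(r[c] for r in rows) for c in range(n_vars)]
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j + 1, n):
            total = sum(
                abs(rows[j][c] - rows[k][c]) / ranges[c]
                for c in range(n_vars)
                if ranges[c] > 0
            )
            d[j][k] = d[k][j] = total / n_vars
    return d


def gower_without_missing(rows):
    """Wrapper: remove variables with missing data, then compute Gower."""
    return gower(drop_incomplete_columns(rows))
```

Unlike pairwise deletion, this compares every pair over the same variables, so the metric properties are kept. The cost is discarding any variable with even one missing entry.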

Bob

-- 
Bob O'Hara
Department of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland

Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax:  +358-9-191 51400
WWW:  http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: www.jnr-eeb.org



