[R] Ward's Clustering Doubts

Mark Difford mark_difford at yahoo.co.uk
Mon Sep 15 16:38:59 CEST 2008


Hi Rodrigo,

Glad it helped. You might find the following useful:

@ARTICLE{McArdle2001,
  author = {McArdle, Brian H. and Anderson, Marti J.},
  title = {Fitting multivariate models to community data: a comment on
	distance-based redundancy analysis},
  journal = {Ecology},
  year = {2001},
  volume = {82},
  pages = {290--297},
  number = {1},
  month = {Jan}
}

Regards, Mark.



Rodrigo Aluizio wrote:
> 
> Well, once again, thank you so much Mark.
> My original Ward's clustering on the untransformed distance matrix (which
> was not Euclidean) is simply identical to the one "euclidefied" with the
> lingoes function (ade4 package).
> 
> Regards, Rodrigo.
> 
> --------------------------------------------------
> From: "Mark Difford" <mark_difford at yahoo.co.uk>
> Sent: Monday, September 15, 2008 8:09 AM
> To: <r-help at r-project.org>
> Subject: Re: [R] Ward's Clustering Doubts
> 
>>
>> Hi Rodrigo,
>>
>> [apropos of Ward's method]
>>
>>>> ... we saw something like "You must use it with Euclidean Distance..."
>>
>> Strictly speaking this is probably correct, as Ward's method does an
>> analysis-of-variance type of decomposition and so doesn't really make much
>> sense (I think) unless Euclidean distance (i.e. least-squares) is used.
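
One way to see the least-squares connection, as a sketch in notation chosen here for illustration (x_i are the observations, \bar{x}_C is the centroid of cluster C, d_ij is the distance between observations i and j):

```latex
% Error sum of squares of a cluster C, written about its centroid
% and, equivalently, in terms of pairwise Euclidean distances
\mathrm{ESS}(C) = \sum_{i \in C} \lVert x_i - \bar{x}_C \rVert^2
                = \frac{1}{|C|} \sum_{\substack{i<j \\ i,j \in C}} d_{ij}^2,
\qquad d_{ij} = \lVert x_i - x_j \rVert .

% Increase in total ESS when clusters A and B are merged;
% Ward's method merges the pair with the smallest increase
\Delta(A,B) = \mathrm{ESS}(A \cup B) - \mathrm{ESS}(A) - \mathrm{ESS}(B)
            = \frac{|A|\,|B|}{|A|+|B|} \lVert \bar{x}_A - \bar{x}_B \rVert^2 .
```

The pairwise form can be evaluated for any dissimilarity, but it equals a sum of squares about a centroid only when the d_ij can be represented as Euclidean distances, which is where the variance interpretation comes from.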
>>
>> However, there may be ways around this. First, the fact that a distance
>> measure is non-Euclidean in general does not mean that every distance
>> matrix computed with it is non-Euclidean. You can test this using
>> ?is.euclid in package ade4. You can also test your matrix by doing a
>> principal co-ordinate analysis and then looking for negative eigenvalues.
>> If none are found, the matrix is Euclidean and it should be OK to use
>> Ward's method on that data set.
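
A minimal base-R sketch of the eigenvalue test just described. It is not the ade4 implementation: the function names (bray_curtis, pcoa_eigenvalues, is_euclidean), the relative tolerance, and the toy counts are all assumptions made up for illustration. In practice is.euclid in ade4 does this check for you.

```r
# Toy community matrix: 6 stations x 4 taxa (made-up counts, illustration only)
comm <- matrix(c(10, 0,  3, 0,
                  8, 1,  0, 0,
                  0, 5,  0, 7,
                  0, 6,  1, 9,
                  2, 0, 12, 0,
                  1, 0, 10, 1),
               nrow = 6, byrow = TRUE)

# Bray-Curtis dissimilarity computed by hand (hypothetical helper):
# sum |x_i - x_j| / sum (x_i + x_j) for each pair of rows
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)
}

# Eigenvalues of the Gower-centred matrix used in principal co-ordinate analysis
pcoa_eigenvalues <- function(d) {
  m <- as.matrix(d)
  n <- nrow(m)
  J <- diag(n) - matrix(1 / n, n, n)        # centring matrix
  G <- -0.5 * J %*% (m^2) %*% J             # Gower-centred squared distances
  eigen(G, symmetric = TRUE, only.values = TRUE)$values
}

# A matrix is treated as Euclidean if no eigenvalue is negative beyond
# rounding error (the relative tolerance is an assumption of this sketch)
is_euclidean <- function(d, tol = 1e-7) {
  ev <- pcoa_eigenvalues(d)
  min(ev) > -tol * max(abs(ev))
}

# Three points that violate the triangle inequality (3 > 1 + 1)
tri <- matrix(c(0, 1, 3,
                1, 0, 1,
                3, 1, 0), nrow = 3, byrow = TRUE)

is_euclidean(dist(comm))        # TRUE: ordinary Euclidean distances
is_euclidean(as.dist(tri))      # FALSE: at least one negative eigenvalue
round(pcoa_eigenvalues(bray_curtis(comm)), 4)  # inspect the signs yourself
```

Whether a Bray-Curtis matrix passes depends on the data, so the last line only prints the eigenvalues for inspection.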
>>
>> Probably a better approach is to make your distance matrix Euclidean.
>> There are several functions in ade4 that will do this. The idea then is to
>> present/compare the two solutions: the first using the uncorrected,
>> non-Euclidean distance matrix, the second using the corrected version. You
>> could use procrustes/co-inertia analysis to compare the two in an
>> intermediate step.
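
A rough base-R sketch of that workflow, under several assumptions. The correction is a hand-written Lingoes-type adjustment (add 2c to the squared distances, with c the absolute value of the most negative eigenvalue), not the lingoes function from ade4. The comparison uses a cophenetic correlation as a simpler stand-in for the procrustes/co-inertia step. The method name "ward.D2" assumes R >= 3.1.0. The dissimilarities and the names gower_eigen and lingoes_correct are made up for illustration.

```r
# Eigenvalues of the Gower-centred matrix used in principal co-ordinate analysis
gower_eigen <- function(d) {
  m <- as.matrix(d)
  n <- nrow(m)
  J <- diag(n) - matrix(1 / n, n, n)
  eigen(-0.5 * J %*% (m^2) %*% J, symmetric = TRUE, only.values = TRUE)$values
}

# Lingoes-type correction (hypothetical helper): d' = sqrt(d^2 + 2c) off the
# diagonal, where c is the absolute value of the most negative eigenvalue
lingoes_correct <- function(d) {
  c1 <- max(0, -min(gower_eigen(d)))
  m <- sqrt(as.matrix(d)^2 + 2 * c1)
  diag(m) <- 0
  as.dist(m)
}

# Made-up dissimilarities among 5 stations, deliberately non-Euclidean
# (e.g. 0.9 > 0.2 + 0.3 breaks the triangle inequality)
m <- matrix(c(0.0, 0.2, 0.9, 0.8, 1.0,
              0.2, 0.0, 0.3, 0.9, 0.9,
              0.9, 0.3, 0.0, 0.2, 0.8,
              0.8, 0.9, 0.2, 0.0, 0.3,
              1.0, 0.9, 0.8, 0.3, 0.0), nrow = 5, byrow = TRUE)
d_raw <- as.dist(m)
d_fix <- lingoes_correct(d_raw)

# Ward clustering on the uncorrected and the corrected matrix
tree_raw <- hclust(d_raw, method = "ward.D2")
tree_fix <- hclust(d_fix, method = "ward.D2")

# Agreement between the two trees: correlation of their cophenetic distances
agreement <- cor(as.vector(cophenetic(tree_raw)),
                 as.vector(cophenetic(tree_fix)))

min(gower_eigen(d_raw))   # negative: the raw matrix is not Euclidean
min(gower_eigen(d_fix))   # zero up to rounding: the corrected matrix is
agreement                 # near 1 means the two trees tell the same story
```

If the two trees agree closely, as Rodrigo reports for his data, presenting both solutions side by side should answer the objection.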
>>
>> Regards, Mark.
>>
>>
>> Rodrigo Aluizio wrote:
>>>
>>> Hi Everybody,
>>> Now I have a question that is more statistical than technical. I’m
>>> working on the ecology of recent Foraminifera.
>>>
>>> At the lab we used to perform cluster analysis using 1-Pearson’s R and
>>> Ward's method (a combination we had already seen in the literature of
>>> the area), which renders good results with our biological data.
>>> Recently, using R (vegan and cluster packages), which allows the
>>> combination of any kind of distance matrix with any clustering method,
>>> we tried Bray-Curtis + Ward's (which seems more appropriate for a matrix
>>> with a lot of zeros) and it rendered a better result. Furthermore, the
>>> results agree with our hypothesis and with the results we got with
>>> Distance-based Redundancy Analysis (dbRDA or CAP). That is, the analysis
>>> (Q-mode) clusters the stations according to the main physical,
>>> sedimentary and biological characteristics of the study area.
>>>
>>> We received some critical comments pointing out that Ward's method
>>> accepts Euclidean distance only. So we ran the analysis again using
>>> Euclidean distance, but we did not get the better results we had with
>>> 1-Pearson’s R + Ward's or Bray-Curtis + Ward's (actually, any other
>>> distance + method combination rendered better results). Trying to find
>>> answers in the specialized literature only left us a little more
>>> confused: at times we saw something like "You must use it with Euclidean
>>> Distance", yet, as I said above, we had also seen other kinds of
>>> distance associated with Ward's clustering method in articles from
>>> respected journals.
>>>
>>> Is it wrong, or is it nonsense, to do the analysis the way we were
>>> doing it?
>>>
>>> The results with Ward's combined with 1-Pearson’s R or Bray-Curtis fit
>>> our hypothesis better and have excellent agglomerative coefficients,
>>> but we don’t want to use inappropriate statistical procedures. I'm
>>> starting to realize how powerful R is, but it doesn't justify doing
> [[elided Yahoo spam]]
>>>
>>> Thank you in advance.
>>>
>>> Rodrigo.
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/Ward%27s-Clustering-Doubts-tp19486028p19490991.html
>> Sent from the R help mailing list archive at Nabble.com.
>>

-- 
View this message in context: http://www.nabble.com/Ward%27s-Clustering-Doubts-tp19486028p19494336.html
Sent from the R help mailing list archive at Nabble.com.


