[R-sig-eco] Distance matrix based on correlation coefficient

Gavin Simpson gavin.simpson at ucl.ac.uk
Wed Jul 21 15:59:17 CEST 2010


On Wed, 2010-07-21 at 05:38 -0700, Dragos Zaharescu wrote:
> Thanks Gavin, Jari,

It is important to read the help pages carefully! I've never used fso()
nor had it installed on my home PC but it took just 30 seconds to see
what the problem was by reading the help for fso()

From ?fso
....
Arguments:

 formula: a formula in the form of ~x+y+z (no LHS)

     dis: a dist object such as that returned by ‘dist’, ‘dsvdis’, or
          ‘vegdist’

    data: a data frame that holds variables listed in the formula
....

You are supplying a /matrix/ as argument 'dis' and this is *not* what is
returned by any of the functions listed in the argument summary for
'dis'. I showed you this with the last line of my initial response:

dis <- as.dist(sqrt(2 - 2 * cor(MET, method = "spearman")))

## or, if you want 1 - dis, and reusing your code, do

dis <- as.dist(1 - sim)

Either of these will do what you want (the former following Jari's canonical
transformation of correlation to distance).
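
To make the matrix versus 'dist' distinction concrete, here is a minimal sketch in base R. The MET data frame below is a made-up stand-in (random numbers, hypothetical column names), not the real data. It also checks, under that assumption, that sqrt(2 - 2 * r) is simply the Euclidean distance between the ranked variables once they are centred and scaled to unit length.

```r
set.seed(42)

## Hypothetical stand-in for MET: 20 samples, 3 variables (invented values)
MET <- data.frame(Al = rnorm(20), Li = rnorm(20), Ti = rnorm(20))

r     <- cor(MET, method = "spearman")  # correlation matrix
d_mat <- sqrt(2 - 2 * r)                # still a plain matrix
dis   <- as.dist(d_mat)                 # now an object of class 'dist'

is.matrix(d_mat)         # TRUE
inherits(d_mat, "dist")  # FALSE
inherits(dis, "dist")    # TRUE

## Rank each variable, centre it and scale it to unit length; the
## Euclidean distances among those columns match sqrt(2 - 2 * r)
z <- scale(apply(MET, 2, rank)) / sqrt(nrow(MET) - 1)
all.equal(as.vector(dist(t(z))), as.vector(dis))
```

Checking inherits(x, "dist") before calling a function that asks for a 'dist' object is a cheap way to catch this kind of problem early.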

See further below...

> Trying to get the message across more clearly, these are the commands I used to 
> calculate the distance matrix for fso function. It seems there is a problem with 
> the dissimilarity matrix attachment; maybe it is not in the right form...
> Any suggestion appreciated.

<snip />

> #dissimilarity
> > sim<-cor(MET,method="spearman")
> > dis<-1-sim
> > dis
>           Al        Li        Ti        Mn         Cu        As         Pb
> Al 0.0000000 0.2372233 0.5757272 0.4735963 0.26892862 0.3438074 0.37561628
> Li 0.2372233 0.0000000 0.5760003 0.4638951 0.31543042 0.3197032 0.33668230
> Ti 0.5757272 0.5760003 0.0000000 0.4222668 0.25263333 0.3481023 0.30435507
> Mn 0.4735963 0.4638951 0.4222668 0.0000000 0.26073875 0.2570617 0.25378123
> Cu 0.2689286 0.3154304 0.2526333 0.2607387 0.00000000 0.2178201 0.08399498
> As 0.3438074 0.3197032 0.3481023 0.2570617 0.21782013 0.0000000 0.26803176
> Pb 0.3756163 0.3366823 0.3043551 0.2537812 0.08399498 0.2680318 0.00000000

Notice that this is a /matrix/. fso wants a 'dist' object. It could be a
bit clearer in the error message. Perhaps email Dave Roberts and suggest
a rewording for the error?

> > attach(CC)
> 
>         The following object(s) are masked from CC ( position 3 ) :
> 
>          PrecipitDays SnowDays SpringFreezLevel 
> 
> 
>         The following object(s) are masked from CC ( position 8 ) :
> 
>          PrecipitDays SnowDays SpringFreezLevel 

Hmm, you should rarely, if ever, need to attach(). I'm guessing, but the
above messages probably arise from multiple copies of CC on your search
path due to repeated attach()ing. This can be a recipe for disaster...

> #fuzzy set ordination
> > z <- fso(~SpringFreezLevel+ PrecipitDays+SnowDays,dis,permute=1000)

... and totally not needed. In the formula method this is what argument
'data' is for.

z <- fso(~ SpringFreezLevel + PrecipitDays + SnowDays, data = CC,
         dis = dis, permute = 1000)

will fit the model without the attach()ing mess. Learning how to use R's
formula interface is a key skill to get right early on.
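
As a generic illustration of how a one-sided formula plus a 'data' argument picks up variables without attach(), here is a small sketch using base R's model.frame(). The CC values are invented for the example, and I am not claiming this is what fso() does internally; it only shows the general idea behind the formula interface.

```r
## Hypothetical climate data; the values are made up for illustration
CC <- data.frame(SpringFreezLevel = c(2100, 2300, 2500, 2700),
                 PrecipitDays     = c(120, 110, 95, 80),
                 SnowDays         = c(60, 45, 30, 20))

## The variables named in the formula are looked up in 'data',
## so nothing needs to be on the search path
mf <- model.frame(~ SpringFreezLevel + PrecipitDays + SnowDays, data = CC)

names(mf)  # the three variables named in the formula
nrow(mf)   # one row per observation in CC
```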

Also, when not working with formulas, have a look at the function with() so that
instead of 

attach(CC); mean(SnowDays)

or 

mean(CC$SnowDays)

you can do

with(CC, mean(SnowDays))

The latter is very explicit about what you want it to do and you don't
need those horrible (IMHO) $ operators.
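
A small sketch of why I prefer with() over attach(), using a made-up CC with invented values. It shows that with() and $ give the same answer, and one way attach() can bite: a variable of the same name in your workspace silently masks the attached column.

```r
## Hypothetical data; the values are invented for illustration
CC <- data.frame(SnowDays = c(60, 45, 30, 20))

m1 <- mean(CC$SnowDays)         # explicit, via $
m2 <- with(CC, mean(SnowDays))  # explicit, via with()
identical(m1, m2)               # TRUE

## The hazard: a workspace object with the same name wins over
## the attached copy, because the workspace is searched first
attach(CC)
SnowDays <- c(0, 0, 0, 0)
m3 <- mean(SnowDays)            # uses the workspace copy, not CC's column
m4 <- with(CC, mean(SnowDays))  # still uses the column in CC

## Tidy up so the search path and workspace are left clean
detach(CC)
rm(SnowDays)
```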

> Error in fso.formula(~SpringFreezLevel + PrecipitDays + SnowDays, dis,  : 
>   You must supply a (dis)similarity matrix

HTH

G

> 
> ________________________________
>     
> 
> Message: 2
> Date: Tue, 20 Jul 2010 20:42:54 +0100
> From: Gavin Simpson <gavin.simpson at ucl.ac.uk>
> 
> Cc: r-sig-ecology at r-project.org
> Subject: Re: [R-sig-eco] Distance matrix based on correlation
>     coefficient
> Message-ID: <1279654974.2355.49.camel at desktop.localdomain>
> Content-Type: text/plain; charset="UTF-8"
> 
> On Tue, 2010-07-20 at 12:12 -0700, Dragos Zaharescu wrote:
> > I would much appreciate if someone would enlighten me on how to calculate a 
> > distance matrix based on correlation coefficient (Spearman)? The simple 
> > correlation matrix seems not to work.
> 
> In what sense did it not work? We aren't mind readers! Hence the posting
> guide asking you to provide information that will help us to help you.
> 
> > I am trying to use it in FSO/MFSO to calculate the influence of 
> > climate factors on heavy metals concentrations.
> 
> Does this help at all?
> 
> > dat <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10))
> > cor(dat)
>            A          B         C
> A 1.00000000 0.08986947 0.1224007
> B 0.08986947 1.00000000 0.2667838
> C 0.12240068 0.26678381 1.0000000
> > 1 - cor(dat) ## dissimilarity
>           A         B         C
> A 0.0000000 0.9101305 0.8775993
> B 0.9101305 0.0000000 0.7332162
> C 0.8775993 0.7332162 0.0000000
> > as.dist(1 - cor(dat))
>           A         B
> B 0.9101305          
> C 0.8775993 0.7332162
> 
> HTH
> 
> G
> 
> -- 
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
> ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> 
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Tue, 20 Jul 2010 23:35:02 +0300
> From: Jari Oksanen <jari.oksanen at oulu.fi>
> To: Gavin Simpson <gavin.simpson at ucl.ac.uk>,    Dragos Zaharescu
> 
> Cc: r-sig-ecology at r-project.org
> Subject: Re: [R-sig-eco] Distance matrix based on correlation
>     coefficient
> Message-ID: <C86BE326.104BB%jari.oksanen at oulu.fi>
> Content-Type: text/plain;    charset="US-ASCII"
> 
> On 20/07/10 22:42 PM, "Gavin Simpson" <gavin.simpson at ucl.ac.uk> wrote:
> 
> > On Tue, 2010-07-20 at 12:12 -0700, Dragos Zaharescu wrote:
> >> I would much appreciate if someone would enlighten me on how to calculate a
> >> distance matrix based on correlation coefficient (Spearman)? The simple
> >> correlation matrix seems not to work.
> > 
> 
> >> 1 - cor(dat) ## dissimilarity
> >
> Actually the canonical transformation to distance is sqrt(2-2*cor(dat)).
> 
> Cheers, Jari Oksanen
> 
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Tue, 20 Jul 2010 21:38:23 +0100
> From: Gavin Simpson <gavin.simpson at ucl.ac.uk>
> To: Jari Oksanen <jari.oksanen at oulu.fi>
> 
>     r-sig-ecology at r-project.org
> Subject: Re: [R-sig-eco] Distance matrix based on correlation
>     coefficient
> Message-ID: <1279658303.18174.2.camel at localhost>
> Content-Type: text/plain; charset="UTF-8"
> 
> Of course, thanks Jari. That 1 - bit was total rubbish, not even
> acknowledging that cor could be negative. Not sure what came over me; I
> blame the heat here in London ;-)
> 
> Hangs head in shame.
> 
> G
> 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


