[R] distance coefficient for a matrix with negative values

R. Michael Weylandt michael.weylandt at gmail.com
Tue Oct 4 06:05:19 CEST 2011


Comments inline:

On Mon, Oct 3, 2011 at 11:27 PM, dilshan benaragama
<benaragamad at yahoo.com> wrote:
> Yes I think you did not get my problem.

No, you did not state your problem. I have replied to everything you
have actually included to this point. Admittedly, I have failed to
reply to things you did not say...

>  Actually I want to run PCO with
> (labdsv). To do that I am trying to get the distance matrix using the
> following functions with library (vegan).

This is now the 7th email in this chain. You should mention the
packages and functions you are using in the FIRST email of the chain.
This is mentioned in the posting guide, which you apparently have still
not read.

>
> pca.gower<- vegdist(envt[,2:9],method="gower")
> pca.eucl<-vegdist(envt[,2:9],method="euclidean")
> pca.chi<-vegdist(envt[,2:9],method="chi.square")
> pca.mahal<-vegdist(envt[,2:9],method="mahal")
> pca.bray<-vegdist(envt,method="bray")
>
> However none of the functions work

They all work for any data I put in. This is perhaps where that minimal
working example, which you also should have included, becomes necessary.
The footer appended to each of the 7 emails in this chain, which tells
you to read the posting guide, also asks for this, as did I explicitly.

> (gives an error saying that it is not
> working due to negative values)

No, they each give warnings. Warnings are not errors. They are
warnings and they say "warning". Perhaps unsurprisingly, errors say
"error". If you are using an old version of vegan that throws an
error, you should always update before seeking help. Not surprisingly,
a certain document suggests this.
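
To make the distinction concrete, here is a minimal base-R sketch. The helper noisy_dist() is made up for illustration and is not vegan's code; it only mimics a function that warns about negative input and then carries on.

```r
# Hypothetical helper (not from vegan): warns on negative input,
# but still computes and returns a distance object.
noisy_dist <- function(x) {
  if (any(x < 0)) warning("data have negative entries")
  dist(x)
}

x <- matrix(c(-1, 2, 3, -4, 5, 6), nrow = 3)

# A warning does not stop evaluation: a value is still returned.
d <- suppressWarnings(noisy_dist(x))
returned_value <- inherits(d, "dist")

# tryCatch() can tell the two kinds of condition apart.
classify <- function(expr) {
  tryCatch(
    { expr; "ok" },
    warning = function(w) "warning",
    error   = function(e) "error"
  )
}

kind_warn <- classify(noisy_dist(x))          # warning() was signalled
kind_err  <- classify(stop("cannot continue")) # stop() was signalled
```

Here returned_value is TRUE even though a warning was raised, whereas the stop() call never produces a value at all.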

> except Euclidean distance for the raw data
> set, as the raw data has negative values for some variables. There is no point
> in using the Euclidean metric with PCO as we can do the same thing with PCA. So
> I need to find a way to run PCO with a different dissimilarity metric
> for this data. It would be a great help if you could help me with this.

Actually read the warning message: it warns you that you have given
negative data to an ecological function and suggests that you look into
this, as it usually indicates a user-end problem. It does not fail to
work in any sense of the word, as evidenced by the output of distances.
If negative data is nonsense, you should heed this warning; if you know
it's not, disregard it.

More importantly, as I said in my initial response, any distance
metric worth its salt (in a normed space) is translation invariant.
To wit,

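library(vegan)  # provides vegdist()

x <- matrix(rnorm(50), 5)

# d2 uses the same data shifted by a constant so that every entry is positive
d1 <- vegdist(x, method = "gower")
d2 <- vegdist(x + abs(min(x)) * 3, method = "gower")

all.equal(as.numeric(d1), as.numeric(d2))
# [1] TRUE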

In fairness, I'll admit this does not seem to work for the bray
distance. I am not an ecologist and I do not know why this would be --
it does leave me somewhat confused as to what sort of space motivates
the bray metric, but that's a discussion for another time and place --
but the function still returns a valid dist object for both d1 and d2.
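
A possible explanation, assuming the usual textbook definition of Bray-Curtis, sum(|a - b|) / sum(a + b): adding a constant to every entry leaves the numerator alone but inflates the denominator, so the dissimilarity shrinks. The sketch below uses a hand-rolled bray() helper (hypothetical, not vegan's implementation) and made-up vectors.

```r
# Bray-Curtis dissimilarity between two vectors, textbook formula.
# Hypothetical helper for illustration; not vegan's code.
bray <- function(a, b) sum(abs(a - b)) / sum(a + b)

# Euclidean distance, for comparison.
euclid <- function(a, b) sqrt(sum((a - b)^2))

a <- c(1, 4, 0, 2)
b <- c(3, 1, 2, 2)
shift <- 10  # constant added to every entry

bray_raw     <- bray(a, b)                  # 7 / 15
bray_shifted <- bray(a + shift, b + shift)  # 7 / 95

euclid_raw     <- euclid(a, b)
euclid_shifted <- euclid(a + shift, b + shift)

# The Euclidean distance is unchanged by the shift; Bray-Curtis is not.
euclid_invariant <- isTRUE(all.equal(euclid_raw, euclid_shifted))
bray_invariant   <- isTRUE(all.equal(bray_raw, bray_shifted))
```

On this reading, Bray-Curtis is a ratio meant for non-negative abundances, not a distance induced by a norm, which would explain why the shift changes it.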

>
> Thanks,
> From: R. Michael Weylandt <michael.weylandt at gmail.com>
> To: dilshan benaragama <benaragamad at yahoo.com>; r-help
> <r-help at r-project.org>

You will note that I include the r-help list on each email on this
chain while you have not; this is mentioned in the posting guide.

> Sent: Monday, October 3, 2011 10:00:53 PM
> Subject: Re: [R] distance coefficient for a matrix with negative values
>
> You still haven't explained what's wrong with *almost every metric
> there is*, but if you want other distance metrics, have you considered
> those in the package you are using, via the function dsvdis()?
> Consider, for example:
>
> library(labdsv)
>
> X <- get(data(bryceveg));
>
> # Put some negative values in all willy nilly like....
> X[, sample(NROW(X))] <- (-1)*X[, sample(NROW(X))]
> Y <- pco( dsvdis(X, index="bray/curtis") )
> print(any(X < 0))
>
> If you want more explanation, please provide actual details of what
> you are asking, as requested in my first email.
>
> Michael Weylandt
>
> On Mon, Oct 3, 2011 at 9:23 PM, dilshan benaragama
> <benaragamad at yahoo.com> wrote:
>> I am using (labdsv). If I can use Euclidean distance I can do it with PCA
>> instead of PCO, so I am trying an alternative to PCA, but I cannot find a
>> dissimilarity coefficient for that.
>>
>> From: R. Michael Weylandt <michael.weylandt at gmail.com>
>> To: dilshan benaragama <benaragamad at yahoo.com>; r-help
>> <r-help at r-project.org>
>> Sent: Monday, October 3, 2011 3:27:53 PM
>> Subject: Re: [R] distance coefficient for a matrix with negative values
>>
>> One order of the usual coming right up!
>>
>> 1 course of "Why does XXX not work for you?" a la francaise, where XXX
>> is, in your case, the Euclidean distance.  Specifically, any metric
>> worth its salt (in a normed space) satisfies dist(a,b) = dist(a+c,b+c)
>> so why are negative values a problem?...
>>
>> 2 sides: a "Minimal Working Example" with a light buttery sauce and a
>> fried "what package/code are you using"
>>
>> and, for dessert, a Winsemian special of: "read the posting guide!"
>>
>> Michael Weylandt, who is putting together a menu for a fancy dinner
>> even as he types
>>
>> On Mon, Oct 3, 2011 at 12:55 PM, dilshan benaragama
>> <benaragamad at yahoo.com> wrote:
>>> Hi,
>>> I need to run a PCoA (PCO) for a data set which has both positive and
>>> negative values for variables. I could not find any distance coefficient
>>> other than Euclidean distance that runs for the data set. Are there any
>>> other coefficients that work with negative values? Also, I cannot get
>>> summary output (the eigenvalues) for PCO as for PCA.
>>>
>>> Thanks.
>>> Dilshan
>>>        [[alternative HTML version deleted]]
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

Would you care to elaborate further as to what the actual problem
entails, with a minimal working example?

More generally, might I suggest you learn how these metrics work and
then apply the most appropriate one rather than groping blindly after
something solely on the criterion of it being non-Euclidean. If you
need other metrics, look into the various p-norms, all of which are
implemented directly in R by way of the dist() function (see
method = "minkowski" and its p argument), as are a few other distance
measures with which I am not immediately familiar.
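
A minimal sketch of that suggestion, using only base R and made-up data: dist() covers the 1-, 2- and infinity-norms by name and any other p via method = "minkowski", and each of these is unaffected by a constant shift. The last lines assume base R's cmdscale() is an acceptable stand-in for a PCO routine; with eig = TRUE it also returns the eigenvalues.

```r
set.seed(1)
x <- matrix(rnorm(20), nrow = 5)  # made-up data with negative entries
shifted <- x + 100                # same data, shifted to be all positive

# Compare the distances computed before and after the shift.
same_after_shift <- function(...) {
  isTRUE(all.equal(as.numeric(dist(x, ...)),
                   as.numeric(dist(shifted, ...))))
}

invariant <- c(
  manhattan = same_after_shift(method = "manhattan"),         # p = 1
  euclidean = same_after_shift(method = "euclidean"),         # p = 2
  minkowski = same_after_shift(method = "minkowski", p = 3),  # general p
  maximum   = same_after_shift(method = "maximum")            # p = Inf
)

# Classical scaling (PCO) on a non-Euclidean distance, with eigenvalues.
d_manhattan <- dist(x, method = "manhattan")
pco <- cmdscale(d_manhattan, k = 2, eig = TRUE)
scores      <- pco$points  # one row of coordinates per observation
eigenvalues <- pco$eig
```

Every element of invariant should come out TRUE, so negative values are no obstacle for any of these norms.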

Regards,

Michael Weylandt


