[R-sig-genetics] Pegas vs Arlequin, and negative AMOVA values

Fri May 8 06:08:17 CEST 2020

I think you should go deeper in your data exploration. Here are two other diagnostics you can do: 

del.colgapsonly(x, freq.only = TRUE) 
del.rowgapsonly(x, freq.only = TRUE) 

These will give you the number of gaps for each column and row, respectively. 

Imagine the following situation: x is an alignment with 100 sequences and 1000 sites, all sequences are complete with no ambiguity, except one which has only 500 bp from the 5'-end, so it has a trail of 500 "-" on the 3'-end to be aligned with the 99 others. Doing base.freq(x, all = TRUE) will show that there are 0.5% of gaps, so you may think it's OK. But that's wrong: doing dist.dna(x) will throw away 50% of the data (even if you add more complete sequences to the alignment!), because the default pairwise.deletion = FALSE drops every site that has a gap in at least one sequence. If the rates of evolution are different in the two halves of the sequence, then comparing the results from dist.dna(x) and dist.dna(x, pairwise.deletion = TRUE) is likely to be very tricky. 
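
The situation above can be reproduced with a toy alignment. This is only a sketch: the sequences are random bases (an assumption for illustration, not real data), and the object names are made up. It shows that the overall gap proportion looks harmless while the two diagnostics and the distances tell a different story.

```r
library(ape)

set.seed(42)  # arbitrary seed, only for reproducibility
n_seq <- 100
n_site <- 1000

# Toy alignment of random bases (illustration only, not real data)
m <- matrix(sample(c("a", "c", "g", "t"), n_seq * n_site, replace = TRUE),
            nrow = n_seq, ncol = n_site,
            dimnames = list(paste0("seq", seq_len(n_seq)), NULL))
m[1, 501:1000] <- "-"  # one sequence has only the first 500 bp
x <- as.DNAbin(m)

# Overall proportion of gaps: 500 out of 100 * 1000 symbols
gap_prop <- base.freq(x, all = TRUE)[["-"]]

# The two diagnostics: number of gaps per site and per sequence
gaps_per_site <- del.colgapsonly(x, freq.only = TRUE)
gaps_per_seq <- del.rowgapsonly(x, freq.only = TRUE)

# Number of differences with global vs. pairwise deletion of gapped sites
d0 <- dist.dna(x, "N")
d1 <- dist.dna(x, "N", pairwise.deletion = TRUE)

table(gaps_per_site)  # how many sites carry at least one gap
plot(d0, d1)
abline(0, 1, lty = 3)  # draw x = y line
```

With global deletion every pair is compared on the first 500 sites only, so d0 can never exceed d1 here.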

That's where the two above diagnostics may help you: you may find it better to remove some sequences if they have a lot of gaps and create more trouble than anything else. 
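
As a rough sketch of that last step, sequences can be dropped according to their number of gaps before computing the distances. The alignment is again a toy one made of random bases, and the 20% cutoff (max_gap_prop) is a purely hypothetical value that you would need to choose for your own data.

```r
library(ape)

set.seed(42)  # arbitrary seed, only for reproducibility

# Toy alignment: 10 random sequences of 200 sites (illustration only)
m <- matrix(sample(c("a", "c", "g", "t"), 10 * 200, replace = TRUE),
            nrow = 10, ncol = 200,
            dimnames = list(paste0("seq", 1:10), NULL))
m[1, 101:200] <- "-"  # one sequence with a long trailing gap
x <- as.DNAbin(m)

# Number of gaps per sequence
gaps_per_seq <- del.rowgapsonly(x, freq.only = TRUE)

# Hypothetical cutoff: drop sequences with more than 20% of gaps
max_gap_prop <- 0.2
keep <- which(gaps_per_seq / ncol(x) <= max_gap_prop)
x_clean <- x[keep, ]

# Distances on the remaining sequences; no site is lost to the long gap
d_clean <- dist.dna(x_clean, "N")
```

Whether losing one sequence is better than losing half of the sites depends on the data, so it is worth comparing the results both ways.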

HTH 

Best, 

Emmanuel 

----- On 8 May 20, at 4:07, Marc Domènech Andreu <mdomenan using gmail.com> wrote: 

> Hello,
> Thanks for the tip. I tried that in several species and it looks like it does
> have an effect. The values with pairwise.deletion=TRUE are most of the time a
> bit higher than with pairwise.deletion=FALSE, and sometimes equal. Would you
> suggest using pairwise.deletion=TRUE then?
> I also tried to find out whether the very negative values in AMOVA results
> (like -0.25) were due to very low values of genetic distances, but there
> doesn't seem to be a relation.
> Thanks for your help,
> Marc

> On Thu, May 7, 2020 at 5:53 AM Emmanuel Paradis <emmanuel.paradis using ird.fr> wrote:

>> To see if this has an impact, you can do this:

>> d0 <- dist.dna(x, "N")
>> d1 <- dist.dna(x, "N", pairwise.deletion = TRUE)
>> plot(d0, d1)
>> abline(0, 1, lty = 3) # draw x = y line

>> Best,

>> Emmanuel

>> ----- On 6 May 20, at 21:35, Marc Domènech Andreu <mdomenan using gmail.com> wrote:

>>> Oh ok thanks. Well my sequences are COI sequences, a mitochondrial protein
>>> coding gene so there are no gaps in the middle. However, there are in the
>>> extremes, some sequences being longer than others. So I will set
>>> pairwise.deletion=TRUE as you suggest.
>>> Thanks,
>>> Marc

>>> On Tue, May 5, 2020 at 5:03 AM Emmanuel Paradis <emmanuel.paradis using ird.fr> wrote:

>>>> ----- On 5 May 20, at 0:36, Marc Domènech Andreu <mdomenan using gmail.com> wrote:

>>>>> Hi,
>>>>> Yes I tried it. Most of the results are very similar but some change. Do you
>>>>> know the difference between those two methods?

>>>> model = "N" is the Hamming distance (absolute number of differences between two
>>>> sequences)

>>>> model = "raw" is the Hamming distance divided by the sequence length (aka
>>>> uncorrected distance, or p-distance)

>>>> About the use of 'pairwise.deletion' in dist.dna(): in fact there is no
>>>> simple/unique solution for this option. It depends very much on the data at
>>>> hand and the distribution of "missing data", especially gaps. You need to check
>>>> their distribution, for example with image(x) or image(x, what = "-") where 'x'
>>>> is the DNA data. You may get nonsensical results leaving the default
>>>> pairwise.deletion = FALSE if there are long gaps. Even a small number of gaps
>>>> may be problematic if they are in a column (site) which is polymorphic.

>>>> Best,

>>>> Emmanuel

>>>>> Thanks,
>>>>> Marc

>>>>> On Mon, May 4, 2020 at 2:44 PM Emmanuel Paradis <emmanuel.paradis using ird.fr> wrote:

>>>>>> Hi Marc,

>>>>>> Have you tried model = "N" in dist.dna()?

>>>>>> Best,

>>>>>> Emmanuel

>>>>>> ----- On 4 May 20, at 16:44, Marc Domènech Andreu <mdomenan using gmail.com> wrote:

>>>>>> > Thanks for your answer. For computing the distance matrix I am using the
>>>>>> > dist.dna function in Ape package, with the model set to "raw"
>>>>>> > and pairwise.deletion = FALSE. However I don't know exactly the equation or
>>>>>> > formula pegas uses for AMOVA.
>>>>>> > I am working with a mitochondrial marker so it would be haploid.
>>>>>> > Marc

>>>>>> > On Wed, Apr 29, 2020 at 5:26 PM Zhian N. Kamvar <zkamvar using gmail.com> wrote:

>>>>>> >> This highly depends on the distance function you are using for pegas:

>>>>>> >> 1. How does it treat missing data? I believe Arlequin treats missing
>>>>>> >> data by dropping them from the denominator.

>>>>>> >> 2. If you have a diploid species, does it calculate distance for
>>>>>> >> haplotypes?

>>>>>> >> Both of these can affect the resulting Phi values. You might also try
>>>>>> >> poppr.amova() with the argument method = "pegas" to automate the process.

>>>>>> >> Best,

>>>>>> >> Zhian

>>>>>> >> On 4/29/20 3:04 AM, Marc Domènech Andreu wrote:
>>>>>> >> > Hello everyone,
>>>>>> >> > I would like to ask for help with two questions regarding AMOVA and the
>>>>>> >> > Pegas package.
>>>>>> >> > 1. Do you know which is the formula or equation that Pegas and Arlequin use
>>>>>> >> > for performing AMOVA? I only get to obtain almost identical results when I
>>>>>> >> > set "is.squared = FALSE" in pegas and "Locus by locus AMOVA" in Arlequin.
>>>>>> >> > 2. I'm doing the analyses for several species, and for some of them I
>>>>>> >> > obtained negative AMOVA results. I know slightly negative results are not
>>>>>> >> > uncommon and as far as I know they should be treated as 0, but in some
>>>>>> >> > cases they are very negative, such as -25%. Why can this be? Maybe because
>>>>>> >> > I have too few sequences for those species?
>>>>>> >> > Thanks in advance,
>>>>>> >> > Marc

>>>>>> >> _______________________________________________
>>>>>> >> R-sig-genetics mailing list
>>>>>> >> R-sig-genetics using r-project.org
>>>>>> >> https://stat.ethz.ch/mailman/listinfo/r-sig-genetics

>>>>>> > --
>>>>>> > *Marc Domènech Andreu*
>>>>>> > PhD student at University of Barcelona.
>>>>>> > Departament de Biologia Evolutiva, Ecologia i Ciències Ambientals.




