[R-sig-genetics] Pegas vs Arlequin, and negative AMOVA values

Tue May 12 11:12:07 CEST 2020

Thanks for the answer, the example really helped me understand it much
better. After some exploration of my data I think the case is more similar
to the example, with few but long gaps at the ends rather than many short
ones. So I think I will remove those and see if the estimations improve.
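
A minimal sketch of that check, on a toy alignment rather than the real COI data. The object names, the simulated sequences and the cutoff of 5 gaps are arbitrary assumptions for illustration; del.rowgapsonly() and dist.dna() are the ape functions mentioned in this thread.

```r
library(ape)

# Toy alignment (hypothetical data): 10 sequences x 40 sites, no gaps at first
set.seed(1)
m <- matrix(sample(c("a", "c", "g", "t"), 10 * 40, replace = TRUE),
            nrow = 10, dimnames = list(paste0("seq", 1:10), NULL))
m[1, 21:40] <- "-"   # seq1 is short: one long gap at its 3'-end
x <- as.DNAbin(m)

# Number of gaps in each sequence (row)
gaps <- del.rowgapsonly(x, freq.only = TRUE)

# Drop the sequences with more than 5 gaps (arbitrary cutoff, to be tuned)
x2 <- x[gaps <= 5, ]

# Distances before and after filtering, both with the default global deletion
d_before <- dist.dna(x, "N")
d_after  <- dist.dna(x2, "N")
```

Here d_before is computed on the 20 sites that are complete in every sequence, whereas d_after uses all 40 sites of the 9 remaining sequences.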
Thanks again,
Marc

On Fri, May 8, 2020 at 6:08 AM Emmanuel Paradis <emmanuel.paradis using ird.fr>
wrote:

> I think you should go deeper in your data exploration. Here are two other
> diagnostics you can do:
>
> del.colgapsonly(x, freq.only = TRUE)
> del.rowgapsonly(x, freq.only = TRUE)
>
> These will give you the number of gaps for each column and row,
> respectively.
>
> Imagine the following situation: x is an alignment with 100 sequences and
> 1000 sites, all sequences are complete with no ambiguity, except one which
> has only the first 500 bp from the 5'-end, so it has a trail of 500 "-" on
> the 3'-end to be aligned with the 99 others. Doing base.freq(x, all = TRUE)
> will show that there are 0.5% of gaps so you may think it's OK. But that's
> wrong. Doing dist.dna(x) will throw away 50% of the data (even if you add
> more complete sequences to the alignment!). If the rates of evolution are
> different in the two halves of the sequence, then comparing the results from
> dist.dna(x) and dist.dna(x, pairwise.deletion = TRUE) is likely to be very
> tricky.
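>
> A minimal sketch of this situation on constructed data. The alignment is
> artificial (all sequences identical except where noted) and every object
> name is made up for illustration; only base.freq() and dist.dna() come
> from ape.
>
> ```r
> library(ape)
>
> # Artificial alignment: 100 sequences x 1000 sites, all "a" to start with
> m <- matrix("a", nrow = 100, ncol = 1000,
>             dimnames = list(paste0("seq", 1:100), NULL))
> m[1, 501:1000] <- "-"   # seq1 stops after 500 bp: 500 trailing gaps
> m[3, 901:910]  <- "c"   # seq3 differs from the others at 10 sites in the 3'-half
> x <- as.DNAbin(m)
>
> # Overall proportion of gaps: 500 / (100 * 1000)
> gap_prop <- base.freq(x, all = TRUE)[["-"]]
>
> # Default: a site with a gap in any sequence is dropped for all pairs,
> # so the differences between seq2 and seq3 in the 3'-half are not counted
> d0 <- dist.dna(x, "N", as.matrix = TRUE)
>
> # Pairwise deletion: sites are dropped only in comparisons involving seq1
> d1 <- dist.dna(x, "N", pairwise.deletion = TRUE, as.matrix = TRUE)
>
> c(gap_prop = gap_prop,
>   default  = d0["seq2", "seq3"],
>   pairwise = d1["seq2", "seq3"])
> ```
>
> By construction the gap proportion is 0.005, and the seq2-seq3 distance
> should be 0 with the default and 10 with pairwise deletion, although
> neither of these two sequences has any gap.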
>
> That's where the two above diagnostics may help you: you may find it better
> to remove some sequences if they have a lot of gaps and they create more
> trouble than anything else.
>
> HTH
>
> Best,
>
> Emmanuel
>
> ----- On 8 May 20, at 4:07, Marc Domènech Andreu <mdomenan using gmail.com> wrote:
>
> Hello,
> Thanks for the tip. I tried that in several species and it looks like it
> does have an effect. The values with pairwise.deletion=TRUE are most of the
> time a bit higher than with pairwise.deletion=FALSE, and sometimes equal.
> Would you suggest using pairwise.deletion=TRUE then?
> I also tried to find out whether the very negative values in AMOVA results
> (like -0.25) were due to very low values of genetic distances, but there
> doesn't seem to be a relation.
> Thanks for your help,
> Marc
>
> On Thu, May 7, 2020 at 5:53 AM Emmanuel Paradis <emmanuel.paradis using ird.fr>
> wrote:
>
>> To see if this has an impact, you can do this:
>>
>> library(ape)
>> d0 <- dist.dna(x, "N") # default: sites with any missing data are dropped
>> d1 <- dist.dna(x, "N", pairwise.deletion = TRUE) # dropped pair by pair
>> plot(d0, d1)
>> abline(0, 1, lty = 3) # draw the y = x line
>>
>> Best,
>>
>> Emmanuel
>>
>> ----- On 6 May 20, at 21:35, Marc Domènech Andreu <mdomenan using gmail.com> wrote:
>>
>> Oh ok thanks. Well my sequences are COI sequences, a mitochondrial
>> protein-coding gene, so there are no gaps in the middle. However, there are
>> some at the ends, some sequences being longer than others. So I will set
>> pairwise.deletion=TRUE as you suggest.
>> Thanks,
>> Marc
>>
>> On Tue, May 5, 2020 at 5:03 AM Emmanuel Paradis <emmanuel.paradis using ird.fr>
>> wrote:
>>
>>> ----- On 5 May 20, at 0:36, Marc Domènech Andreu <mdomenan using gmail.com> wrote:
>>>
>>> Hi,
>>> Yes I tried it. Most of the results are very similar but some change. Do
>>> you know the difference between those two methods?
>>>
>>>
>>> model = "N" is the Hamming distance (absolute number of differences
>>> between two sequences)
>>>
>>> model = "raw" is the Hamming distance divided by the sequence length
>>> (aka uncorrected distance, or p-distance)
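>>>
>>> As a quick check of the relation between the two, a minimal sketch on a
>>> made-up alignment without gaps or ambiguities (the data and the object
>>> names are for illustration only):
>>>
>>> ```r
>>> library(ape)
>>>
>>> # Hypothetical gap-free alignment: 6 sequences x 50 sites
>>> set.seed(42)
>>> m <- matrix(sample(c("a", "c", "g", "t"), 6 * 50, replace = TRUE),
>>>             nrow = 6, dimnames = list(paste0("seq", 1:6), NULL))
>>> x <- as.DNAbin(m)
>>>
>>> d_N   <- dist.dna(x, model = "N")    # number of differing sites
>>> d_raw <- dist.dna(x, model = "raw")  # proportion of differing sites
>>>
>>> # With complete data, "raw" is "N" divided by the alignment length
>>> all.equal(as.vector(d_raw), as.vector(d_N) / ncol(x))
>>> ```
>>>
>>> With gaps in the alignment the divisor is no longer simply ncol(x), since
>>> the number of sites actually compared depends on 'pairwise.deletion'.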
>>>
>>> About the use of 'pairwise.deletion' in dist.dna(): in fact there is no
>>> simple/unique solution for this option. It depends very much on the data at
>>> hand and the distribution of "missing data", especially gaps. You need to
>>> check their distribution, for example with image(x) or image(x, what = "-")
>>> where 'x' is the DNA data. You may get nonsensical results leaving the
>>> default pairwise.deletion = FALSE if there are long gaps. Even a small
>>> number of gaps may be problematic if they are in a column (site) which is
>>> polymorphic.
>>>
>>> Best,
>>>
>>> Emmanuel
>>>
>>> Thanks,
>>> Marc
>>>
>>> On Mon, May 4, 2020 at 2:44 PM Emmanuel Paradis <emmanuel.paradis using ird.fr>
>>> wrote:
>>>
>>>> Hi Marc,
>>>>
>>>> Have you tried model = "N" in dist.dna()?
>>>>
>>>> Best,
>>>>
>>>> Emmanuel
>>>>
>>>> ----- On 4 May 20, at 16:44, Marc Domènech Andreu mdomenan using gmail.com wrote:
>>>>
>>>> > Thanks for your answer. For computing the distance matrix I am using the
>>>> > dist.dna function in Ape package, with the model set to "raw"
>>>> > and pairwise.deletion = FALSE. However I don't know exactly the equation
>>>> > or formula pegas uses for AMOVA.
>>>> > I am working with a mitochondrial marker so it would be haploid.
>>>> > Marc
>>>> >
>>>> > On Wed, Apr 29, 2020 at 5:26 PM Zhian N. Kamvar <zkamvar using gmail.com> wrote:
>>>> >
>>>> >> This highly depends on the distance function you are using for pegas:
>>>> >>
>>>> >> 1. How does it treat missing data? I believe Arlequin treats missing
>>>> >> data by dropping them from the denominator.
>>>> >>
>>>> >> 2. If you have a diploid species, does it calculate distance for
>>>> >> haplotypes?
>>>> >>
>>>> >> Both of these can affect the resulting Phi values. You might also try
>>>> >> poppr.amova() with method = "pegas" to automate the process.
>>>> >>
>>>> >> Best,
>>>> >>
>>>> >> Zhian
>>>> >>
>>>> >> On 4/29/20 3:04 AM, Marc Domènech Andreu wrote:
>>>> >> > Hello everyone,
>>>> >> > I would like to ask for help with two questions regarding AMOVA and the
>>>> >> > Pegas package.
>>>> >> > 1. Do you know which is the formula or equation that Pegas and Arlequin
>>>> >> > use for performing AMOVA? I only get to obtain almost identical results
>>>> >> > when I set "is.squared = FALSE" in pegas and "Locus by locus AMOVA" in
>>>> >> > Arlequin.
>>>> >> > 2. I'm doing the analyses for several species, and for some of them I
>>>> >> > obtained negative AMOVA results. I know slightly negative results are not
>>>> >> > uncommon and as far as I know they should be treated as 0, but in some
>>>> >> > cases they are very negative, such as -25%. Why can this be? Maybe because
>>>> >> > I have too few sequences for those species?
>>>> >> > Thanks in advance,
>>>> >> > Marc
>>>> >> >
>>>> >>
>>>> >> _______________________________________________
>>>> >> R-sig-genetics mailing list
>>>> >> R-sig-genetics using r-project.org
>>>> >> https://stat.ethz.ch/mailman/listinfo/r-sig-genetics
>>>> >>
>>>> >
>>>> >
>>>> > --
>>>> > *Marc Domènech Andreu*
>>>> > PhD student at University of Barcelona.
>>>> > Departament de Biologia Evolutiva, Ecologia i Ciències Ambientals.
>>>>

-- 
*Marc Domènech Andreu*
PhD student at University of Barcelona.
Departament de Biologia Evolutiva, Ecologia i Ciències Ambientals.
