[R-sig-genetics] [R-sig-phylo] Improved haplotype() function in pegas 0.13

Emmanuel Paradis emm@nue|@p@r@d|@ @end|ng |rom |rd@|r
Thu May 14 08:25:33 CEST 2020


Hi Jarrett,

I'm Cc'ing r-sig-genetics since we recently had a discussion on a similar topic there (see below).

----- On 14 May 20, at 9:59, Jarrett Phillips phillipsjarrett1 using gmail.com wrote:
> Hello,
> 
> Emmanuel Paradis has recently updated the haplotype() function in pegas
> 0.13 to account for base ambiguities, gaps and Ns. Thank you Emmanuel!

Base ambiguities were already considered in previous versions of pegas, but this was not explicit (or flexible).

> The argument 'strict' simply considers or ignores all gaps and
> ambiguities, but does this also consider/ignore Ns?

Yes. 'strict' means "strict interpretation of the characters without interpreting them as base ambiguities". For instance, consider the following 9 aligned sequences with 2 sites (without labels for simplicity):

AA
AR
AM
AW
AV
AH
AD
AN
A-

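By default (and with the latest version of pegas), haplotype() will return a single haplotype, because it cannot be inferred whether any of sequences 2-9 differs from the first one: R, M, W, V, H, D and N all include A as a possible base, and the trailing gap in the last sequence is treated as N (trailingGapsAsN = TRUE by default). If strict = TRUE, nine haplotypes will be returned.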
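
The two outcomes for this example can be illustrated with a small Python sketch. This is not the pegas code (pegas is written in R and its actual algorithm may differ); it only assumes that an ambiguity code is compatible with every base it may stand for, that an internal gap matches only another gap, and that sequences are grouped greedily into haplotypes. The names ends_to_n, compatible and count_haplotypes are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to the bases each one may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    "-": "-",  # assumption: an internal gap is its own state
}


def ends_to_n(seq):
    """Turn leading and trailing runs of '-' into runs of 'N'."""
    return re.sub(r"^-+|-+$", lambda m: "N" * len(m.group()), seq)


def compatible(a, b):
    """True if, at every site, a and b could carry the same base."""
    return all(set(IUPAC[x]) & set(IUPAC[y]) for x, y in zip(a, b))


def count_haplotypes(seqs, strict=False, trailing_gaps_as_n=True):
    """Hypothetical sketch of ambiguity-aware haplotype counting."""
    if strict:
        # Characters taken literally; trailing_gaps_as_n has no effect.
        return len(set(seqs))
    if trailing_gaps_as_n:
        seqs = [ends_to_n(s) for s in seqs]
    reps = []  # one representative sequence per haplotype found so far
    for s in seqs:
        # Greedy: start a new haplotype only if s matches no representative.
        if not any(compatible(s, r) for r in reps):
            reps.append(s)
    return len(reps)


seqs = ["AA", "AR", "AM", "AW", "AV", "AH", "AD", "AN", "A-"]
print(count_haplotypes(seqs))               # 1
print(count_haplotypes(seqs, strict=True))  # 9
```

In this sketch, turning off the gap conversion (trailing_gaps_as_n=False) leaves "A-" distinct from "AA", giving 2 groups. Note also that compatibility is not transitive (AR matches both AA and AG, which differ from each other), so a greedy grouping like this one depends on input order.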

> The 'trailingGapsAsN' simply treats leading and trailing gaps as Ns,
> ignoring internal gaps. This argument is set to TRUE by default.
> 
> From the above, it appears that 'strict' ignores Ns. If 'strict' is set to
> TRUE, does this mean that the TRUE/FALSE value of 'trailingGapsAsN' is ignored
> as well?

Yes. I've added a line in the help page of haplotype() to say that 'trailingGapsAsN' has no effect if 'strict = TRUE'.

> The reason I ask is because I use haplotype() in one of my R packages to
> compute optimal sample sizes for genetic diversity assessment (HACSim).
> Currently in my package, R throws a warning to users if missing data or
> base ambiguities are present within DNA alignments.
> 
> Given Emmanuel's changes, it seems the warning in my package will not be
> needed once I set 'strict = TRUE'. I am unsure, however, how to properly
> set 'trailingGapsAsN' to ensure that gaps do not affect haplotype
> calculation if they are left in the alignment. Gaps, ambiguities and Ns
> will cause an overestimation of haplotypes, and therefore an inflation of
> standing genetic variation.

The recent discussion on r-sig-genetics may be relevant here. There doesn't seem to be an easy answer to these questions.

Also, the upcoming version of ape will include a new function, latag2n (Leading and Trailing Alignment Gaps to N), which changes sequences such as "A-C-" into "A-CN".
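
The transformation described for latag2n can be sketched in Python on plain strings. This is an illustration of the idea only, not ape's R code; the function name latag2n_sketch and the string-based interface are assumptions, as is the handling of an all-gap sequence.

```python
def latag2n_sketch(seq):
    """Replace leading and trailing alignment gaps with 'N'.

    Internal gaps are kept, since they may reflect real indels.
    Hypothetical illustration; not ape's latag2n implementation.
    """
    core = seq.strip("-")
    if not core:
        # This sketch's choice: a sequence made only of gaps becomes all N.
        return "N" * len(seq)
    n_lead = len(seq) - len(seq.lstrip("-"))   # number of leading gaps
    n_trail = len(seq) - len(seq.rstrip("-"))  # number of trailing gaps
    return "N" * n_lead + core + "N" * n_trail


print(latag2n_sketch("A-C-"))      # A-CN
print(latag2n_sketch("--AC-G--"))  # NNAC-GNN
```

Only the gap runs at either end become N; the internal gap in "A-C-" is left in place and can still distinguish that sequence from others.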

Cheers,

Emmanuel

> Can someone weigh in on this?
> 
> Thanks!
> 
> Cheers,
> 
> Jarrett
> 
> 
> _______________________________________________
> R-sig-phylo mailing list - R-sig-phylo using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


