[R-sig-genetics] testing for admixture in R
Emmanuel Paradis
Emm@nuel@P@r@di@ @ending from ird@fr
Wed Oct 17 09:46:33 CEST 2018
Hi all,
I'm working on an implementation of admixture estimation according to
Alexander et al. (2009, Genome Research) and Alexander & Lange (2011, BMC
Bioinformatics), as available in their program ADMIXTURE. My (short)
question is: is anybody working on (or aware of) something similar?
Some context: I'm reviewing the methods to assess structure from genomic
data implemented in R. So far I've found adegenet::snapclust and
rhierbaps::hierBAPS which implement similar methods to the above (but
with different models). In addition, there are methods that fit models
given a known structure: adegenet::dapc, pegas::amova, and several
functions in the packages hierfstat and mmod (I may be forgetting some here).
There are three other packages on CRAN: radmixture to test the origin of
a single (personal) genotype, admixturegraph which plots graphs from the
output of the standalone AdmixTools
(https://github.com/DReichLab/AdmixTools), and LEAPFrOG to estimate
admixture in parents from offspring genotypes.
The implementation of Alexander et al.'s method is pretty straightforward
given the data classes in R ('loci' and 'genind') and the optimisation
of the likelihood function seems to run well (on a toy data set at
least) with nlminb called alternately (on the ancestry proportions, then
on the allele frequencies) as described in the above
papers. I think it could be useful to have this in R for, at least,
three reasons:
1/ There are a few issues with the availability of the software
mentioned above: ADMIXTURE source code is not available and only
executables for Linux and macOS can be downloaded
(http://www.genetics.ucla.edu/software/admixture/download.html).
AdmixTools is available only as C code and seems a bit tough to compile
(see: https://github.com/DReichLab/AdmixTools/issues/43).
2/ We now have a lot of tools to handle and manipulate genetic/genomic
data in R (e.g., ape::read.FASTA, adegenet::genlight, pegas::read.vcf,
package vcfR and others on Bioconductor) so that data formats should not
be an issue.
3/ It seems possible to extend the method to non-SNP loci and even to
take LD into account. Again, we have tools for these in different R
packages.
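To make the alternating scheme mentioned above more concrete, here is a minimal sketch in base R. It is not the actual implementation, only an illustration under simplifying assumptions: biallelic SNPs coded as allele counts (0, 1, 2), K = 2 ancestral populations (so that each individual's ancestry reduces to a single proportion and simple box constraints are enough for nlminb), and simulated toy data. All object and function names (loglik, fit_admix2, G, q, P) are hypothetical.

```r
## Binomial log-likelihood of the admixture model for K = 2.
## G: n x J matrix of allele counts (0, 1 or 2)
## q: vector of n ancestry proportions from population 1
## P: 2 x J matrix of allele frequencies in the two populations
loglik <- function(q, P, G) {
    p <- outer(q, P[1, ]) + outer(1 - q, P[2, ])  # n x J expected frequencies
    sum(G * log(p) + (2 - G) * log(1 - p))
}

## Alternate two nlminb calls until the log-likelihood stops increasing.
## (Assumption: box constraints [eps, 1 - eps] keep all logs finite.)
fit_admix2 <- function(G, maxit = 50, tol = 1e-6, eps = 1e-4) {
    n <- nrow(G)
    J <- ncol(G)
    q <- runif(n, 0.3, 0.7)                    # random starting values
    P <- matrix(runif(2 * J, 0.2, 0.8), 2, J)
    ll <- loglik(q, P, G)
    for (it in seq_len(maxit)) {
        ## step 1: update the ancestry proportions, frequencies fixed
        q <- nlminb(q, function(x) -loglik(x, P, G),
                    lower = eps, upper = 1 - eps)$par
        ## step 2: update the allele frequencies, proportions fixed
        f <- nlminb(as.vector(P),
                    function(x) -loglik(q, matrix(x, 2, J), G),
                    lower = eps, upper = 1 - eps)$par
        P <- matrix(f, 2, J)
        ll_new <- loglik(q, P, G)
        converged <- ll_new - ll < tol
        ll <- ll_new
        if (converged) break
    }
    list(q = q, P = P, logLik = ll, iterations = it)
}

## Toy data: two groups of individuals with contrasting ancestry
set.seed(1)
n <- 30
J <- 40
q_true <- rep(c(0.9, 0.1), each = n / 2)
P_true <- rbind(runif(J, 0.05, 0.35), runif(J, 0.65, 0.95))
p_true <- outer(q_true, P_true[1, ]) + outer(1 - q_true, P_true[2, ])
G <- matrix(rbinom(n * J, 2, p_true), n, J)

fit <- fit_admix2(G)
round(fit$q, 2)   # estimated proportions (population labels may be swapped)
```

For K > 2 the rows of the ancestry matrix must sum to one, which box constraints alone cannot enforce; a reparameterisation (or a different optimiser) would be needed there.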
Any comments welcome.
Best,
Emmanuel