[R-sig-genetics] testing for admixture in R

Emmanuel Paradis Emm@nuel@P@r@di@ @ending from ird@fr
Wed Oct 17 09:46:33 CEST 2018


Hi all,

I'm working on an implementation of admixture estimation according to 
Alexander et al. (2009, Genome Research) and Alexander & Lange (2011, BMC 
Bioinformatics), as available in their program ADMIXTURE. My (short) 
question is: is anybody working on (or aware of) something similar?

Some context: I'm reviewing the methods to assess structure from genomic 
data implemented in R. So far I've found adegenet::snapclust and 
rhierbaps::hierBAPS, which implement methods similar to the above (but 
with different models). In addition, there are methods that fit models 
given a known structure: adegenet::dapc, pegas::amova, and several 
functions in the packages hierfstat and mmod (I may be forgetting some 
here).

There are three other packages on CRAN: radmixture to test the origin of 
a single (personal) genotype, admixturegraph which plots graphs from the 
output of the standalone AdmixTools 
(https://github.com/DReichLab/AdmixTools), and LEAPFrOG to estimate 
admixture in parents from offspring genotypes.

The implementation of Alexander et al.'s method is pretty straightforward 
given the data classes in R ('loci' and 'genind'), and the optimisation 
of the likelihood function seems to run well (on a toy data set at 
least) with nlminb called alternately on the two sets of parameters, as 
described in the above papers. I think it could be useful to have this 
in R for at least three reasons:

1/ There are a few issues with the availability of the software 
mentioned above: ADMIXTURE source code is not available and only 
executables for Linux and macOS can be downloaded 
(http://www.genetics.ucla.edu/software/admixture/download.html). 
AdmixTools is available only as C code and seems a bit tough to compile 
(see: https://github.com/DReichLab/AdmixTools/issues/43).

2/ We now have a lot of tools to handle and manipulate genetic/genomic 
data in R (e.g., ape::read.FASTA, adegenet::genlight, pegas::read.vcf, 
package vcfR, and others on Bioconductor), so data format will not be 
an issue.

3/ It seems possible to extend the method to non-SNP loci and even to 
take LD into account. Again, we have tools for these in different R 
packages.
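
To make the alternating optimisation mentioned above more concrete, here 
is a minimal sketch on simulated biallelic data. It is only an 
illustration under stated assumptions, not the implementation discussed 
in this message: the function names (loglik, update_P, update_Q), the 
softmax transform used to keep ancestry proportions on the simplex, the 
box bounds, and the simulated data with K = 2 are all choices made for 
this example.

```r
## Binomial admixture likelihood: G is an n x m matrix of allele counts
## (0, 1, 2), Q is n x K (ancestry proportions, rows sum to 1), and
## P is K x m (allele frequencies in the K ancestral populations).
loglik <- function(G, Q, P) {
    M <- Q %*% P                       # expected allele frequencies
    sum(G * log(M) + (2 - G) * log(1 - M))
}

## Assumption of this sketch: softmax maps unconstrained values to
## proportions that are positive and sum to 1.
softmax <- function(z) { e <- exp(z - max(z)); e / sum(e) }

## Update P locus by locus with Q fixed; bounds keep 0 < p < 1.
update_P <- function(G, Q, P, eps = 1e-6) {
    for (j in seq_len(ncol(G))) {
        f <- function(p) {
            mu <- drop(Q %*% p)
            -sum(G[, j] * log(mu) + (2 - G[, j]) * log(1 - mu))
        }
        P[, j] <- nlminb(P[, j], f, lower = eps, upper = 1 - eps)$par
    }
    P
}

## Update Q individual by individual with P fixed.
update_Q <- function(G, Q, P) {
    for (i in seq_len(nrow(G))) {
        f <- function(z) {
            mu <- drop(softmax(z) %*% P)
            -sum(G[i, ] * log(mu) + (2 - G[i, ]) * log(1 - mu))
        }
        start <- log(pmax(Q[i, ], 1e-8))
        Q[i, ] <- softmax(nlminb(start, f, lower = -20, upper = 20)$par)
    }
    Q
}

## Simulated toy data (hypothetical sizes).
set.seed(1)
n <- 20; m <- 30; K <- 2
q <- runif(n)
Qtrue <- cbind(q, 1 - q)
Ptrue <- matrix(runif(K * m, 0.05, 0.95), K, m)
G <- matrix(rbinom(n * m, 2, Qtrue %*% Ptrue), n, m)

## Random starting values, then alternate the two updates.
Q <- matrix(runif(n * K), n, K); Q <- Q / rowSums(Q)
P <- matrix(runif(K * m, 0.2, 0.8), K, m)
ll <- loglik(G, Q, P)
for (iter in 1:10) {
    P <- update_P(G, Q, P)
    Q <- update_Q(G, Q, P)
    ll <- c(ll, loglik(G, Q, P))
}
```

The vector ll records the log-likelihood after each pair of updates, so 
plotting it gives a quick check that the alternating scheme is making 
progress. With real data, the matrix G would have to be derived from a 
'loci' or 'genind' object first.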

Any comments welcome.

Best,

Emmanuel
 
 