[R-sig-genetics] Hierfstat: g-statistics permutation

Wed Nov 2 05:25:58 CET 2016

Hi
Using some of the permutation functions in hierfstat (test.g; test.between;
gstat.randtest), I've encountered the same issue raised in this recent post:
https://stat.ethz.ch/pipermail/r-sig-genetics/2016-August/000099.html

In my case (6 pops x 8 individuals; ~8000 loci), testing for significant
differentiation among pops with a small number of loci (e.g. 20) appears
normal, but as the number of loci increases beyond 100 the permuted
g-statistics mostly converge on a single value or 0.

This appears to be related to missing data, as demonstrated below on a
small simulated dataset.

library(hierfstat)
#Simulate 100 loci
dat.sim <- sim.genot(size=8, nbal=2, nbloc=100, nbpop=6, N=1000,
                     mig=0.001, mut=0.0001, f=0)

#perform permutation test with no missing data
g <- test.g(dat.sim[,-1], level = dat.sim$Pop, nperm = 100)

#are all 100 permuted g-statistics unique? yes
length(unique(g$g.star))

#add one missing genotype per locus (1 of 48). Total missing data = ~2%
dat.sim[,-1] <- apply(dat.sim[,-1], 2, function(x) {
  x[sample(1:48, 1)] <- NA
  x
})

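#re-run the permutation test on the data with missing genotypes
g <- test.g(dat.sim[,-1], level = dat.sim$Pop, nperm = 100)

#how many permuted g-statistics are unique? generally fewer than 20.
length(unique(g$g.star))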

Is there any way to use these functions to calculate p-values when missing
data is present? The functions work fine when I remove loci with missing
data, so it's not the end of the world.
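
For reference, the workaround of dropping loci with missing data can be
sketched as below. This is only an illustration in base R: the toy table
and the names toy, complete and toy.cc are made up for the example, and the
layout (first column Pop, then one column per locus) is assumed to match
the simulated data above.

```r
# Toy genotype table (hypothetical values): Pop column, then one column per locus
toy <- data.frame(Pop  = rep(1:2, each = 4),
                  loc1 = c(11, 12, 12, 22, 11, 11, 12, 22),
                  loc2 = c(12, 12, 11, 22, 22, 12, 11, 11),
                  loc3 = c(22, 12, 11, 11, 12, 12, 22, 11))
toy$loc2[3] <- NA   # one missing genotype at locus 2

# Flag loci (columns other than Pop) that have no missing genotypes
complete <- colSums(is.na(toy[, -1, drop = FALSE])) == 0

# Keep Pop plus the complete loci only
toy.cc <- toy[, c(TRUE, complete), drop = FALSE]
print(names(toy.cc))

# The filtered table would then be passed to the permutation test in the
# same way as above (not run here, needs hierfstat):
# g <- test.g(toy.cc[,-1], level = toy.cc$Pop, nperm = 100)
```

With ~8000 loci this discards every locus that has even a single missing
genotype, which is why a way to keep those loci would be preferable.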
Thanks for any advice.
Regards, Dan

-- 
Dr. Daniel J. Schmidt
Research Fellow, Australian Rivers Institute
Griffith University 170 Kessels Road, Nathan
Brisbane QLD 4111 Australia

d.schmidt at griffith.edu.au
Office: +61 7 37354165
http://www.rivers.edu.au
