# [R-sig-ME] Fitting known var-cov matrix in logistic regression model

Margaret Mackinnon mmackinnon at kilifi.kemri-wellcome.org
Fri Apr 22 20:31:44 CEST 2011

```Dear David

Thanks for your response.

I tried geeglm, glmmPQL and MCMCglmm.  I still can't work out how to fit the fixed var-cov structure. Here is what I did for MCMCglmm (my preferred option):

gpdata<-data.frame(pos=as.integer(freq*n*2),neg=as.integer(n*2-freq*n*2),pop=c(1:15),env)
prior<-list(R=list(V=diag(1),nu=0.002),G=list(G1=list(V=10,nu=0.002)))
out<-MCMCglmm(cbind(pos,neg)~1+env,random=~pop,data=gpdata,family="multinomial2",prior=prior,burnin=3000,nitt=20000)
summary(out$VCV)
plot(out$VCV)
plot(out$Sol)
summary(out$Sol)

This gives sensible answers for the random effect (pop) variance and the fixed effects (intercept + env), i.e. answers that match those given by lmer and glm.  However, I can't work out from the manual how to fit a predefined var-cov matrix that describes the different variances for each pop and the covariances between them.  Is it specified in rcov, or in the priors (R or G?) with the fix option on?
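
# A minimal sketch of one possible route, not confirmed anywhere in this thread:
# MCMCglmm has a `ginverse` argument that takes the INVERSE of a known
# covariance structure for a random term, separate from rcov and the priors.
# `Sigma` is a hypothetical name for the 15 x 15 var-cov matrix estimated from
# the neutral SNPs; an identity matrix stands in for it here as a placeholder.
# `gpdata` and `prior` are the objects defined above.
library(MCMCglmm)
library(Matrix)
gpdata$pop <- factor(gpdata$pop)
Sigma <- diag(nlevels(gpdata$pop))   # placeholder only: substitute the SNP-based matrix
# ginverse expects a sparse matrix whose row names are the levels of pop
Sinv <- as(as(as(solve(Sigma), "dMatrix"), "generalMatrix"), "CsparseMatrix")
rownames(Sinv) <- colnames(Sinv) <- levels(gpdata$pop)
out2 <- MCMCglmm(cbind(pos,neg)~1+env,random=~pop,ginverse=list(pop=Sinv),data=gpdata,family="multinomial2",prior=prior,burnin=3000,nitt=20000)
summary(out2$VCV)   # the pop variance now multiplies Sigma rather than an identity matrix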

I also could not get family="zibinomial" to work because it needs idh or us variance functions, which I don't know how to set up for this particular problem.

I would really appreciate some help on this.

Margaret

>>> "David Duffy" <davidD at qimr.edu.au> 22 April 2011 04:49 >>>
On Thu, 21 Apr 2011, Margaret Mackinnon wrote:

> I want to estimate the relationship between population allele
> frequencies at a certain locus of interest and an environmental
> variable.  The hypothesis is that this environmental variable has
> generated these different allele frequencies through differential
> selection pressure on a locus under balancing selection (i.e., there is
> negative selection which balances the differential positive selection on
> the allele of interest).   The allele frequencies are measured in 15
> different populations, as are the environmental variables.  However,
> this environmental variable has been aggregated at the population level
> so there are only 15 values that it can take.
>
>  The populations are genetically related to each other to different
> degrees and so I want to reflect this structuring by including in the
> model a variance-covariance matrix which captures this relatedness.  I
> estimated this variance-covariance matrix from a set of independent data
> on SNPs at a whole lot of putatively neutral loci that are not expected

The easiest way might be to use the SNP principal component scores for the
populations as covariates in an ordinary logistic regression. Otherwise
a GEE with your specified matrix, glmmPQL with a corSymm as per your model,
MCMCglmm.
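
# A minimal sketch of the first suggestion (principal component scores as
# covariates), under stated assumptions: `snp` is a hypothetical 15 x (number
# of loci) matrix of population allele frequencies at the neutral SNPs, one row
# per population in the same order as gpdata. Keeping two components is an
# arbitrary choice for illustration, not something given in the thread.
pcs <- prcomp(snp, center=TRUE)$x[, 1:2]   # population scores on PC1 and PC2
gp2 <- data.frame(gpdata, PC1=pcs[,1], PC2=pcs[,2])
fit <- glm(cbind(pos,neg) ~ env + PC1 + PC2, family=binomial, data=gp2)
summary(fit)   # the env coefficient is adjusted for the structure captured by the PCs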

--
| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v
