[R-sig-ME] Fitting known var-cov matrix in logistic regression model

Thu Apr 21 20:46:04 CEST 2011

Dear all

I wonder if someone can help me with the following problem.

I want to estimate the relationship between population allele
frequencies at a certain locus of interest and an environmental 
variable.  The hypothesis is that this environmental variable has
generated these different allele frequencies through differential
selection pressure on a locus under balancing selection (i.e., there is
negative selection which balances the differential positive selection on
the allele of interest).   The allele frequencies are measured in 15
different populations, as are the environmental variables.  However,
this environmental variable has been aggregated at the population level
so there are only 15 values that it can take.

 The populations are genetically related to each other to different
degrees and so I want to reflect this structuring by including in the
model a variance-covariance matrix which captures this relatedness.  I
estimated this variance-covariance matrix from a set of independent data
on SNPs at a whole lot of putatively neutral loci that are not expected
to show environmental clines.  

I have already fit a logistic regression model to individual allele
data with an intercept term and a linear covariate term for the
environmental  variable (see code below).    I now want to add the
var-cov structure into the model.  When I have done that I want to drop
the environmental variable from the model and test for its significance.
 Thus I am testing for an environmentally related cline in allele
frequencies  but allowing for the fact that some of the populations
‘group’ together genetically.  

I have looked at corSymm in nmle as a way to fit a pre-defined
variance-cov matrix and the ‘correlation’ parameter in function gls
but can’t work out how to code my  model.  I also suspect I need to
use the ‘weights’ argument to account for the fact that the
variance-covariance matrix is estimated from populations of different
sizes, though I am just guessing on this!

Here is the code that I have used so far and which I now want to modify
to fit the var-cov matrix to reflect population structure.  

Any help would be appreciated and thanks in advance.

Margaret Mackinnon

#Values of the environmental variable for the 15 populations
env<-c(0.3102664, 0.2559409, 0.1693839, 0.4222649, 0.3060545,
0.3849413, 0.1421446, 0.1643560, 0.1863432, 0.1437641, 0.1126848,
0.2551449, 0.3908605, 0.1837551, 0.3115154)

#Allele frequencies in the 15 populations
freq<-c( 0.12534626, 0.09888190, 0.06529210,  0.07619048,  0.08200637, 
0.06678383,  0.08702532,  0.05871560,  0.07647059,   0.07461977, 
0.07801095,  0.08383234,  0.09095581, 0.06042654,   0.12130707)

#No. of people genotyped for each population , from which allele
frequencies were calculated
n<-c( 722, 1431,
582,315,1256,569,948,1090,510,1052,1096,501,973,1266,1117)

pop<-c(1:15)

#See the end of this email for the var-cov matrix that I want to fit.

#This is the code for fitting the model without the cov structure  
glm.pred<-glm(freq~env,weights=n*2,family=binomial) #Note that 2
alleles per person, hence weights of n*2
summary(glm.pred)
pred<-predict(glm.pred,data.frame("env"=c(seq(0.05,0.5,0.01))),type="response")
#Plot observed and predicted data
plot(freq~env,xlim=c(0.05,0.5))
points(pred~seq(0.05,0.5,0.01))

#############And here is the output:###################

Call:
glm(formula = freq ~ env, family = binomial, weights = n * 2)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.7845  -1.9236  -0.2746   1.4749   4.8131  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.68838    0.06294 -42.713  < 2e-16 ***
env          1.23044    0.23903   5.148 2.64e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’
0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 128.93  on 14  degrees of freedom
Residual deviance: 102.63  on 13  degrees of freedom
AIC: 206.37

Number of Fisher Scoring iterations: 4

######End of output#########

#This is the var-cov matrix I want to fit into the linear model above.
vc<-matrix(data=c(0.001146,0.00046,-0.000538,-0.000399,-5.6e-05,-0.000156,-0.00011,-0.000416,-3.2e-05,0.000101,-0.000131,-0.000406,-8.7e-05,-0.000351,0.000549,0.00046,0.000594,-0.000251,-0.000153,7.1e-05,-0.000134,-0.00023,-0.000343,7.6e-05,-2.5e-05,-0.000132,-0.000284,-8.3e-05,-0.00031,0.000438,-0.000538,-0.000251,0.000785,0,-3.5e-05,1e-04,9.4e-05,0.000317,-7.4e-05,-6.4e-05,-2.3e-05,0.000225,2.5e-05,0.000113,-0.000321,-0.000399,-0.000153,0,0.000923,-2.9e-05,0.000465,4.8e-05,4.1e-05,9.3e-05,-0.000112,-2.5e-05,0.000102,6.7e-05,0.000131,-0.000253,-5.6e-05,7.1e-05,-3.5e-05,-2.9e-05,0.000165,-0.000112,-7e-05,-5.8e-05,3.1e-05,-3.1e-05,-2.6e-05,-0.000148,-1e-05,-2.7e-05,9.1e-05,-0.000156,-0.000134,1e-04,0.000465,-0.000112,0.000814,9.9e-05,6e-06,4e-05,-8.3e-05,-0.000116,-4e-06,0.000101,7.5e-05,-0.000277,-0.00011,-0.00023,9.4e-05,4.8e-05,-7e-05,9.9e-05,0.000271,0.000106,-8.5e-05,-2.7e-05,5e-05,0.000114,3.3e-05,9.2e-05,-0.000198,-0.000416,-0.000343,0.000317,4.1e-05,-5.8e-05,6e-06,0.000106,0.000403,-5.4e-05,1.7e-05,0.000107,0.000229,-4e-06,0.000202,-0.000415,-3.2e-05,7.6e-05,-7.4e-05,9.3e-05,3.1e-05,4e-05,-8.5e-05,-5.4e-05,0.000351,-1.2e-05,-0.000108,-1.4e-05,1.2e-05,-6.6e-05,4.2e-05,0.000101,-2.5e-05,-6.4e-05,-0.000112,-3.1e-05,-8.3e-05,-2.7e-05,1.7e-05,-1.2e-05,2e-04,-4.2e-05,-8e-05,-8e-06,-1.7e-05,1.5e-05,-0.000131,-0.000132,-2.3e-05,-2.5e-05,-2.6e-05,-0.000116,5e-05,0.000107,-0.000108,-4.2e-05,0.000285,0.000139,-4.7e-05,8.3e-05,-0.000128,-0.000406,-0.000284,0.000225,0.000102,-0.000148,-4e-06,0.000114,0.000229,-1.4e-05,-8e-05,0.000139,0.000818,3.2e-05,0.000218,-0.000407,-8.7e-05,-8.3e-05,2.5e-05,6.7e-05,-1e-05,0.000101,3.3e-05,-4e-06,1.2e-05,-8e-06,-4.7e-05,3.2e-05,0.00014,2.4e-05,-8e-05,-0.000351,-0.00031,0.000113,0.000131,-2.7e-05,7.5e-05,9.2e-05,0.000202,-6.6e-05,-1.7e-05,8.3e-05,0.000218,2.4e-05,0.000322,-0.000357,0.000549,0.000438,-0.000321,-0.000253,9.1e-05,-0.000277,-0.000198,-0.000415,4.2e-05,1.5e-05,-0.000128,-0.000407,-8e-05,-0.000357,0.000816
),nrow=15,ncol=15)

#Visualise the population structure
heatmap(vc)

This e-mail (including any attachment to it) contains information which
is confidential. It is intended only for the use of the named recipient.
If you have received this e-mail in error, please let us know by
replying to the sender, and immediately delete it from your system. 
Please note, that in these circumstances, the use, disclosure,
distribution or copying of this information is strictly prohibited. We
apologize for any inconvenience that may have been caused to you.
KEMRI-Wellcome Trust Programme cannot accept any responsibility for the 
accuracy or completeness of this message as it has been transmitted over
a public network. KEMRI-Wellcome Trust Programme reserves the right to
monitor all incoming and outgoing email traffic. Although the Programme
has taken reasonable precautions to ensure no viruses are present in
emails, it cannot accept responsibility for any loss or damage arising
from the use of the email or attachments. Any views expressed in this
message are those of the individual sender, except where the sender
specifically states them to be the views of KEMRI-Wellcome Trust
Programme.