[R] Binomial GLM, chisq.test, or?
lincoln
miseno77 at hotmail.com
Fri May 4 17:38:48 CEST 2012
Hi,
I have a data set with 999 observations, for each of them I have data on
four variables:
site, colony, gender (quite a few NA values), and cohort.
This is how the data set looks like:
> str(dispersal)
'data.frame': 999 obs. of 4 variables:
$ site : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 2 2 ...
$ gender: Factor w/ 2 levels "0","1": NA NA 2 1 2 NA 1 2 2 NA ...
$ colony: Factor w/ 2 levels "main","other": 2 2 2 2 2 2 2 2 2 2 ...
$ cohort: Factor w/ 11 levels "1996","2000",..: 8 8 8 8 8 8 8 8 8 6 ...
Now, I want to estimate if sites 1 and site 2 differ on some of the other
variables. For instance there are relatively more males in site 1 with
respect to site 2, more individuals of the main colony in site 2 with
respect to site 1 and this sort of things.
I thought I might do a binomial GLM considering as response variable the
site, I tried to run the more general model to have a look to overdispersion
but I believe there is something wrong even before worrying about
overdispersion. I know (I did a chisq.test) that cohort2004 is very diversly
represented between the two sites but it is not reflected in the results of
GLM. Here there are the results of chisq.test and the GLM:
1)
/>
age_cohort<-as.table(rbind(c(142,95,46,33,14,59,18,12,7,1,0),c(258,144,54,70,20,11,6,8,2,3,1)))
> dimnames(age_cohort)<-list(site=c("M","D"),
+
cohorts=c(2010,2009,2008,2007,2006,2004,2003,2002,2001,2000,1996))
> age_cohort
cohorts
site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996
M 142 95 46 33 14 59 18 12 7 1 0
D 258 144 54 70 20 11 6 8 2 3 1
> (Xsqagec <- chisq.test(age_cohort)) # Prints test summary
Pearson's Chi-squared test
data: age_cohort
X-squared = 82.6016, df = 10, p-value = 1.549e-13
Mensajes de aviso perdidos
In chisq.test(age_cohort) : Chi-squared approximation may be incorrect
> Xsqagec$observed # observed counts
cohorts
site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996
M 142 95 46 33 14 59 18 12 7 1 0
D 258 144 54 70 20 11 6 8 2 3 1
> Xsqagec$expected # expected counts under the null
cohorts
site 2010 2009 2008 2007 2006 2004 2003
M 170.1195 101.6464 42.52988 43.80578 14.46016 29.77092 10.20717
D 229.8805 137.3536 57.47012 59.19422 19.53984 40.22908 13.79283
cohorts
site 2002 2001 2000 1996
M 8.505976 3.827689 1.701195 0.4252988
D 11.494024 5.172311 2.298805 0.5747012
> Xsqagec$residuals # Pearson residuals
cohorts
site 2010 2009 2008 2007 2006
M -2.1559111 -0.6592367 0.5321050 -1.6326395 -0.1210101
D 1.8546283 0.5671101 -0.4577448 1.4044825 0.1040993
cohorts
site 2004 2003 2002 2001 2000
M 5.3569686 2.4391720 1.1980192 1.6214643 -0.5376032
D -4.6083465 -2.0983042 -1.0305993 -1.3948690 0.4624746
cohorts
site 1996
M -0.6521494
D 0.5610132
> Xsqagec$stdres # standardized residuals
cohorts
site 2010 2009 2008 2007 2006
M -3.6665549 -0.9962228 0.7397057 -2.2733888 -0.1623984
D 3.6665549 0.9962228 -0.7397057 2.2733888 0.1623984
cohorts
site 2004 2003 2002 2001 2000
M 7.3264142 3.2566808 1.5962909 2.1485311 -0.7105713
D -7.3264142 -3.2566808 -1.5962909 -2.1485311 0.7105713
cohorts
site 1996
M -0.8606814
D 0.8606814
/
2)
/> model1<-glm(site~gender*colony*cohort,binomial)
> summary(model1)
Call:
glm(formula = site ~ gender * colony * cohort, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.84648 -0.96954 -0.00036 1.11269 2.03933
Coefficients: (12 not defined because of singularities)
Estimate Std. Error z value
(Intercept) -1.657e+01 2.400e+03 -0.007
gender1 -2.231e-01 9.220e-01 -0.242
colonyother 9.531e-02 8.006e-01 0.119
cohort2002 1.717e-08 3.393e+03 0.000
cohort2003 1.766e+01 2.400e+03 0.007
cohort2004 1.807e+01 2.400e+03 0.008
cohort2006 1.697e+01 2.400e+03 0.007
cohort2007 1.726e+01 2.400e+03 0.007
cohort2008 1.606e+01 2.400e+03 0.007
cohort2009 1.657e+01 2.400e+03 0.007
cohort2010 1.587e+01 2.400e+03 0.007
gender1:colonyother 9.163e-01 1.087e+00 0.843
gender1:cohort2002 1.719e+01 2.400e+03 0.007
gender1:cohort2003 -1.823e-01 1.713e+00 -0.106
gender1:cohort2004 2.231e-01 1.329e+00 0.168
gender1:cohort2006 -5.878e-01 1.586e+00 -0.371
gender1:cohort2007 -6.454e-02 1.784e+00 -0.036
gender1:cohort2008 8.881e-01 1.156e+00 0.768
gender1:cohort2009 -2.817e-02 1.199e+00 -0.023
gender1:cohort2010 NA NA NA
colonyother:cohort2002 NA NA NA
colonyother:cohort2003 NA NA NA
colonyother:cohort2004 1.497e+01 1.697e+03 0.009
colonyother:cohort2006 -1.707e+01 2.400e+03 -0.007
colonyother:cohort2007 -7.885e-01 1.772e+00 -0.445
colonyother:cohort2008 NA NA NA
colonyother:cohort2009 NA NA NA
colonyother:cohort2010 NA NA NA
gender1:colonyother:cohort2002 NA NA NA
gender1:colonyother:cohort2003 NA NA NA
gender1:colonyother:cohort2004 -9.163e-01 2.400e+03 0.000
gender1:colonyother:cohort2006 NA NA NA
gender1:colonyother:cohort2007 -2.575e+00 2.379e+00 -1.082
gender1:colonyother:cohort2008 NA NA NA
gender1:colonyother:cohort2009 NA NA NA
gender1:colonyother:cohort2010 NA NA NA
Pr(>|z|)
(Intercept) 0.994
gender1 0.809
colonyother 0.905
cohort2002 1.000
cohort2003 0.994
cohort2004 0.994
cohort2006 0.994
cohort2007 0.994
cohort2008 0.995
cohort2009 0.994
cohort2010 0.995
gender1:colonyother 0.399
gender1:cohort2002 0.994
gender1:cohort2003 0.915
gender1:cohort2004 0.867
gender1:cohort2006 0.711
gender1:cohort2007 0.971
gender1:cohort2008 0.442
gender1:cohort2009 0.981
gender1:cohort2010 NA
colonyother:cohort2002 NA
colonyother:cohort2003 NA
colonyother:cohort2004 0.993
colonyother:cohort2006 0.994
colonyother:cohort2007 0.656
colonyother:cohort2008 NA
colonyother:cohort2009 NA
colonyother:cohort2010 NA
gender1:colonyother:cohort2002 NA
gender1:colonyother:cohort2003 NA
gender1:colonyother:cohort2004 1.000
gender1:colonyother:cohort2006 NA
gender1:colonyother:cohort2007 0.279
gender1:colonyother:cohort2008 NA
gender1:colonyother:cohort2009 NA
gender1:colonyother:cohort2010 NA
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 311.91 on 224 degrees of freedom
Residual deviance: 271.61 on 201 degrees of freedom
(774 observations deleted due to missingness)
AIC: 319.61
Number of Fisher Scoring iterations: 15
/
I thought that perhaps keeping the gender as explanatory variable was
reducing a lot the sample size and it was the matter (I removed it):
/> model1<-glm(site~colony*cohort,binomial)
> summary(model1)
Call:
glm(formula = site ~ colony * cohort, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.8683 -0.9712 -0.8757 1.3468 1.7470
Coefficients: (6 not defined because of singularities)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -15.5661 1455.3976 -0.011 0.991
colonyother 0.2544 0.2167 1.174 0.240
cohort2000 14.4675 1455.3981 0.010 0.992
cohort2001 16.8188 1455.3978 0.012 0.991
cohort2002 15.9715 1455.3977 0.011 0.991
cohort2003 16.6647 1455.3977 0.011 0.991
cohort2004 17.1194 1455.3976 0.012 0.991
cohort2006 15.2607 1455.3976 0.010 0.992
cohort2007 14.9470 1455.3976 0.010 0.992
cohort2008 15.3837 1455.3976 0.011 0.992
cohort2009 15.1762 1455.3976 0.010 0.992
cohort2010 14.8053 1455.3976 0.010 0.992
colonyother:cohort2000 NA NA NA NA
colonyother:cohort2001 NA NA NA NA
colonyother:cohort2002 NA NA NA NA
colonyother:cohort2003 NA NA NA NA
colonyother:cohort2004 13.7583 550.0887 0.025 0.980
colonyother:cohort2006 -15.5151 1455.3976 -0.011 0.991
colonyother:cohort2007 -0.9163 0.5979 -1.533 0.125
colonyother:cohort2008 NA NA NA NA
colonyother:cohort2009 -0.6912 0.5214 -1.326 0.185
colonyother:cohort2010 NA NA NA NA
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1361.4 on 998 degrees of freedom
Residual deviance: 1267.9 on 983 degrees of freedom
AIC: 1299.9
Number of Fisher Scoring iterations: 14
/
Any comment/suggestion on this?
Thanks for any help
--
View this message in context: http://r.789695.n4.nabble.com/Binomial-GLM-chisq-test-or-tp4608941.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list