[R] Binomial GLM, chisq.test, or?

lincoln miseno77 at hotmail.com
Fri May 4 17:38:48 CEST 2012


Hi,

I have a data set with 999 observations, for each of them I have data on
four variables:
site, colony, gender (quite a few NA values), and cohort.

This is how the data set looks like:
> str(dispersal)
'data.frame':	999 obs. of  4 variables:
 $ site  : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 2 2 ...
 $ gender: Factor w/ 2 levels "0","1": NA NA 2 1 2 NA 1 2 2 NA ...
 $ colony: Factor w/ 2 levels "main","other": 2 2 2 2 2 2 2 2 2 2 ...
 $ cohort: Factor w/ 11 levels "1996","2000",..: 8 8 8 8 8 8 8 8 8 6 ...

Now, I want to estimate if sites 1 and site 2 differ on some of the other
variables. For instance there are relatively more males in site 1 with
respect to site 2, more individuals of the main colony in site 2 with
respect to site 1 and this sort of things.

I thought I might do a binomial GLM considering as response variable the
site, I tried to run the more general model to have a look to overdispersion
but I believe there is something wrong even before worrying about
overdispersion. I know (I did a chisq.test) that cohort2004 is very diversly
represented between the two sites but it is not reflected in the results of
GLM. Here there are the results of chisq.test and the GLM:

1) 
/>
age_cohort<-as.table(rbind(c(142,95,46,33,14,59,18,12,7,1,0),c(258,144,54,70,20,11,6,8,2,3,1)))
> dimnames(age_cohort)<-list(site=c("M","D"),
+                       
cohorts=c(2010,2009,2008,2007,2006,2004,2003,2002,2001,2000,1996))
> age_cohort
    cohorts
site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996
   M  142   95   46   33   14   59   18   12    7    1    0
   D  258  144   54   70   20   11    6    8    2    3    1
> (Xsqagec <- chisq.test(age_cohort))  # Prints test summary

	Pearson's Chi-squared test

data:  age_cohort 
X-squared = 82.6016, df = 10, p-value = 1.549e-13

Mensajes de aviso perdidos
In chisq.test(age_cohort) : Chi-squared approximation may be incorrect
> Xsqagec$observed   # observed counts
    cohorts
site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996
   M  142   95   46   33   14   59   18   12    7    1    0
   D  258  144   54   70   20   11    6    8    2    3    1
> Xsqagec$expected   # expected counts under the null
    cohorts
site     2010     2009     2008     2007     2006     2004     2003
   M 170.1195 101.6464 42.52988 43.80578 14.46016 29.77092 10.20717
   D 229.8805 137.3536 57.47012 59.19422 19.53984 40.22908 13.79283
    cohorts
site      2002     2001     2000      1996
   M  8.505976 3.827689 1.701195 0.4252988
   D 11.494024 5.172311 2.298805 0.5747012
> Xsqagec$residuals  # Pearson residuals
    cohorts
site       2010       2009       2008       2007       2006
   M -2.1559111 -0.6592367  0.5321050 -1.6326395 -0.1210101
   D  1.8546283  0.5671101 -0.4577448  1.4044825  0.1040993
    cohorts
site       2004       2003       2002       2001       2000
   M  5.3569686  2.4391720  1.1980192  1.6214643 -0.5376032
   D -4.6083465 -2.0983042 -1.0305993 -1.3948690  0.4624746
    cohorts
site       1996
   M -0.6521494
   D  0.5610132
> Xsqagec$stdres     # standardized residuals
    cohorts
site       2010       2009       2008       2007       2006
   M -3.6665549 -0.9962228  0.7397057 -2.2733888 -0.1623984
   D  3.6665549  0.9962228 -0.7397057  2.2733888  0.1623984
    cohorts
site       2004       2003       2002       2001       2000
   M  7.3264142  3.2566808  1.5962909  2.1485311 -0.7105713
   D -7.3264142 -3.2566808 -1.5962909 -2.1485311  0.7105713
    cohorts
site       1996
   M -0.8606814
   D  0.8606814
/


2)
/> model1<-glm(site~gender*colony*cohort,binomial)
> summary(model1)

Call:
glm(formula = site ~ gender * colony * cohort, family = binomial)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.84648  -0.96954  -0.00036   1.11269   2.03933  

Coefficients: (12 not defined because of singularities)
                                 Estimate Std. Error z value
(Intercept)                    -1.657e+01  2.400e+03  -0.007
gender1                        -2.231e-01  9.220e-01  -0.242
colonyother                     9.531e-02  8.006e-01   0.119
cohort2002                      1.717e-08  3.393e+03   0.000
cohort2003                      1.766e+01  2.400e+03   0.007
cohort2004                      1.807e+01  2.400e+03   0.008
cohort2006                      1.697e+01  2.400e+03   0.007
cohort2007                      1.726e+01  2.400e+03   0.007
cohort2008                      1.606e+01  2.400e+03   0.007
cohort2009                      1.657e+01  2.400e+03   0.007
cohort2010                      1.587e+01  2.400e+03   0.007
gender1:colonyother             9.163e-01  1.087e+00   0.843
gender1:cohort2002              1.719e+01  2.400e+03   0.007
gender1:cohort2003             -1.823e-01  1.713e+00  -0.106
gender1:cohort2004              2.231e-01  1.329e+00   0.168
gender1:cohort2006             -5.878e-01  1.586e+00  -0.371
gender1:cohort2007             -6.454e-02  1.784e+00  -0.036
gender1:cohort2008              8.881e-01  1.156e+00   0.768
gender1:cohort2009             -2.817e-02  1.199e+00  -0.023
gender1:cohort2010                     NA         NA      NA
colonyother:cohort2002                 NA         NA      NA
colonyother:cohort2003                 NA         NA      NA
colonyother:cohort2004          1.497e+01  1.697e+03   0.009
colonyother:cohort2006         -1.707e+01  2.400e+03  -0.007
colonyother:cohort2007         -7.885e-01  1.772e+00  -0.445
colonyother:cohort2008                 NA         NA      NA
colonyother:cohort2009                 NA         NA      NA
colonyother:cohort2010                 NA         NA      NA
gender1:colonyother:cohort2002         NA         NA      NA
gender1:colonyother:cohort2003         NA         NA      NA
gender1:colonyother:cohort2004 -9.163e-01  2.400e+03   0.000
gender1:colonyother:cohort2006         NA         NA      NA
gender1:colonyother:cohort2007 -2.575e+00  2.379e+00  -1.082
gender1:colonyother:cohort2008         NA         NA      NA
gender1:colonyother:cohort2009         NA         NA      NA
gender1:colonyother:cohort2010         NA         NA      NA
                               Pr(>|z|)
(Intercept)                       0.994
gender1                           0.809
colonyother                       0.905
cohort2002                        1.000
cohort2003                        0.994
cohort2004                        0.994
cohort2006                        0.994
cohort2007                        0.994
cohort2008                        0.995
cohort2009                        0.994
cohort2010                        0.995
gender1:colonyother               0.399
gender1:cohort2002                0.994
gender1:cohort2003                0.915
gender1:cohort2004                0.867
gender1:cohort2006                0.711
gender1:cohort2007                0.971
gender1:cohort2008                0.442
gender1:cohort2009                0.981
gender1:cohort2010                   NA
colonyother:cohort2002               NA
colonyother:cohort2003               NA
colonyother:cohort2004            0.993
colonyother:cohort2006            0.994
colonyother:cohort2007            0.656
colonyother:cohort2008               NA
colonyother:cohort2009               NA
colonyother:cohort2010               NA
gender1:colonyother:cohort2002       NA
gender1:colonyother:cohort2003       NA
gender1:colonyother:cohort2004    1.000
gender1:colonyother:cohort2006       NA
gender1:colonyother:cohort2007    0.279
gender1:colonyother:cohort2008       NA
gender1:colonyother:cohort2009       NA
gender1:colonyother:cohort2010       NA

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 311.91  on 224  degrees of freedom
Residual deviance: 271.61  on 201  degrees of freedom
  (774 observations deleted due to missingness)
AIC: 319.61

Number of Fisher Scoring iterations: 15
/

I thought that perhaps keeping the gender as explanatory variable was
reducing a lot the sample size and it was the matter (I removed it):



/> model1<-glm(site~colony*cohort,binomial)
> summary(model1)

Call:
glm(formula = site ~ colony * cohort, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8683  -0.9712  -0.8757   1.3468   1.7470  

Coefficients: (6 not defined because of singularities)
                        Estimate Std. Error z value Pr(>|z|)
(Intercept)             -15.5661  1455.3976  -0.011    0.991
colonyother               0.2544     0.2167   1.174    0.240
cohort2000               14.4675  1455.3981   0.010    0.992
cohort2001               16.8188  1455.3978   0.012    0.991
cohort2002               15.9715  1455.3977   0.011    0.991
cohort2003               16.6647  1455.3977   0.011    0.991
cohort2004               17.1194  1455.3976   0.012    0.991
cohort2006               15.2607  1455.3976   0.010    0.992
cohort2007               14.9470  1455.3976   0.010    0.992
cohort2008               15.3837  1455.3976   0.011    0.992
cohort2009               15.1762  1455.3976   0.010    0.992
cohort2010               14.8053  1455.3976   0.010    0.992
colonyother:cohort2000        NA         NA      NA       NA
colonyother:cohort2001        NA         NA      NA       NA
colonyother:cohort2002        NA         NA      NA       NA
colonyother:cohort2003        NA         NA      NA       NA
colonyother:cohort2004   13.7583   550.0887   0.025    0.980
colonyother:cohort2006  -15.5151  1455.3976  -0.011    0.991
colonyother:cohort2007   -0.9163     0.5979  -1.533    0.125
colonyother:cohort2008        NA         NA      NA       NA
colonyother:cohort2009   -0.6912     0.5214  -1.326    0.185
colonyother:cohort2010        NA         NA      NA       NA

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1361.4  on 998  degrees of freedom
Residual deviance: 1267.9  on 983  degrees of freedom
AIC: 1299.9

Number of Fisher Scoring iterations: 14
  /

Any comment/suggestion on this?
Thanks for any help

--
View this message in context: http://r.789695.n4.nabble.com/Binomial-GLM-chisq-test-or-tp4608941.html
Sent from the R help mailing list archive at Nabble.com.



More information about the R-help mailing list