[R-sig-ME] mixed effects model glmer

Ben Bolker bbolker at gmail.com
Thu Sep 24 05:07:26 CEST 2015


[I'm taking the liberty of cc'ing r-sig-mixed-models on this; please keep the
mailing list in the cc: list when responding]

 the main problem, which should be very easy to fix, is simply that
your predictor variable accidentally got turned into a factor!

  Try

data1 <- transform(data1,
   disease.prevalence=as.numeric(as.character(disease.prevalence)))

  Additionally:

  there's nothing necessarily wrong with means/year.  Take a second look
at things after you have un-mangled your data ...

On Wed, Sep 23, 2015 at 6:26 PM, M West <westm490 at gmail.com> wrote:
> Hi Dr. Bolker,
>
> Thanks for getting back to me.
>
>  Just to be clear, disease prevalence is a number in [0,1]?
>
> Not currently, it's just percentage infected out of total (e.g., 20/400).
> I can change it if I need to....
>
>
> Four years is not very many, so you might need to treat Year as a
> fixed effect (e.g. I would consider that option if the random effects
> variance
> is estimated as zero).
>
>
>
>  How many sites?  How many total observations?
> 15 sites total each year.
>
> Can do. We sample every two weeks for ~3-4 months per year. So, I collapsed
> the data down to
> means/year....maybe that's no good either.
>
>
> I have to admit that I'm stumped by your apparent model output (i.e.
> that there are multiple parameters for disease prevalence when there
> should only be one)
>
>   Perhaps you could send the results of summary(data1) and/or
> str(data1) and summary() of your whole model?
>
> Sure, thy are all down below...I can also send you the data and code if
> that's easier.
>
> summary(data1)
>
>     Site        Site.1            Year           Max_P         variable_X
> disease prevalence   Average.of...male Average.of.Total.Female
> Average.of.total.all
>            :57   Min.   : 1              :56              :57
> :57              :57                  :74      Min.   : 2.00         Min.
> :106.0
>  Asite   : 6   1st Qu.: 4   2009       :15   0          : 2   1.260095428: 2
> 0          : 3       0.633898824: 1      1st Qu.: 8.75         1st Qu.:295.8
>  Bsite: 6   Median : 8   2010       :15   0.912379678: 2   10.90217665: 2
> 14.50737238: 2       0.770941362: 1      Median :21.50         Median :414.5
>  BCsite  : 6   Mean   : 8   2011       :15   1.841225901: 2   14.44718559: 2
> 2.089865262: 2       0.853637644: 1      Mean   :24.10         Mean   :400.8
>  Csite: 6   3rd Qu.:12   2014       :15   1.956818222: 2   18.02232207: 2
> 2.208160622: 2       1.008902322: 1      3rd Qu.:30.25         3rd Qu.:513.8
>  Dsite  : 6   Max.   :15   11.20859993: 2   10.69047265: 2   18.5213911 : 2
> 2.643993383: 2       1.00895193 : 1      Max.   :89.00         Max.   :648.0
>  (Other)   :62   NA's   :89   (Other)    :31   (Other)    :82
>
>
>
> summary(moda)
>
>
>
> Generalized linear mixed model fit by maximum likelihood (Laplace
> Approximation) ['glmerMod']
>  Family: binomial  ( logit )
> Formula: y ~ disease.prevalence + (1 | Lake) + (1 | Year)
>    Data: males1
>
>      AIC      BIC   logLik deviance df.resid
>    419.3    545.0   -149.7    299.3        0
>
> Scaled residuals:
>    Min     1Q Median     3Q    Max
> -3.470  0.000  0.000  0.000  1.989
>
> Random effects:
>  Groups Name        Variance Std.Dev.
>  Site   (Intercept) 0        0
>  Year   (Intercept) 0        0
> Number of obs: 60, groups:  Lake, 15; Year, 4
>
> Fixed effects:
>                               Estimate Std. Error z value Pr(>|z|)
> (Intercept)                   -1.60267    0.11618 -13.794  < 2e-16 ***
> disease.prevalence0           -0.40212    0.15557  -2.585 0.009745 **
> disease.prevalence0.035088231 -1.46452    0.22860  -6.406 1.49e-10 ***
> disease.prevalence0.064935065 -0.36344    0.30810  -1.180 0.238157
> disease.prevalence0.078507945 -2.57479    0.46537  -5.533 3.15e-08 ***
> disease.prevalence0.120039255 -3.30998    0.71915  -4.603 4.17e-06 ***
>
> this goes on forever and then there's this:
>
> Correlation matrix not shown by default, as p = 58 > 20.
> Use print(...., correlation=TRUE)  or
> vcov(....) if you need it
>
>
>
>
> str(data1)
> 'data.frame': 149 obs. of  12 variables:
>  $ Site                     : Factor w/ 18 levels "","Alake","BDlake",..: 2
> 3 4 5 6 7 8 9 10 12 ...
>  $ Site1                    : int  1 2 3 4 5 6 7 8 9 10 ...
>  $ Year                      : Factor w/ 22 levels "","11.20859993",..: 7 7
> 7 7 7 7 7 7 7 7 ...
>  $ disease.prevalence        : Factor w/ 75 levels "","0","0.035088231",..:
> 8 35 5 69 6 56 11 51 50 26 ...
>  $ Average.of...male         : Factor w/ 76 levels "","0.633898824",..: 14
> 32 31 61 5 45 11 62 68 60 ...
>  $ Average.of.Total.Female     : int  7 14 5 31 2 22 6 21 14 24 ...
>  $ Average.of.total.all     : int  405 553 331 512 274 648 338 388 230 454
> ....
>  $ Change.in.percent.females   : num  1.581 2.466 2.535 5.183 0.843 ...
>
>
>
>
> On Wed, Sep 23, 2015 at 4:41 PM, Ben Bolker <bbolker at gmail.com> wrote:
>>
>> On Wed, Sep 23, 2015 at 2:36 PM, M West <westm490 at gmail.com> wrote:
>> > I am trying to fit a mixed effects model with repeated measures data.
>> >
>> > Data are:
>> >
>> > y variable = percentage (# females/total)
>> > x variable = percentage
>> >
>> > measured across multiple sites for 4 years.
>> >
>> > here's the model:
>> >
>> > y <- cbind(total females, (Total - total females)))
>> >
>> > mod1 <- with(data, glmer(y ~ disease prevalence +  (1|Site) + (1|Year),
>> > family = binomial,  data = data1))
>>
>>   Just to be clear, disease prevalence is a number in [0,1]?
>> >
>> > 1) This model runs, but the summary(mod1) just generates a series of the
>> > following....which doesn't make any sense so something must be wrong
>> > with
>> > the model specification...I'm just not sure what.
>> >
>> > 2) Also, what is the default AR correlation on these models (i.e., do I
>> > need to specify it or is the temporal psuedoreplication taken care of)?
>>
>>   AR models are not currently easy in lme4.  My suggestion (=hack) would
>> be to get the residuals and use nlme::gls(resid~1,correlation=corAR1())
>> (or
>> something like that) to see if you should worry about autoregressive
>> structure.
>>
>>   Four years is not very many, so you might need to treat Year as a
>> fixed effect (e.g. I would consider that option if the random effects
>> variance
>> is estimated as zero)
>>
>>   How many sites?  How many total observations?
>>
>>   I have to admit that I'm stumped by your apparent model output (i.e.
>> that there are multiple parameters for disease prevalence when there
>> should only be one)
>>
>>   Perhaps you could send the results of summary(data1) and/or
>> str(data1) and summary() of your whole model?
>>
>> >
>> > 3) Finally, do you suggest another form of the model that's better etc.?
>> >
>> > Fixed effects:
>> >                                 Estimate             Std. Error    z
>> > value
>> >    Pr(>|z|)
>> > (Intercept)                  -1.60267            0.11618    -13.794    <
>> > 2e-16 ***
>> > disease prevalence    -0.40212           0.15557     -2.585     0.009745
>> > **
>> > disease prevalence    0.035088231    -1.46452    0.22860  -6.406
>> > 1.49e-10
>> > ***
>> > disease prevalence    0.064935065     -0.36344   0.30810  -1.180
>> > 0.238157
>> >
>> > disease prevalence    0.078507945    -2.57479    0.46537  -5.533
>> > 3.15e-08
>> > ***
>> > disease prevalence    0.120039255    -3.30998    0.71915  -4.603
>> > 4.17e-06
>> > ***
>> > disease prevalence    0.182623706     -0.14362   0.19899  -0.722
>> > 0.470438
>> >
>> >
>
>



More information about the R-sig-mixed-models mailing list