[R-sig-ME] Population weights in a Poisson model with overdispersion
Thierry Onkelinx
thierry.onkelinx at inbo.be
Wed May 25 09:54:54 CEST 2016
Dear Theodore,
Given that the number of patients with a disease is bounded between 0 and
the total number of patients, a binomial distribution seems to be more
reasonable. That would probably handle the overdispersion as well.
Design based weights are not straightforward in generalised linear models.
You could consider calculating an weighted average based on the model
parameters.
Best regards,
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
2016-05-24 21:56 GMT+02:00 Theodore Lytras <thlytras op gmail.com>:
> Dear all,
>
> I am having some trouble with the correct specification of a Poisson model
> (with lme4), particularly with respect to weights.
>
> Each observation is a rate from a single physician: number of patients with
> ILI (influenza-like illness) / total patients. There are no fixed
> covariates;
> I am only interested in estimating an average rate for all physicians,
> with an
> intercept-only Poisson model. Observations are clustered within geographic
> areas (8 in total), and have a fair amount of overdispersion, which I am
> handling with an observation-level random effect.
>
> Thus for one geographic area only (say "U1"), the appropriate model would
> be:
>
> m.U1 <- glmer( gritot ~ 1 + (1|codeiat), offset=log(totvis),
> data=datU1, family="poisson")
>
> Where: gritot = number of patients with ILI, per physician
> totvis = total patients per physician
> codeiat = physician ID (observation ID), as an R factor
>
> Now, for all geographic areas, I would guess the model should be:
>
> m.all <- glmer( gritot ~ 1 + (1|codeiat) + (1|area), offset=log(totvis),
> data=dat, family="poisson")
>
> Where: area = area code, also as an R factor
>
> Each codeiat only occurs within a given area, so if I understand correctly
> my
> grouping factors are nested (codeiat nested within area). Note that I am
> also
> interested in estimating the average rate for each area.
>
>
> Question 1: Unfortunately in the above model, the variance for area is very
> very small compared to the variance for codeiat. From the output of
> summary(m.all):
>
> Random effects:
> Groups Name Variance Std.Dev.
> codeiat (Intercept) 7.690e-01 8.769e-01
> area (Intercept) 1.557e-10 1.248e-05
> Number of obs: 108, groups: codeiat, 108; area, 8
>
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -2.16856 0.09294 -23.33 <2e-16 ***
>
> Is there some way (i.e. some different random-effects specification) to
> prevent the observation-level random effect from "eating away" all the
> variance?
> Background theory suggests there is a single average rate (the fixed
> intercept), around which the rate per area varies, and there is a further
> variance per physician (codeiat) around the area-specific rate. Assigning
> all
> the variance to the physician and none to the geographic area makes no
> theoretical or practical sense.
>
> For example, if I drop the random effect per physician (i.e. the
> observation
> level random effect):
>
> m2 <- glmer( gritot ~ 1 + (1|area), offset=log(totvis),
> data=dat, family="poisson")
>
> the output becomes:
>
> Random effects:
> Groups Name Variance Std.Dev.
> area (Intercept) 0.05958 0.2441
> Number of obs: 108, groups: area, 8
>
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -1.95867 0.09237 -21.2 <2e-16 ***
>
> This makes sense, but then how would I take account of the overdispersion
> **within** each geographic area?? Shouldn't I still need to specify an
> observation-level random effect somehow?? (One that does not "eat away" all
> the variance.)
>
>
> Question 2: The 8 geographic areas have different population sizes, and I
> would like my fixed intercept (the average rate across the areas) to
> reflect
> this as well, i.e. observations from areas with big populations should
> weigh
> more than observations from areas with smaller populations. How would I
> assign
> weights to each observation, in order to get a population-weighted average
> rate as the fixed intercept of the model (also taking account of the
> overdispersion problem above)?? I am really really stuck on this one.
>
> My initial thought would be to fit separate models for each area (with
> observation-level random-effect) and then combine the log rates (i.e. the
> fixed intercepts) in some second-level model (maybe Poisson with identity
> link??). But I have no idea how to combine the standard error of each rate
> and
> its population-specific weight (as offset? as weight?). And intuition
> tells me
> this should be possible to do within a single model.
>
>
> I really appreciate any help!
>
> Thank you so much,
>
> Theodore Lytras
>
> Epidemiologist, Department of Epidemiologic Surveillance,
> Hellenic Centre of Disease Control and Prevention, Athens, Greece
>
> _______________________________________________
> R-sig-mixed-models op r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list