[R-sig-ME] In II[, ii] + REmat$codes[[i]] :

Ben Bolker bbolker at gmail.com
Fri Jan 13 23:37:52 CET 2012


  [cc'd back to r-sig-mixed-models again: I strongly prefer to have
these conversations on the record so they can be archived and so that
others can benefit from them]

On 12-01-13 05:13 PM, Giulia Dotti Sani wrote:
> Thank you for the suggestions,
> 
> *Indeed getting rid of NA's solved that problem.
> 
> * I actually have 37 groups, but it didn't seem useful to send you a
> gigantic dataset for this purpose. I apologize if this was misleading.
> 
> *I'm trying out glmmPQL for overdispersion at the moment to see if
> works faster (and it seems to)

  I'm sure glmmPQL is faster than glmmADMB for these problems, but I
repeat that there are at least some options (the primary one being
adding an observation-level random effect) for handling overdispersion
in glmer.  Furthermore, in general the Laplace approximation (glmer's
default algorithm is more accurate than PQL, and AGQ (an option with
glmer) is more accurate still, although it may not matter much in your
case (I would definitely do at least a brief comparison between the
results of the different methods on a test data set to see how much they
differ).

> *I'm using a Poisson with non integers beacause the dependent var is
> the ratio of time in a / time in b. It necessarily has values between
> 0 and 1, but also greater than 1. time-use data literature suggests to
> use the poisson because it wouldn't make sense to have negative
> estimates.

  I won't say this makes *no* sense, but I claim it's not necessarily
sensible.  If you're really analyzing ratios then beta regression has a
better theoretical foundation (it is supported in glmmADMB, although not
in glmer), although it's not without its practical problems.  Gamma
regression (supported in both glmmADMB and glmer, although somewhat
finicky in glmer) also assumes a non-negative response, although it
assumes a continuous response (which seems more sensible).  Log-normal
(i.e. simply transform the data) is also a possibility.
  Looking at the marginal distribution of your response variable, it
seems to have values up to 57?  Are these on a percentage scale?

  I wonder if the zeros are a different category or simply represent
censoring/lack of resolution ...

ggplot(ddat,aes(x=age,y=0.001+ave_chi_hh,colour=country_y))+stat_sum(alpha=0.3)+geom_smooth()+facet_wrap(~educ)+
  scale_y_log10()+theme_bw()


> 
> 
> On Fri, Jan 13, 2012 at 7:44 PM, Ben Bolker <bbolker at gmail.com> wrote:
>>  [cc'ing back to r-sig-mixed-models]
>>
>>  A few things:
>>
>> * the basic problem is that you have NA values in your data: these get
>> removed automatically in one place in the code and not in the other,
>> hence the length mismatch.  This is not absolutely trivial to fix
>> automatically -- the fixed and random effect predictors are handled
>> separately, and there may be extra variables in the data that should be
>> disregarded completely -- but in the meantime I have at least put in an
>> informative warning message in this case (for the next release).  The
>> simplest solution is to use na.omit() to get rid of these values (which
>> can't be used in the fit anyway).
>>
>>  * it's not advisable to fit a random effect to a factor with only three
>> levels, although in this case it seems not to do anything disastrous
>> (ADMB does issue one warning, although in this case it seems harmless)
>>
>>  * for this problem glmer works *much* faster than glmmADMB (glmmADMB
>> used to work faster, but we made it slower in the process of adapting it
>> to be more general and flexible) -- about 2 seconds vs. 2 minutes on my
>> computer.  quasi-likelihood is unreliable (and no longer possible) in
>> glmer, but you should check http://glmm.wikidot.com/faq for other
>> alternatives for handling overdispersion [and for more on the previous
>> point about numbers of levels of random effects] (although it is still
>> true that glmmADMB allows a wider range of options than glmer)
>>
>>  * the data set you sent didn't have a 'chi_hh' variable in it, only an
>> 'ave_chi_hh' variable -- I used it instead for the fitting. *However*,
>> ave_chi_hh is not integer-valued.  Unless you're absolutely sure you
>> know what you're doing, you shouldn't use a Poisson GLMM to fit
>> non-integer data.  (I'm adding a test and a warning for this too.)
>>
>> library(glmmADMB)
>> ## best not to call data 'data', this masks a built-in R function
>> ddat <- read.csv("dottisani_data.csv")
>> summary(ddat)
>> ddat <- na.omit(ddat)
>>
>> library(lme4)
>> t1 <- system.time(g1 <- glmer(ave_chi_hh ~ age + educ +(1 |country_y),
>> data=ddat,
>>         family="poisson"))
>>
>> t2 <- system.time(g2 <- glmmadmb(ave_chi_hh ~ age + educ +(1
>> |country_y), data=ddat,
>>         family="poisson"))
>>
>>
>> On 12-01-13 01:08 PM, Giulia Dotti Sani wrote:
>>> Hello and thanks for the answer
>>> I'm attaching a subset of the data I'm using.  I was using lmer, but I
>>> moved to glmm to tackle overdispersion, since I've been reading that
>>> the quasipoisson families for lmer are not reliable.
>>>
>>> thank you
>>> Giulia
>>>
>>>
>>>>  sessionInfo()
>>> R version 2.14.1 (2011-12-22)
>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=Italian_Italy.1252  LC_CTYPE=Italian_Italy.1252
>>> [3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
>>> [5] LC_TIME=Italian_Italy.1252
>>>
>>> attached base packages:
>>> [1] splines   stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>>
>>> other attached packages:
>>>  [1] glmmADMB_0.7.2   R2admb_0.7.5     car_2.0-11       survival_2.36-10
>>>  [5] nnet_7.3-1       foreign_0.8-48   arm_1.4-14       abind_1.4-0
>>>  [9] R2WinBUGS_2.1-18 coda_0.14-6      MASS_7.3-16      lmtest_0.9-29
>>> [13] zoo_1.7-6        lme4_0.999375-42 Matrix_1.0-2     lattice_0.20-0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] grid_2.14.1   nlme_3.1-102  stats4_2.14.1 tools_2.14.1
>>>
>>>
>>> On Fri, Jan 13, 2012 at 6:35 PM, Ben Bolker <bbolker at gmail.com> wrote:
>>>> Giulia Dotti Sani <giulia.dottisani at ...> writes:
>>>>
>>>>> I'm running the following poisson and I keep getting the same error.
>>>>> M01 <- glmmadmb(chi_hh ~ age_m + educ +(1 |country_y), data=data,
>>>>> family="poisson")
>>>>>
>>>>> In II[, ii] + REmat$codes[[i]] :
>>>>>   longer object length is not a multiple of shorter object length
>>>>>
>>>>> I think it has to do with the grouping variable but I don't see what's
>>>>> the problem.
>>>>
>>>>  Not reproducible ... post the data somewhere or send them to me?
>>>>  Results of sessionInfo() please (i.e. what version of glmmADMB
>>>> are you using?)
>>>>  For what it's worth, for this problem you could also use glmer,
>>>> which might be faster.  glmmADMB really comes into its own for
>>>> extended models (zero-inflated or truncated, negative binomial, Beta, etc.) that
>>>> glmer can't handle.
>>>>
>>>>  Ben Bolker
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>




More information about the R-sig-mixed-models mailing list