[R-sig-ME] Question concerning the reduction of random effects

Sat Feb 4 16:38:07 CET 2017

Hello everybody,

I have a question that might sound odd, but I haven't found anything on it yet, so maybe someone can enlighten me here.

I am working with a BGLMM (random intercept) that contains the random effect "noun" (basically, a noun occurring in a natural-language sentence from a sample) with 1,302 levels. The number of levels is this high because the sample contains that many different nouns in the relevant position of the clause.

After determining the variances of the individual 1,302 nouns, only 54 nouns remain whose 95 % confidence interval does not contain 0. These are the nouns that are actually interesting. I have a hunch that this reduced set of nouns also influences some of the slopes, but I cannot test this. The dataset contains only 2,284 observations. Thus, when the nouns are tested against a binary feature, the total number of random effects is larger than the number of observations.
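
To make the counting concrete, here is a minimal sketch of the arithmetic (written in Python purely for illustration, it is not part of my model code). The counts are the ones given above; the assumption that the slope model would have exactly one random slope per noun is only for illustration and not something I have fixed yet.

```python
# Counts taken from the description above.
n_obs = 2284       # observations in the dataset
n_nouns = 1302     # levels of the random effect "noun"
n_selected = 54    # nouns whose 95 % interval does not contain 0

# Assumption (hypothetical): one random slope per noun, for a binary feature.
n_slope_terms = 1


def n_random_effects(n_levels, n_slopes):
    """Noun-level effects to estimate: one intercept plus the slopes, per level."""
    return n_levels * (1 + n_slopes)


full = n_random_effects(n_nouns, n_slope_terms)
reduced = n_random_effects(n_selected, n_slope_terms)

# Full set: more noun-level effects than observations.
print("full set:", full, "effects vs.", n_obs, "observations")

# Reduced set: far fewer effects. The number of observations that belong
# to these 54 nouns is not stated above, so no comparison is made here.
print("reduced set:", reduced, "effects")
```

Under that assumption the full set of nouns would need more noun-level effects than there are observations, which is why I am asking about the reduced set.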

I would like to find a way to reduce the random effects to those that have proven relevant in the random intercept model, and to use this reduced set for a random slope model. It strikes me that this amounts to tampering with the data unless there is a principled way of selecting a subset of the random effects. So, if there is such a principled way, I would appreciate learning about it.

With kind regards 

Tibor