[R-sig-ME] Duplicating SAS Proc HPMixed results in lme 4 and lmerTest
Douglas Bates
bates at stat.wisc.edu
Wed May 29 17:08:29 CEST 2013
I enclose an R source file and the output from running this code in batch
mode using
R CMD BATCH --vanilla cutsim.R
As Luca mentioned, it is important to convert the categorical variables
Animal, Farm and Species to factors in R before fitting models. In general
it is advisable to check the structure of a data frame, using the str()
function, and a summary of the columns, using summary(), before fitting
models. This is especially important when dealing with large data sets.
One additional check that I did was to check whether Animal is nested
within Species, which one would expect to be true, and also nested within
Farm. By extracting the unique rows of the subset of the data frame formed
by taking only the Animal, Species and Farm columns and looking at the row
count we see that there are 10000 rows which is the number of distinct
Animals. Thus Species and Farm are nested within Animal. (If they weren't
there would be more than 10000 rows.)
The first model that I fit has a fixed-effect for Species and random
effects for Animal and Farm:Species. I know you wanted to model
Farm:Species as a fixed effect but with 500 levels I would generally use a
random effect for that factor. The next model is the one you wanted to
fit. As suggested by Luca, it will take a long time and use a lot of
memory. This is because there are fixed-effects for over 500 coefficients.
The model matrix for the fixed-effects is very sparse (over 99% of the
elements are zeros) but it is being treated as a dense matrix. This is why
fitting the model takes over 10 times as long as the previous model fit.
To test the significance of the Farm:Species interaction I fit the model
without the term and compare it to the model with the term. In this case
the evidence of the significance of the term is overwhelming, although that
could be seen from the first model fit too. The reason I use maximum
likelihood fits for the models is to allow the likelihood ratio test.
More information about the R-sig-mixed-models
mailing list