Jarrod Hadfield j.hadfield at ed.ac.uk
Fri Jan 29 03:11:24 CET 2010

Dear Doug,

Perhaps I misunderstand Rubin's missing data theory, and/or perhaps
it's not relevant to Thierry's problem.

I was under the impression that if the probability of missingness
depends on the values observed for some other data (MAR), then by
including these data and structuring the likelihood correctly,
correct inferences (i.e. those that would be made in the absence of
missingness) could be made.
Given that the default na.action of lmer seems to delete other data
(complete-case analysis), it is hard to see how the other data can be
used to 'correct' for missingness. MCMCglmm uses augmentation for
missing data. Internally, this is often used just to simplify/speed up  
the matrix operations using dummy data.  However, I had presumed that  
if users really did have MAR data then the augmentation would take  
care of this. I know ASReml has an na.includeY argument so presumably  
there is something to be gained by not reducing the problem to a  
complete-case analysis, but perhaps this argument is there just to
allow users to make predictions for missing data points. I know the
asreml team read this list, so perhaps they could comment?
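
To make the MAR idea concrete, here is a toy two-variable simulation
in base R. It is not the bat model, and every name and number in it
is made up for illustration. The response y2 goes missing more often
when the fully observed y1 is large, so the complete-case mean of y2
is biased, whereas a model that conditions on y1 should roughly
recover the true mean (zero here).

```r
set.seed(1)
n  <- 10000
y1 <- rnorm(n)                          # always observed
y2 <- 0.8 * y1 + rnorm(n, sd = 0.6)     # true mean of y2 is 0

# MAR: the chance that y2 is missing depends only on the observed y1
miss   <- runif(n) < plogis(2 * y1)
y2_obs <- ifelse(miss, NA, y2)

# Complete-case estimate: ignores y1 entirely
cc_mean <- mean(y2_obs, na.rm = TRUE)

# Estimate that uses y1: fit y2 | y1 on the observed pairs
# (lm drops the NA rows by default), then average the predictions
# over every y1, including those whose y2 is missing
fit      <- lm(y2_obs ~ y1)
mar_mean <- mean(predict(fit, newdata = data.frame(y1 = y1)))

c(complete_case = cc_mean, using_y1 = mar_mean)
```

The regression itself is fitted to complete pairs only; the correction
comes from averaging over the y1 values of the incomplete cases too.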



Quoting Douglas Bates <bates at stat.wisc.edu>:

> On Thu, Jan 28, 2010 at 6:57 AM, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>> Dear Thierry,
>> I THINK the fixed effect slope should be what you're after if you want to
>> predict the change in log numbers, but simply exponentiating the prediction
>> will not give you a true measure of the arithmetic increase.
> I too think that the fixed-effect slope should be an estimate of the
> population slope on the log(count) scale, except for the usual
> problems with counts of zero and, in this case, the (1|Year) random
> effects term.  I can appreciate that you may want to incorporate year
> to year variability due to weather conditions in the model but I'm not
> sure what the effect of that on the fixed effect for Year would be.  I
> could imagine an argument for them not interfering with each other
> (the fixed effect is measuring the trend and the random effect
> measures year-to-year variability around the trend line) but I am not
> confident of that argument.
>> The arithmetic prediction for years 1:10 (for example) when the slope
>> variance for the year|room term is zero would be:
>> exp(b_year*1:10+0.5*(v1+v2))
>> where b_year is your slope estimate, and v1 is the year intercept variance
>> and v2 is the room intercept variance.
>> When slope variance exists this becomes more difficult, because it implies
>> the variance v2 changes as a function of year. In this case:
>> v2=diag(Z%*%V2%*%t(Z))
>> where
>> Z<-cbind(rep(1,10), 1:10)
>> and V2 is the covariance matrix of the room intercept-slopes.
>> Or if you like
>> v2 = V2[1,1]+(1:10)*V2[1,2]*2+((1:10)^2)*V2[2,2]
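
A minimal base-R sketch of the two expressions for v2, checking that
the matrix form and the expanded form agree. The values of b_year, v1
and V2 are hypothetical, not estimates from the bat data. Note that
in R the square has to be applied to the whole sequence, (1:10)^2,
because 1:10^2 parses as 1:100.

```r
# Hypothetical values, for illustration only
b_year <- 0.05                        # fixed-effect slope on the log scale
v1     <- 0.10                        # year intercept variance
V2     <- matrix(c(0.50, 0.02,
                   0.02, 0.01), 2, 2) # room intercept-slope covariance matrix
years  <- 1:10

Z <- cbind(rep(1, 10), years)

# Room variance as a function of year, in matrix form ...
v2_matrix <- diag(Z %*% V2 %*% t(Z))
# ... and written out term by term
v2_expanded <- V2[1, 1] + 2 * years * V2[1, 2] + years^2 * V2[2, 2]

# The two forms should be numerically identical
stopifnot(isTRUE(all.equal(unname(v2_matrix), v2_expanded)))

# Arithmetic-scale prediction, following the formula as written in the
# message (which has no intercept term)
pred <- exp(b_year * years + 0.5 * (v1 + v2_matrix))
```
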
>> Another difficulty is the possibility that your missing data are not
>> "missing completely at random". By default lmer just seems to omit missing
>> data rather than dealing with it properly, but perhaps there is an argument
>> that can be passed to na.omit which suppresses this?
> I'm not sure what you mean by "dealing with it properly".  Are you
> considering some form of imputation?
> My general approach is that, because the methods in lmer allow for
> unbalanced data, there would not be a purpose in imputing counts that
> were not observed.  I presume that when Number is observed the Year
> and Room are also recorded (otherwise you should get rid of some of
> the members of your field crew).  The only benefit that I could
> imagine for imputing cases that were not observed would be if the
> computational methods required balanced data.
> Perhaps I am misunderstanding what you are getting at here, Jarrod.
>>  If so, then the less
>> strict assumption of "missing at random" can be made. In this latter case
>> the missing data only have to be random conditional on the observed data -
>> for example, if there were no bats in room A in year 1 which made the field
>> workers less inclined to visit room A in year 2 based on their knowledge of
>> the first year's count.
>> Cheers,
>> Jarrod
>> Quoting "ONKELINX, Thierry" <Thierry.ONKELINX at inbo.be>:
>>> Dear all,
>>> We are modelling the total numbers of hibernating bats in a fortress. We
>>> have data of the number of bats per room spanning ten years. The main
>>> problem is that not all rooms were visited each year. The fieldworkers
>>> did not know or find all rooms and some rooms were not always
>>> accessible.
>>> Some of the rooms were not counted in the early years and they contain a
>>> rather high number of bats in the more recent years. So a glm on the
>>> total observed number would be very biased. Therefore we would use a
>>> mixed model on the numbers of bats per room. The model looks like:
>>> glmer(Number ~ Year + (1|Year) + (Year|Room), family = poisson). Year is
>>> the long-term trend. (1|Year) allows for year-to-year variability (due
>>> to weather conditions) and (Year|Room) allows for a random intercept and
>>> slope per room.
>>> Our main question about this model is the interpretation of the
>>> long-term trend (fixed effect of Year). Given the model specification it
>>> is the trend in an 'average' room from the population of rooms. Can we
>>> assume that this trend equals the trend in the total number of bats in
>>> the fortress? That would be the trend in the total observed numbers if we
>>> could have investigated every room in every year.
>>> Or is it better to use the model to simulate the total number of bats
>>> and then model these simulated totals using a simple glm? Repeating the
>>> simulations a large number of times would yield an average and
>>> confidence intervals for the trend.
>>> Best regards,
>>> Thierry
>>> ir. Thierry Onkelinx
>>> Research Institute for Nature and Forest
>>> team Biometrics & Quality Assurance
>>> Gaverstraat 4
>>> 9500 Geraardsbergen
>>> Belgium
>>> tel. + 32 54/436 185
>>> Thierry.Onkelinx at inbo.be
>>> www.inbo.be
