[R-sig-ME] Rasch with lme4

Tue Jun 9 11:28:10 CEST 2009

On Tue, Jun 9, 2009 at 6:38 AM, David Duffy<David.Duffy at qimr.edu.au> wrote:
> On Tue, 9 Jun 2009, Reinhold Kliegl wrote:
>
>>
>> On 09.06.2009, at 00:09, David Duffy wrote:
>>
>>> On Mon, 8 Jun 2009, Reinhold Kliegl wrote:
>>>
>>>> Conditional modes (generated from the model parameters and the data) are
>>>> not independent observations. Therefore, only the second method is valid.
>>>>
>>>>> myModel <- lmer(y~1+(1|item)+(1|subject),data=mydata, family=binomial);
>>>>> intelligence <- ranef(myModel)$subject[[1]];
>>>>> lm(intelligence~extraversion);
>
> versus
>
>>>>> myModel2 <- lmer(y~1+(1|item)+(1|subject)+extraversion,data=mydata,
>>>>> family=binomial);
>
>>> Most people would prefer something like the first model, and in fact would
>>> estimate the correlation between IQ and E estimated (as if without error)
>>> from two measurement models given by the scoring rules for the instruments
>>> (these are essentially BLUPs).  Incorporating measurement error for both
>>> measures is the truest way to do it.
>>>
>> Always interested to be corrected...
>>
>> My comment was referring to the use of conditional modes (formerly known
>> as BLUPs)--extracted with ranef()--not bootstrap or SEM. Not sure to what
>> degree these alternatives correspond to the use of conditional modes.  As
>> far as conditional modes are concerned, we (Kliegl, Masson, & Richter)  have
>> a paper in press (available at my publication page), in which we write the
>> following:
>>
>> To this end, we generated 100,000 sets of data for a simple LMM model
>> including 30 "subjects" and a predictor with 10 levels, conforming to a
>> known variance for intercept and slope across subjects and varying the true
>> correlation between these parameters from -0.9 to +0.9 in 2,000 steps (i.e.,
>> each simulation used a different correlation). ...
>> ... [C]onditional means underestimate variances and exaggerate covariances
>> and correlations. The shrinkage of variance reflects the contribution of the
>> likelihood in the computation of conditional means. Shrinkage correction for
>> predictions leads to dampening of the variance components, but, as we have
>> shown in this section, not of the associated covariance component. The
>> shrinkage of variance prevents overfitting of unreliable data but, as a
>> curious side effect, the "correlations" based on conditional means for
>> individual subjects are larger in absolute value than the corresponding LMM
>> estimates of the correlation.
>>
>
> Well I'm not sure how much that reflects the specific model you have
> simulated, and I don't have the time right now to do simulations based on
> the original poster's setup.  (I have experienced your problems with
> shrinkage etc in a simple minded attempt to carry out genetic linkage
> analysis of breeding values (BLUPs) estimated from the same pedigree). And I
> would concur with your postscript.  And I will study your paper with great
> interest.
>
> However, the impression I have is that usually the effects of just plugging
> in the factor scores when they are based on, say, 20 or 30 individual items
> with a straightforward structure are not too misleading, and are just what
> people have been doing for the last 50 years. I am currently comparing
> results from a two-stage mixed model analysis of BLUPs from an IRT (carried
> out in BUGS) analysis of multiple ordinal measures adjusting for multiple
> fixed covariates to results of analyses I am performing on the original
> variables.  I have not seen any major inconsistencies, but I will look for
> effects of the type you have described.
>
> Colleagues have examined the multitrait mixed model analysis of pedigree
> data using the full analysis and compared it to using BLUPs:
>
> Dorret I. Boomsma and Conor V. Dolan (1998).  A Comparison of Power to
> Detect a QTL in Sib-Pair Data Using Multivariate Phenotypes, Mean
> Phenotypes, and Factor Scores. Behavior Genetics 28: 329-340
>
> They found in their simulations that there was "negligible overestimation"
> of the genetic covariance in models where they split the sample in two,
> using one-half to generate the prediction model, which was applied to the
> other half to generate the BLUPs, and then vice-versa.
>
> My specific comment was based on an impression that the second model
> with Extraversion as a fixed effect doesn't give the original poster what he
> is interested in, viz an assessment of the relationship between two
> (imperfectly measured) psychological traits: IQ and E.
>
> I find it a bit confusing, but the test of the single regression coefficient
> for Extraversion in the second model seems to me to be different from
> that in the first.  Specifically, the E->IQ->item(1..N) model constrains the
> pattern of expected covariation between E and any one IQ item differently
> from the fixed effects model.
>
> Finally, multiple imputation type methods are one way people get around
> these types of problems, as a full maximum likelihood analysis is often
> expensive computationally (with a multimodal likelihood surface). I don't
> think a simple bootstrap resampling repeatedly calling lmer and ranef would
> get around the biases you have noted.
>
> Cheers, David Duffy
>
> PS The OP may be aware of work of my colleagues on this particular topic:
> http://genepi.qimr.edu.au/contents/p/staff/CV516.pdf
>
> --
> | David Duffy (MBBS PhD)                                         ,-_|\
> | email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
> | Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
> | 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v
>

You are correct that the models are not identical. I assume that the
problem of variance dampening of conditional modes will generalize to
the model under consideration, but one should simulate the specific
model--always a good idea until an analytic answer is available.
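
A minimal sketch of what such a simulation could look like for the
crossed subject-by-item (Rasch-type) model. It assumes a current lme4,
in which binomial models are fit with glmer() rather than lmer(); the
design sizes, true standard deviations, and object names are
hypothetical and not taken from the original poster's data. It only
compares the variance of the subject conditional modes with the model's
own estimate of the subject variance; it does not address the
second-stage regression on extraversion.

```r
library(lme4)  # assumption: a current lme4, where glmer() fits binomial GLMMs

set.seed(2009)
n_subj  <- 100   # hypothetical number of subjects
n_item  <- 20    # hypothetical number of items
sd_subj <- 1     # hypothetical true SD of subject abilities
sd_item <- 1     # hypothetical true SD of item easiness

# True subject and item effects on the logit scale
ability  <- rnorm(n_subj, mean = 0, sd = sd_subj)
easiness <- rnorm(n_item, mean = 0, sd = sd_item)

# Fully crossed design: every subject answers every item
sim <- expand.grid(subject = factor(seq_len(n_subj)),
                   item    = factor(seq_len(n_item)))
eta <- ability[as.integer(sim$subject)] + easiness[as.integer(sim$item)]
sim$y <- rbinom(nrow(sim), size = 1, prob = plogis(eta))

# Same random-effects structure as myModel in the quoted code
fit <- glmer(y ~ 1 + (1 | item) + (1 | subject),
             data = sim, family = binomial)

est_var  <- as.numeric(VarCorr(fit)$subject)  # model estimate of subject variance
mode_var <- var(ranef(fit)$subject[[1]])      # variance of the conditional modes

c(true = sd_subj^2, estimated = est_var, cond_modes = mode_var)
```

Repeating this over many seeds, and over the design sizes of interest,
would show how far the variance of the conditional modes falls below the
estimated variance component in this particular model.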

I also assumed that Jeroen Ooms's interest was in demonstrating a
relation between IQ and extroversion, irrespective of the specific
model. If the interest is in the specific correlation, there may
actually be an alternative. The sabreR package allows one to estimate the
correlation between up to three dependent variables at the "subject"
level while at the same time allowing for the specification of a
"causal" path between them at the "observation" level (i.e., a
multivariate generalized linear mixed model). I think the
"correlation" model (i.e., m3 below) would be specified as follows:

attach(mydata)
m1 <- sabre(y ~ 1, case=subject, first.link="probit")
m1

m2 <- sabre(extroversion ~ 1, case=subject, first.family="gaussian")
m2

m3 <- sabre(y ~ 1 + extroversion,
            extroversion ~ 1,
            case=subject,
            first.family="binomial", second.family="gaussian")
m3
detach()

m3 estimates (among other variance components)  a correlation between
the intercepts for y and extroversion across subjects.

The limitation is that sabreR does not allow the specification of
crossed random factors (i.e., of subjects and items). In this respect,
I agree with Harold Dolan on a different branch of this thread. Also,
with sabreR you can only estimate the variance of the random
intercepts, not the variances of random slopes for the fixed effects
(or their associated covariances).  I should also say that I have only checked
out some of the examples in this package. So my experience is very
limited.

Reinhold Kliegl