[R-sig-ME] Rasch with lme4

Tue Jun 9 06:38:40 CEST 2009

On Tue, 9 Jun 2009, Reinhold Kliegl wrote:

>
> On 09.06.2009, at 00:09, David Duffy wrote:
>
>> On Mon, 8 Jun 2009, Reinhold Kliegl wrote:
>> 
>>> Conditional modes (generated from the model parameters and the data) are 
>>> not independent observations. Therefore, only the second method is valid.
>>> 
>>>> myModel <- lmer(y~1+(1|item)+(1|subject),data=mydata, family=binomial);
>>>> intelligence <- ranef(myModel)$subject[[1]];
>>>> lm(intelligence~extraversion);

versus

>>>> myModel2 <- lmer(y~1+(1|item)+(1|subject)+extraversion,data=mydata,
>>>> family=binomial);

>> Most people would prefer something like the first model, and in fact would
>> estimate the correlation between IQ and E estimated (as if without error) 
>> from two measurement models given by the scoring rules for the instruments 
>> (these are essentially BLUPs).  Incorporating measurement error for both 
>> measures is the truest way to do it.
>> 
> Always interested to be corrected...
>
> My comment was referring to the use of conditional modes (formerly known as 
> BLUPs)--extracted with ranef()--not bootstrap or SEM. Not sure to what degree 
> these alternatives correspond to the use of conditional modes.  As far as 
> conditional modes are concerned, we (Kliegl, Masson, & Richter)  have a paper 
> in press (available at my publication page), in which we write the following:
>
> To this end, we 
> generated 100,000 sets of data for a simple LMM model including 30 "subjects" 
> and a predictor with 10 levels, conforming to a known variance for intercept 
> and slope across subjects and varying the true correlation between these 
> parameters from -0.9 to +0.9 in 2,000 steps (i.e., each simulation used a 
> different correlation). ...
> ... [C]onditional means underestimate variances and exaggerate covariances 
> and correlations. The shrinkage of variance reflects the contribution of the 
> likelihood in the computation of conditional means. Shrinkage correction for 
> predictions leads to dampening of the variance components, but, as we have 
> shown in this section, not of the associated covariance component. The 
> shrinkage of variance prevents overfitting of unreliable data but, as a 
> curious side effect, the "correlations" based on conditional means for 
> individual subjects are larger in absolute value than the corresponding LMM 
> estimates of the correlation.
>

Well, I'm not sure how much that reflects the specific model you have 
simulated, and I don't have the time right now to do simulations based on 
the original poster's setup.  (I have experienced your problems with 
shrinkage etc. in a simple-minded attempt to carry out genetic linkage 
analysis of breeding values (BLUPs) estimated from the same pedigree.) 
And I would concur with your postscript.  And I will study your paper with 
great interest.
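
The effect described in the quoted passage can be explored with a small
simulation.  What follows is only a minimal sketch and not the simulation
from the Kliegl, Masson & Richter paper: the standard deviations, the true
correlation of 0.3, the fixed effects and the centred predictor are all
assumed values chosen for illustration.  It fits one random-intercept,
random-slope LMM and puts the model's estimated variances and correlation
next to the same quantities computed naively from the conditional modes.

```r
library(lme4)

set.seed(1)
n_subj   <- 30     # subjects, as in the quoted description
n_lev    <- 10     # predictor levels, as in the quoted description
true_rho <- 0.3    # assumed true intercept-slope correlation
sd_int   <- 1      # assumed SD of random intercepts
sd_slope <- 0.2    # assumed SD of random slopes
sd_res   <- 1      # assumed residual SD

# Draw correlated random intercepts and slopes for each subject
z1 <- rnorm(n_subj)
z2 <- rnorm(n_subj)
b0 <- sd_int * z1
b1 <- sd_slope * (true_rho * z1 + sqrt(1 - true_rho^2) * z2)

# One observation per subject and predictor level; predictor is centred
d <- expand.grid(x = seq_len(n_lev) - (n_lev + 1) / 2,
                 subject = factor(seq_len(n_subj)))
s <- as.integer(d$subject)
d$y <- 2 + b0[s] + (0.5 + b1[s]) * d$x + rnorm(nrow(d), sd = sd_res)

fit <- lmer(y ~ x + (1 + x | subject), data = d)

# Model-based estimates of the variance components and their correlation
vc        <- VarCorr(fit)$subject
var_model <- diag(vc)
cor_model <- attr(vc, "correlation")[1, 2]

# The same quantities computed from the conditional modes ("BLUPs")
re        <- ranef(fit)$subject
var_modes <- apply(re, 2, var)
cor_modes <- cor(re[["(Intercept)"]], re[["x"]])

print(rbind(model = var_model, modes = var_modes))
print(c(model = cor_model, modes = cor_modes))
```

A single data set proves nothing about bias, of course; wrapping the
simulation and fit in a loop over many seeds and values of true_rho would
be needed to see a systematic pattern of the kind the paper reports.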

However, the impression I have is that usually the effects of just 
plugging in the factor scores when they are based on, say, 20 or 30 
individual items with a straightforward structure are not too misleading, 
and are just what people have been doing for the last 50 years. I am 
currently comparing results from a two-stage mixed model analysis of BLUPs 
from an IRT analysis (carried out in BUGS) of multiple ordinal measures, 
adjusting for multiple fixed covariates, with results of analyses I am 
performing on the original variables.  I have not seen any major 
inconsistencies, but I will look for effects of the type you have described.

Colleagues have examined the multitrait mixed model analysis of pedigree 
data using the full analysis and compared it to using BLUPs:

Dorret I. Boomsma and Conor V. Dolan (1998).  A Comparison of Power to 
Detect a QTL in Sib-Pair Data Using Multivariate Phenotypes, Mean 
Phenotypes, and Factor Scores. Behavior Genetics 28: 329-340

They found in their simulations that there was "negligible overestimation" 
of the genetic covariance in models where they split the sample in two, 
using one-half to generate the prediction model, which was applied to the 
other half to generate the BLUPs, and then vice-versa.

My specific comment was based on an impression that the second model 
with Extraversion as a fixed effect doesn't give the original poster what 
he is interested in, viz. an assessment of the relationship between two 
(imperfectly measured) psychological traits: IQ and E.

I find it a bit confusing, but the test of the single regression 
coefficient for Extraversion in the second model seems to me to be 
different from that in the first.  Specifically, the E->IQ->item(1..N) 
model constrains the pattern of expected covariation between E and any 
one IQ item differently from the fixed effects model.
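
To make the comparison concrete, here is a minimal sketch that fits both
approaches from the quoted messages to simulated data.  The data-generating
values (200 subjects, 20 items, a true slope of 0.5 of ability on
extraversion, unit-variance item difficulties) are assumptions made up for
illustration, not anything from the original poster.  Current versions of
lme4 fit binomial models with glmer(), so that is used here in place of
lmer(..., family=binomial).

```r
library(lme4)

set.seed(2)
n_subj <- 200   # assumed number of subjects
n_item <- 20    # assumed number of items

# Assumed truth: ability depends linearly on (error-free) extraversion
extraversion <- rnorm(n_subj)
ability      <- 0.5 * extraversion + rnorm(n_subj, sd = sqrt(0.75))
difficulty   <- rnorm(n_item)

# Long format: one row per subject-item response
mydata <- expand.grid(subject = factor(seq_len(n_subj)),
                      item    = factor(seq_len(n_item)))
s <- as.integer(mydata$subject)
i <- as.integer(mydata$item)
mydata$extraversion <- extraversion[s]
mydata$y <- rbinom(nrow(mydata), 1, plogis(ability[s] - difficulty[i]))

# First method: Rasch-type model, then regress conditional modes on E
myModel <- glmer(y ~ 1 + (1 | item) + (1 | subject),
                 data = mydata, family = binomial)
re <- ranef(myModel)$subject
intelligence <- re[as.character(seq_len(n_subj)), "(Intercept)"]
stage2 <- lm(intelligence ~ extraversion)
b_two_stage <- coef(stage2)[["extraversion"]]

# Second method: extraversion as a fixed effect in the same model
myModel2 <- glmer(y ~ 1 + extraversion + (1 | item) + (1 | subject),
                  data = mydata, family = binomial)
b_one_stage <- fixef(myModel2)[["extraversion"]]

print(c(two_stage = b_two_stage, one_stage = b_one_stage))
print(summary(stage2)$coefficients)
print(summary(myModel2)$coefficients)
```

Because the conditional modes are shrunk towards zero, one would typically
expect the two-stage slope to come out somewhat smaller than the one-stage
coefficient, and the two standard errors are not directly comparable.
Note also that extraversion is treated as error-free in both fits, so
neither one addresses measurement error in E.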

Finally, multiple imputation type methods are one way people get around 
these types of problems, as a full maximum likelihood analysis is often 
expensive computationally (with a multimodal likelihood surface). I don't 
think a simple bootstrap, resampling and repeatedly calling lmer() and 
ranef(), would get around the biases you have noted.

Cheers, David Duffy

PS The OP may be aware of work of my colleagues on this particular topic:
http://genepi.qimr.edu.au/contents/p/staff/CV516.pdf

-- 
| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v