[R-sig-ME] Is it ok to use lmer() for an ordered categorical (5 levels) response variable?

Pierce, Steven p|erce@1 @end|ng |rom m@u@edu
Thu Mar 7 21:47:21 CET 2019


I'm familiar with IRT methods as well as CFA and agree IRT also provides a good measurement approach here. Raykov & Marcoulides (2016) point out that in some situations (possibly including this one), one can compute the relevant IRT parameters from the CFA results and the CFA parameters from the IRT results. My discussions with Raykov led me to believe that CFA and IRT models are completely interchangeable in those scenarios: they accomplish the same thing. That relationship has not been proven to generalize to all situations (to my knowledge), but it is worth noting when it applies to a given problem.
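
To make the "from one to the other and back" idea concrete, here is a minimal sketch of the conversion in one well-known special case. It assumes binary items, a single factor with unit variance, latent responses standardized to unit variance, and the normal-ogive metric. These are my assumptions for illustration, not necessarily the derivation in the cited paper, and the function names are hypothetical.

```python
import math

def cfa_to_irt(loading, threshold):
    """Standardized loading and threshold -> normal-ogive (a, b).

    Assumes a unit-variance factor and unit-variance latent response.
    """
    a = loading / math.sqrt(1.0 - loading ** 2)  # discrimination
    b = threshold / loading                      # difficulty
    return a, b

def irt_to_cfa(a, b):
    """Inverse mapping: normal-ogive (a, b) -> (loading, threshold)."""
    loading = a / math.sqrt(1.0 + a ** 2)
    threshold = b * loading
    return loading, threshold

# Illustrative values only, not estimates from any real data set.
a, b = cfa_to_irt(loading=0.6, threshold=0.3)
print(a, b)
print(irt_to_cfa(a, b))
```

Under these assumptions the round trip recovers the original loading and threshold, which is the sense in which the two parameterizations carry the same information.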

Raykov, T., & Marcoulides, G. A. (2016). On the relationship between classical test theory and item response theory: From one to the other and back. Educational and Psychological Measurement, 76(2), 325-338. doi:10.1177/0013164415576958


-----Original Message-----
From: landon hurley <ljrhurley using gmail.com> 
Sent: Thursday, March 7, 2019 1:58 PM
To: r-sig-mixed-models using r-project.org
Subject: Re: [R-sig-ME] Is it ok to use lmer() for an ordered categorical (5 levels) response variable?


Since you ask:

On 3/7/19 8:55 AM, Pierce, Steven wrote:
> Neither you nor Harold have (a) made a principled argument about how 
> using CFA and SEM would be flawed, or (b) suggested a better 
> approach. Instead you've criticized a minor point where I 
> acknowledged a similarity between how the scores Nicolas had 
> described were constructed and a commonly-used but flawed approach
> to measurement in the social sciences. I only mentioned that
> similarity as a bridge to suggesting the CFA approach that more
> statistically rigorous social scientists have developed for
> translating binary item-level data into decent measures of
> theoretical constructs.

The problem with traditional factor analytic models, with respect to
theta score estimation, is that the amount of information changes
relative to each person's location on the theta scale. This has negative
impacts as a consequence of the so-called "factor score indeterminacy."
An item of average difficulty is most likely to be located near that
location on the scale, which is a reflection of the theta scores the item
was calibrated upon (where the item regularity parameters were trained).
Further, let us assume that there is a more complicated relationship
than merely a proportional equivalence between the sum score and the
theta score. If this were false, then at best we would have an
ordination (i.e., theta rankings) equivalent to the ranks of the sum
scores. This rank equivalence is lost once the model is expanded, but
the expansion allows the differing amounts of information contained in
each item to be allocated across the entire ability range. The
discrimination parameter
of an item serves to reflect the slope of the change from one response
to the next. A perfect step function (for example, a Heaviside function)
has perfect discrimination: a respondent whose ability is below the
threshold (the difficulty parameter, where there is a 50% probability of
endorsement) will always respond 0, and otherwise 1.
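
As a minimal sketch of the discrimination and difficulty parameters just
described, assuming a two-parameter logistic (2PL) item (the model is not
named above, and the function names are hypothetical):

```python
import math

def p_endorse(theta, a, b):
    """2PL item: probability of endorsement at ability theta.

    a is the discrimination (slope), b the difficulty (the theta
    value at which endorsement probability is 50%).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def step_item(theta, b):
    """Limiting case of perfect discrimination: a step at b."""
    return 1.0 if theta >= b else 0.0

b = 0.0
for a in (0.5, 2.0, 20.0):
    # As a grows, the curve approaches the step function at b.
    probs = [p_endorse(t, a, b) for t in (-1.0, 0.0, 1.0)]
    print(a, probs, [step_item(t, b) for t in (-1.0, 0.0, 1.0)])
```

With a large discrimination the probabilities on either side of b are
pushed towards 0 and 1, which is the step-function limit.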

Expanding the factor model to handle discrete items under item response
theory would be the typical solution for determining a theta location
for any given respondent. CFA does not represent a solution to this
problem because it specifically avoids the question of how the factor
scores themselves are produced (by integrating/averaging out the
individual scores). Instead, it looks for the structure of the model,
but not how or where it is a useful tool in the sample data. IRT, on the
other hand, enables us to assess how reliably (how accurately) a
location is estimated in the sample. More items, and more highly
discriminating items, provide more information and so change the test
information.

If scoring items is the desired approach, then IRT is what has been
developed for much of the existing standardised testing (e.g., SAT, ACT,
GRE, LSAT, MCAT). It would appear to be the ideal solution.

Violence is the last refuge of the incompetent.
