[R] Latent class analysis, selection of the number of classes

Christian Hennig chrish at stats.ucl.ac.uk
Tue May 24 16:48:58 CEST 2011


Just a quick, slightly critical remark about this: the BIC and the CAIC 
are *by definition* pretty much the same (the penalty per parameter is 
log(n) for the BIC and log(n)+1 for the CAIC), so it shouldn't be 
interpreted as some kind of additional confirmation if they point to the 
same number of classes; it's rather (more or less) "the same information 
twice".
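
To make this concrete, here is a minimal sketch of the usual "smaller is better" definitions in base R. The function name info_criteria and all numbers are made up for illustration; they do not come from any package or from real data.

```r
# Information criteria in the "smaller is better" convention.
# logl: maximised log-likelihood, npar: number of free parameters,
# n: number of observations.
info_criteria <- function(logl, npar, n) {
  c(AIC  = -2 * logl + 2 * npar,
    BIC  = -2 * logl + log(n) * npar,
    CAIC = -2 * logl + (log(n) + 1) * npar)
}

# Hypothetical values, for illustration only.
ic <- info_criteria(logl = -1234.5, npar = 11, n = 500)

# The difference CAIC - BIC is just npar (up to rounding); it does not
# depend on the likelihood or on n.
ic["CAIC"] - ic["BIC"]
```

Since the two criteria differ only by the number of parameters, they will usually rank candidate models in the same order.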

Christian

On Tue, 24 May 2011, David Joubert wrote:

>
> I have used poLCA for this purpose, and not the e1071 package.
> You should use a variety of fit indices to choose the number of classes. The BIC may not always be the best choice, depending on your sample size and frequency table. In the best case, the AIC, CAIC and BIC values agree on the optimal number of classes. The Cressie-Read statistic is useful with sparse tables, but I haven't found a way to obtain it in R. If you're a coder, there might be a way to write a function to obtain it.
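
A function for the Cressie-Read statistic could be sketched in base R as follows. This assumes you already have the observed and model-expected frequencies of the response patterns as two numeric vectors with the same total; how to extract them depends on the package and is not shown. The name cressie_read is hypothetical.

```r
# Cressie-Read power-divergence statistic for observed vs. expected counts.
# lambda = 2/3 is the value usually associated with Cressie and Read;
# lambda = 1 gives Pearson's X^2 and lambda = 0 the likelihood ratio G^2
# (both assuming observed and expected counts have the same total).
cressie_read <- function(observed, expected, lambda = 2/3) {
  stopifnot(length(observed) == length(expected), all(expected > 0))
  if (abs(lambda) < 1e-12) {
    # Limiting case lambda -> 0; empty cells contribute nothing.
    pos <- observed > 0
    return(2 * sum(observed[pos] * log(observed[pos] / expected[pos])))
  }
  2 / (lambda * (lambda + 1)) *
    sum(observed * ((observed / expected)^lambda - 1))
}

# Made-up counts, for illustration only.
obs <- c(30, 20, 10, 40)
ex  <- c(25, 25, 25, 25)
cressie_read(obs, ex)              # lambda = 2/3
cressie_read(obs, ex, lambda = 1)  # same as Pearson's X^2 here
```
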
>
> With poLCA, you can quickly produce output for a range of latent classes and evaluate the solutions. The commands are very simple.
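
Running through a range of class numbers can be wrapped in a small helper. The sketch below is base R only: compare_classes and fake_bic are hypothetical names, and the fitting step is a stand-in with invented numbers so that the example runs without any package. It assumes a criterion for which smaller is better.

```r
# Evaluate a criterion for each candidate number of classes.
# ks:      candidate numbers of classes
# fit_fun: function(k) that fits the model with k classes and returns
#          one number (here: smaller is better)
compare_classes <- function(ks, fit_fun) {
  crit <- vapply(ks, fit_fun, numeric(1))
  data.frame(k = ks, criterion = crit, best = crit == min(crit))
}

# Stand-in for a real model fit; the numbers are invented.
fake_bic <- function(k) c(2510, 2440, 2455, 2480)[k]

res <- compare_classes(1:4, fake_bic)
res
res$k[res$best]

# With poLCA, fit_fun might look like this (f and dat being your own
# formula and data frame):
#   function(k) poLCA(f, dat, nclass = k, verbose = FALSE)$bic
```
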
>
> David Joubert
>
>
>> Date: Tue, 24 May 2011 12:30:01 +0100
>> From: chrish at stats.ucl.ac.uk
>> To: daniel at umd.edu
>> CC: r-help at r-project.org
>> Subject: Re: [R] Latent class analysis, selection of the number of classes
>>
>> Dear Daniel,
>>
>> the BIC can be used to estimate the number of classes. It is actually
>> reported by lca, so you could run lca with several different k and pick
>> the solution that gives you the best BIC.
>> Unfortunately I can't tell you whether "large is good" or "small is good"
>> for the BIC implementation of lca, because both versions are found
>> in the literature, BIC with positive and negative sign. (I think that if
>> there is any standard, then it should rather be "large is good"; you
>> can certainly check it by looking up the values of the loglikelihood and
>> a definition of the BIC in a book. "Large is good" if the likelihood is
>> used in the definition with a positive sign.)
>>
>> With a bit of experimentation it should be possible to find out which way
>> round it is, or you may ask the e1071 maintainer.
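
One way to do this check is to recompute the BIC from the log-likelihood under both sign conventions and see which one matches the reported value. The sketch below is base R; bic_convention is a hypothetical helper, and it assumes you can read the log-likelihood, the number of free parameters and the sample size off the fitted object. Definitions on a different scale (for example half of these values) would show up as "neither".

```r
# Compare a reported BIC with the two common sign conventions.
bic_convention <- function(reported, logl, npar, n) {
  small_is_good <- -2 * logl + log(n) * npar
  large_is_good <-  2 * logl - log(n) * npar
  if (isTRUE(all.equal(reported, small_is_good))) {
    "small is good"
  } else if (isTRUE(all.equal(reported, large_is_good))) {
    "large is good"
  } else {
    "neither (check npar, n, or the package's definition)"
  }
}

# Made-up values, for illustration only.
logl <- -100; npar <- 3; n <- 50
bic_convention(-2 * logl + log(n) * npar, logl, npar, n)
bic_convention( 2 * logl - log(n) * npar, logl, npar, n)
```
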
>>
>> Hope this helps (I may have missed it if somebody responded before),
>> Christian
>>
>> On Mon, 23 May 2011, Daniel Malter wrote:
>>
>>> Hi,
>>>
>>> I perform latent class analysis on a matrix of dichotomous variables to
>>> create an indicator of class/category membership for each observation. I
>>> would like to know whether there is a function that selects the best fit in
>>> terms of number of classes/categories.
>>>
>>> Currently, I am doing this with the lca() function of the e1071 package.
>>> This function requires me to specify the number of classes and to compare
>>> fit statistics for each run of lca. This becomes somewhat cumbersome the
>>> more variables the data matrix contains and, thus, the greater the number of
>>> possible classes is. I was wondering whether there is an alternative
>>> implemented in a different package that does exactly that.
>>>
>>> Thanks,
>>> Daniel
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> *** --- ***
>> Christian Hennig
>> University College London, Department of Statistical Science
>> Gower St., London WC1E 6BT, phone +44 207 679 1698
>> chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
>>
>
>



