[R] R code for multilevel latent class analysis
Cristina Cametti
cristina.cametti at gmail.com
Fri Jul 8 22:57:11 CEST 2016
Dear all,
thank you very much for your suggestions! I added 1 to each of my variables because, if I don't, I get this message:
ALERT: some manifest variables contain values that are not
positive integers. For poLCA to run, please recode categorical
outcome variables to increment from 1 to the maximum number of
outcome categories for each variable.
I read in various explanations on the web that poLCA needs positive integers (starting from 1) to run. So, in the end, I used this code:
lca = poLCA(cbind(ppltrst=ppltrst+1,pplfair=pplfair+1,pplhlp=pplhlp+1) ~ cntry,
maxiter=50000, nclass=3,
nrep=10, data=mydata)
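Why does this call work when the original `cbind(ppltrst+1,pplfair+1,pplhlp+1)~cntry` failed? A likely explanation, inferred from the error message rather than confirmed in poLCA's source, is that poLCA looks each outcome column up in `data` with `match(colnames(y), colnames(data))`. An unnamed `cbind()` of expressions such as `ppltrst+1` has no column names, so the lookup yields `NA` and `[.data.frame` stops with "undefined columns selected"; naming the columns after existing variables lets the lookup succeed. The base-R sketch below reproduces only that lookup on a small hypothetical data frame `toy`; it does not call poLCA.

```r
# Hypothetical stand-in for mydata, with items on a 0-10 scale like the ESS items
toy <- data.frame(ppltrst = c(0, 5, 10), pplfair = c(3, 7, 10))

# Unnamed cbind() of expressions: R attaches no column names to the result
y_unnamed <- with(toy, cbind(ppltrst + 1, pplfair + 1))
print(colnames(y_unnamed))   # NULL

# Named cbind(): the column names match existing columns of the data frame
y_named <- with(toy, cbind(ppltrst = ppltrst + 1, pplfair = pplfair + 1))
print(colnames(y_named))     # "ppltrst" "pplfair"

# The lookup pattern that appears in the error message, for the first column
idx_unnamed <- match(colnames(y_unnamed), colnames(toy))[1]  # NA: nothing to match
idx_named   <- match(colnames(y_named), colnames(toy))[1]    # 1

# Indexing a data frame with an NA column index fails with the reported error
err <- tryCatch(toy[, idx_unnamed], error = function(e) conditionMessage(e))
print(err)
```

A tidier alternative, under the same assumption, is to store the shifted items in the data frame first (for example a hypothetical `mydata$ppltrst1 <- mydata$ppltrst + 1`) so that the formula contains only plain column names, as in the package example `cbind(A,B) ~ C`.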
It runs; this is the output I obtained:
Model 1: llik = -244070.8 ... best llik = -244070.8
Model 2: llik = -241832.9 ... best llik = -241832.9
Model 3: llik = -245111.9 ... best llik = -241832.9
Model 4: llik = -242490.5 ... best llik = -241832.9
Model 5: llik = -240447.7 ... best llik = -240447.7
Model 6: llik = -250882.1 ... best llik = -240447.7
Model 7: llik = -240447.7 ... best llik = -240447.7
Model 8: llik = -242547.3 ... best llik = -240447.7
Model 9: llik = -240447.7 ... best llik = -240447.7
Model 10: llik = -247340.3 ... best llik = -240447.7
Conditional item response (column) probabilities,
by outcome variable, for each class (row)
$ppltrst
Pr(1) Pr(2) Pr(3) Pr(4) Pr(5) Pr(6) Pr(7) Pr(8) Pr(9) Pr(10) Pr(11)
class 1: 0.0071 0.0047 0.0057 0.0112 0.0163 0.1146 0.0892 0.2794 0.3215 0.0897 0.0605
class 2: 0.2375 0.1703 0.2117 0.1681 0.0564 0.1000 0.0098 0.0097 0.0150 0.0078 0.0137
class 3: 0.0189 0.0104 0.0494 0.1442 0.1687 0.3268 0.1499 0.1094 0.0216 0.0008 0.0000
$pplfair
Pr(1) Pr(2) Pr(3) Pr(4) Pr(5) Pr(6) Pr(7) Pr(8) Pr(9) Pr(10) Pr(11)
class 1: 0.0025 0.0021 0.0054 0.0062 0.0062 0.0512 0.0670 0.2721 0.3697 0.1394 0.0782
class 2: 0.1712 0.1285 0.2048 0.1654 0.0667 0.1586 0.0151 0.0133 0.0261 0.0143 0.0361
class 3: 0.0003 0.0011 0.0186 0.0952 0.1428 0.3413 0.1756 0.1584 0.0577 0.0068 0.0023
$pplhlp
Pr(1) Pr(2) Pr(3) Pr(4) Pr(5) Pr(6) Pr(7) Pr(8) Pr(9) Pr(10) Pr(11)
class 1: 0.0046 0.0051 0.0139 0.0369 0.0495 0.1804 0.1434 0.2374 0.2127 0.0720 0.0442
class 2: 0.2218 0.1893 0.2334 0.1412 0.0471 0.1044 0.0081 0.0098 0.0139 0.0107 0.0205
class 3: 0.0074 0.0159 0.0755 0.1779 0.1731 0.2870 0.1226 0.0984 0.0380 0.0024 0.0018
Estimated class population shares
0.3351 0.2014 0.4635
Predicted class memberships (by modal posterior prob.)
0.3323 0.187 0.4807
=========================================================
Fit for 3 latent classes:
=========================================================
2 / 1
Coefficient Std. error t value Pr(>|t|)
(Intercept) -0.67015 0.07488 -8.950 0.000
cntryBE 0.39006 0.11038 3.534 0.000
cntryCH -1.39925 0.14273 -9.804 0.000
cntryCZ 1.26804 0.12295 10.314 0.000
cntryDE 0.14062 0.10330 1.361 0.174
cntryDK -2.90300 0.22895 -12.680 0.000
cntryES 0.67484 0.11334 5.954 0.000
cntryFI -2.47880 0.18096 -13.698 0.000
cntryFR 0.73428 0.12339 5.951 0.000
cntryGB -0.28234 0.11663 -2.421 0.016
cntryGR 2.69529 0.11605 23.225 0.000
cntryHU 1.71068 0.12213 14.006 0.000
cntryIE -0.65851 0.10951 -6.013 0.000
cntryIT 1.49171 0.13915 10.720 0.000
cntryLU 0.35923 0.11635 3.088 0.002
cntryNL -1.17781 0.12193 -9.659 0.000
cntryNO -2.83265 0.19959 -14.192 0.000
cntryPL 2.76831 0.14608 18.950 0.000
cntryPT 1.65176 0.13278 12.440 0.000
cntrySE -1.93606 0.15250 -12.696 0.000
cntrySI 1.52657 0.11667 13.084 0.000
=========================================================
3 / 1
Coefficient Std. error t value Pr(>|t|)
(Intercept) 0.40117 0.05997 6.689 0.000
cntryBE 0.37591 0.09215 4.079 0.000
cntryCH -0.27800 0.08100 -3.432 0.001
cntryCZ 0.75694 0.11287 6.707 0.000
cntryDE 0.47418 0.08072 5.874 0.000
cntryDK -1.92545 0.10423 -18.473 0.000
cntryES 0.55324 0.09713 5.696 0.000
cntryFI -1.27608 0.08553 -14.920 0.000
cntryFR 0.74221 0.10457 7.098 0.000
cntryGB 0.27381 0.08555 3.201 0.001
cntryGR 1.06585 0.11440 9.317 0.000
cntryHU 1.05480 0.11392 9.259 0.000
cntryIE -0.61390 0.08243 -7.448 0.000
cntryIT 1.12519 0.12983 8.667 0.000
cntryLU 0.23379 0.09523 2.455 0.014
cntryNL -0.35230 0.07981 -4.414 0.000
cntryNO -1.54373 0.08836 -17.471 0.000
cntryPL 1.68826 0.14280 11.823 0.000
cntryPT 1.26210 0.12225 10.324 0.000
cntrySE -1.09443 0.08383 -13.055 0.000
cntrySI 0.68829 0.11137 6.180 0.000
=========================================================
number of observations: 39254
number of estimated parameters: 132
residual degrees of freedom: 1198
maximum log-likelihood: -240447.7
AIC(3): 481159.3
BIC(3): 482291.6
X^2(3): 40937.31 (Chi-square goodness of fit)
ALERT: estimation algorithm automatically restarted with new initial values
Warning message:
In sqrt(diag(VCE.beta)) : NaNs produced
Except for the final warning (NaNs produced while computing the standard errors of the covariate coefficients), I think this output makes sense.
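One way to check whether the classes make sense is to turn each class's conditional response probabilities into an expected score on the original 0-10 scale: because 1 was added before fitting, category Pr(k) corresponds to the original value k - 1. The sketch below does this for `ppltrst`, using the rounded probabilities copied from the output above. With the fitted object, `lca$probs$ppltrst` should hold the same matrix unrounded (poLCA stores one class-by-category matrix per item in its `probs` component).

```r
# Class-conditional probabilities for ppltrst, copied (rounded) from the printed output:
# rows = classes 1-3, columns = recoded categories 1-11 (original values 0-10)
p_trst <- rbind(
  c(0.0071, 0.0047, 0.0057, 0.0112, 0.0163, 0.1146, 0.0892, 0.2794, 0.3215, 0.0897, 0.0605),
  c(0.2375, 0.1703, 0.2117, 0.1681, 0.0564, 0.1000, 0.0098, 0.0097, 0.0150, 0.0078, 0.0137),
  c(0.0189, 0.0104, 0.0494, 0.1442, 0.1687, 0.3268, 0.1499, 0.1094, 0.0216, 0.0008, 0.0000)
)
rownames(p_trst) <- paste("class", 1:3)

# Each row should sum to 1, up to the 4-decimal rounding of the printout
print(rowSums(p_trst))

# Expected score on the original scale: sum over k of (k - 1) * Pr(k)
expected_trst <- drop(p_trst %*% (0:10))
print(round(expected_trst, 2))
```

The result orders the classes as high trust (class 1, about 7.2), low trust (class 2, about 2.3) and intermediate trust (class 3, about 4.7); the same calculation can be repeated for `pplfair` and `pplhlp`.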
However, I would be very glad to hear any further suggestions.
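Since the research question is how trust varies between countries, the `2 / 1` and `3 / 1` coefficient tables can also be turned into predicted class shares per country. In poLCA's latent class regression, class 1 is the reference class (its coefficients are fixed at 0), and the prior probability of class r is a multinomial logit, exp(eta_r) / sum_s exp(eta_s), where eta_r is the intercept plus the relevant country dummy coefficient. The sketch below applies this to the printed intercepts and the `cntryDK` and `cntryGR` rows; `baseline` is a hypothetical label for the omitted reference level of `cntry`, which the output does not name. With the fitted object, the coefficients should be available in `lca$coeff`.

```r
# Multinomial-logit coefficients copied from the printed "2 / 1" and "3 / 1" tables.
# Class 1 is the reference class, so its linear predictor is fixed at 0.
intercept <- c(class2 = -0.67015, class3 = 0.40117)
country_effect <- list(
  baseline = c(class2 = 0,        class3 = 0),        # omitted reference level of cntry
  DK       = c(class2 = -2.90300, class3 = -1.92545), # cntryDK rows
  GR       = c(class2 =  2.69529, class3 =  1.06585)  # cntryGR rows
)

# Prior class probabilities for one country: softmax over (0, eta_2, eta_3)
class_probs <- function(effect) {
  eta <- c(class1 = 0, intercept + effect)
  exp(eta) / sum(exp(eta))
}

# One row per country, one column per latent class
shares <- t(sapply(country_effect, class_probs))
print(round(shares, 3))
```

In the probability tables above, class 1 puts most of its mass at the top of the 0-10 scale and class 2 at the bottom, so these shares place Danish respondents mostly in the high-trust class and Greek respondents mostly in the low-trust one. Note that in this specification country enters as fixed dummy covariates rather than as a random effect.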
Thank you very much again!
Cristina
On 08 Jul 2016, at 01:49, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On Jul 7, 2016, at 3:36 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
>>
>> Hi Cristina,
>> Try this:
>>
>> names(mydata)
>>
>> It may be NULL or "ppltrst" may be absent.
>
> I've already suggested to Cristina that she make sure the variables are spelled correctly, and she reports they are all present in her dataset. So I tried a formula like hers, with 1 added to each variable, on the 'values' data frame used in the examples for that package, and it throws the same error.
>
>> data(values,package='poLCA')
>> str(values)
> 'data.frame': 216 obs. of 4 variables:
> $ A: num 2 2 2 2 2 2 2 2 2 2 ...
> $ B: num 2 2 2 2 2 2 2 2 2 2 ...
> $ C: num 2 2 2 2 2 2 2 2 2 2 ...
> $ D: num 2 2 2 2 2 2 2 2 2 2 ...
>> library(poLCA)
> Loading required package: scatterplot3d
> Loading required package: MASS
>> poLCA( cbind(A+1,B+1) ~ C, data=values)
> Error in `[.data.frame`(data, , match(colnames(y), colnames(data))[j]) :
> undefined columns selected
>
> So I then tried removing those "+1"s (which didn't seem to have much justification):
>
>> poLCA( cbind(A,B) ~ C, data=values)
> Conditional item response (column) probabilities,
> by outcome variable, for each class (row)
>
> $A
> Pr(1) Pr(2)
> class 1: 0.3428 0.6572
> class 2: 0.0307 0.9693
>
> $B
> Pr(1) Pr(2)
> class 1: 0.7737 0.2263
> class 2: 0.1386 0.8614
>
> snipped the rest of the output.
>
> So why add 1? It seems to disturb the function's formula-processing logic, and so far it has not been explained.
>
> --
> David.
>
>>
>> Jim
>>
>>
>> On Thu, Jul 7, 2016 at 8:26 PM, Cristina Cametti
>> <cristina.cametti at gmail.com> wrote:
>>> Dear all,
>>>
>>> I cannot find reliable R code to run a multilevel latent class model. I need to analyze how social trust (three variables from the ESS survey) might vary between countries (21 countries in my database). I tried the poLCA package, but I am not sure my code is right. This is my code:
>>> lca <- cbind(ppltrst+1,pplfair+1,pplhlp+1)~cntry
>>> lc <- poLCA(lca,mydata)
>>>
>>> However, I get an error message:
>>> Error in `[.data.frame`(data, , match(colnames(y), colnames(data))[j]) :
>>> undefined columns selected
>>>
>>> How can I solve this? Is the code completely wrong, or did I miss some steps?
>>> Thank you very much for your help!
>>>
>>> Cristina
>>>
>
> David Winsemius
> Alameda, CA, USA
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.