[R] appropriate covariance matrix for multiple nominal exogenous and multiple continuous endogenous variables in SEM

John Fox jfox at mcmaster.ca
Thu May 29 22:09:08 CEST 2008


Dear Gus,

If the nominal variables are exogenous, then you need not use polychoric and
polyserial correlations. (Indeed, if they are polytomous and unordered, then
it would be inappropriate to do so.) Simply use dummy exogenous variables,
as you would in a dummy regression.

You could use a one-sided formula, along with model.matrix() and cov(), to
compute the input covariance matrix, as in the following example, using the
Prestige data frame in the car package:

> library(car)  # makes the Prestige data frame available
> S <- cov(model.matrix(~ type + income + education, data=Prestige))[-1,-1]
> S
               typeprof        typewc        income    education
typeprof     0.21849358   -0.07500526     1157.0972 1.051153e+00
typewc      -0.07500526    0.18146434     -447.3270 5.373869e-02
income    1157.09720177 -447.32695140 17877104.7629 6.672664e+03
education    1.05115296    0.05373869     6672.6640 7.556652e+00

(Note the [-1,-1] to get rid of the row/column for the invariant constant
regressor; alternatively, you could put -1 in the model formula, but then
you'd get 3 dummy regressors for the three categories of type, and because
these always sum to 1, their covariance matrix would be singular.)
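
A minimal sketch of the difference between the two codings, using base R
only and a small made-up data frame (the names d, type, income and their
values are hypothetical stand-ins for the Prestige variables):

```r
# Hypothetical toy data: a three-level nominal factor and one numeric variable
d <- data.frame(
  type = factor(c("bc", "prof", "wc", "prof", "bc", "wc", "bc", "prof")),
  income = c(3, 9, 5, 8, 2, 4, 3, 10)
)

# With a constant: two dummies (typeprof, typewc); "bc" is the baseline level
X1 <- model.matrix(~ type + income, data = d)
S1 <- cov(X1)[-1, -1]  # drop the constant's row/column (its variance is 0)

# With -1 in the formula: one dummy per category (typebc, typeprof, typewc)
X0 <- model.matrix(~ type + income - 1, data = d)
S0 <- cov(X0)

# The three dummies sum to 1 in every row, so S0 is singular while S1 is not
rowSums(X0[, c("typebc", "typeprof", "typewc")])
det(S1)
det(S0)  # zero apart from rounding error
```

Keeping the constant and then dropping its row and column retains all the
information about type in two dummies that are not collinear.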

An alternative would be to fit a model with a constant, in which case you
could use raw.moments() in the sem package to compute the raw-moment matrix,
as in:

> S <- raw.moments(model.matrix(~ type + income + education, data=Prestige))
> S

Raw Moments
             (Intercept)     typeprof       typewc       income    education
(Intercept)    1.0000000    0.3163265    0.2346939     6938.857    10.795102
typeprof       0.3163265    0.3163265    0.0000000     3340.235     4.455204
typewc         0.2346939    0.0000000    0.2346939     1185.745     2.586735
income      6938.8571429 3340.2346939 1185.7448980 65842423.776 81510.246531
education     10.7951020    4.4552041    2.5867347    81510.247   124.013771

N =  98
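
The raw-moment output above is consistent with averaging the cross-products
of the model-matrix columns over the N cases: the constant's own entry is 1,
and the first row holds the column means (e.g., 0.316, the proportion of
professional occupations, for typeprof). A minimal base-R sketch of that
computation, assuming this is what raw.moments() does, on a small made-up
data frame (d, type and income are hypothetical):

```r
# Hypothetical toy data with the same structure as the Prestige example
d <- data.frame(
  type = factor(c("bc", "prof", "wc", "prof", "bc", "wc", "bc", "prof")),
  income = c(3, 9, 5, 8, 2, 4, 3, 10)
)
X <- model.matrix(~ type + income, data = d)
N <- nrow(X)

# Raw (uncentred) moment matrix: cross-products averaged over the N cases
M <- crossprod(X) / N

# Because the constant column is all 1s, the first row holds the column means
M[1, ]

# Centring the raw moments and rescaling to the N - 1 denominator gives cov()
mu <- colMeans(X)
S <- (M - tcrossprod(mu)) * N / (N - 1)
all.equal(S[-1, -1], cov(X)[-1, -1])
```

The last step shows that the two input matrices carry the same information
once the means are known; the raw-moment version simply keeps the constant.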

If by "path analysis" you mean a recursive model, then, since there are no
latent variables, you might as well use lm() to fit the model equation by
equation. If the model is nonrecursive, you could use tsls() in the sem
package to fit the model equation by equation, or one of the estimators in
the systemfit package.

Finally, in hetcor(), "numeric" means quantitative, which may or may not
literally be continuous. A nominal variable is represented by a factor in R,
but remember that the functions in polycor require that the levels be
ordered (that is, an ordered factor).
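
Before passing a data frame to hetcor(), you can check how R classifies
each column with base R alone. A minimal sketch; the column names and
values are made up, and the helper kind() is hypothetical, not part of
polycor:

```r
# Hypothetical data frame with one column of each kind
d <- data.frame(
  income = c(3.2, 9.1, 5.0, 8.4),                       # numeric (quantitative)
  region = factor(c("east", "west", "north", "east")),  # unordered factor (nominal)
  rating = factor(c("low", "high", "medium", "low"),
                  levels = c("low", "medium", "high"),
                  ordered = TRUE)                       # ordered factor (ordinal)
)

# Hypothetical helper: label each column as numeric, ordered or unordered factor
kind <- function(v) {
  if (is.ordered(v)) "ordered factor"
  else if (is.factor(v)) "unordered factor"
  else if (is.numeric(v)) "numeric"
  else class(v)[1]
}
sapply(d, kind)
```

A genuinely nominal variable such as region has no natural order for its
levels, which is why it is better handled with dummy regressors than by
forcing it into an ordered factor.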

I hope this helps,
 John

------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Gus Jespersen
> Sent: May-29-08 1:02 PM
> To: r-help at r-project.org
> Subject: [R] appropriate covariance matrix for multiple nominal exogenous
> and multiple continuous endogenous variables in SEM
> 
> Hi,
> I would like to use the sem package to perform a path analysis (no
> latent variables) with a mixture of 2 nominal exogenous, 1 continuous
> exogenous, and 4 continuous endogenous variables.  I seek advice as to
> how to calculate the appropriate covariance matrix for use with the sem
> package.
> 
> I have read through the polycor package, and am confused as to the use
> of "numeric" for the hetcor function.  Is this used synonymously with a
> continuous variable, or perhaps a nominal variable?
> 
> Thanks,
> Gus
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


