[R] Path Analysis

Mon May 24 19:15:04 CEST 2010

Dear Sam,

> -----Original Message-----
> From: R Help [mailto:rhelp.stats at gmail.com]
> Sent: May-24-10 1:04 PM
> To: John Fox
> Cc: r-help
> Subject: Re: [R] Path Analysis
> 
> That's an interesting idea; I got the same impression from your SEM
> appendix to "Companion to applied regression" in the paragraph just
> before Section 3.
> 
> So I could get the same results if I built the following two models:

Not really the same results, but the models are similar.

> 
> mod1 = lm(intent~exposure+benefit+norms+childBarrier+parentBarrier+knowBenefit,
>           data=dat)
> mod2 = glm(recuse~intent+norms+exposure+childBarrier+parentBarrier,
>            data=dat,family=binomial(link=logit))
> 
> And in the second model only the intent should have a significant
> coefficient?

Yes, if you're right that the effects of the other variables are entirely
mediated by intent.
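
To illustrate what full mediation would look like, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the data frame sim, the single exogenous variable, and the coefficient values are made up and are not your data. The exogenous variable affects recuse only through intent, so its coefficient in the logistic regression should come out near zero.

```r
# Simulated (hypothetical) data: norms affects recuse only through intent.
set.seed(1)
n <- 2000
norms  <- rnorm(n)
intent <- 0.8 * norms + rnorm(n)                     # norms -> intent
recuse <- rbinom(n, 1, plogis(-0.5 + 1.0 * intent))  # intent -> recuse only
sim <- data.frame(norms, intent, recuse)

# Equation for intent by least squares.
fit_intent <- lm(intent ~ norms, data = sim)

# Equation for recuse by logistic regression, with the exogenous
# variable included to check for a direct effect.
fit_recuse <- glm(recuse ~ intent + norms, data = sim,
                  family = binomial(link = logit))

# Under full mediation the norms coefficient should be small
# relative to its standard error.
round(coef(summary(fit_recuse)), 3)
```

With real data the same check applies to each exogenous variable, subject to the models being correctly specified.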

> 
> When I run those models I get a number of significant findings in
> mod2.  Does that mean that I have mis-specified my model?  If so (and
> I think I have), can I postulate that there is a link between each
> significant coefficient?

With the usual caveats about "significance" and interpreting regressions
causally, large coefficients for the other variables suggest that their
effects are not wholly mediated by intent.
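
One way to look at the direct paths jointly is to compare the logistic regression containing only intent with the one that adds the other variables. The sketch below does this on simulated data in which one variable has a direct effect as well as an indirect one. The data frame sim2 and all coefficient values are assumptions for illustration, not your data, and the test complements rather than replaces looking at the sizes of the coefficients.

```r
# Simulated (hypothetical) data: childBarrier affects recuse both through
# intent and directly, so mediation by intent is only partial.
set.seed(2)
n <- 2000
childBarrier <- rnorm(n)
intent <- -0.5 * childBarrier + rnorm(n)
recuse <- rbinom(n, 1, plogis(0.2 + 0.8 * intent - 0.6 * childBarrier))
sim2 <- data.frame(childBarrier, intent, recuse)

# Model with intent only (full mediation) versus model with a direct path.
reduced <- glm(recuse ~ intent, data = sim2, family = binomial)
full    <- glm(recuse ~ intent + childBarrier, data = sim2, family = binomial)

# Likelihood-ratio (analysis of deviance) test of the direct path.
lrt <- anova(reduced, full, test = "Chisq")
lrt
```

A small p-value here is evidence against full mediation; with several exogenous variables, add them all to the larger model to test the direct paths together.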

Best,
 John

> 
> Thanks so much for your input,
> Sam Stewart
> 
> 
> > summary(mod2)
> 
> Call:
> glm(formula = recuse ~ intent + norms + exposure + childBarrier +
>     parentBarrier, family = binomial(link = logit), data = dat)
> 
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max
> -2.2784  -0.9018   0.5899   0.7686   1.9314
> 
> Coefficients:
>               Estimate Std. Error z value Pr(>|z|)
> (Intercept)   -2.51269    0.50359  -4.990 6.05e-07 ***
> intent         0.59574    0.08345   7.139 9.39e-13 ***
> norms          0.23822    0.02991   7.964 1.67e-15 ***
> exposure       0.12522    0.08613   1.454 0.145981
> childBarrier  -0.31296    0.08693  -3.600 0.000318 ***
> parentBarrier -0.23400    0.08676  -2.697 0.006995 **
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> (Dispersion parameter for binomial family taken to be 1)
> 
>     Null deviance: 1803.0  on 1479  degrees of freedom
> Residual deviance: 1567.8  on 1474  degrees of freedom
>   (40 observations deleted due to missingness)
> AIC: 1579.8
> 
> Number of Fisher Scoring iterations: 4
> 
> On Mon, May 24, 2010 at 1:17 PM, John Fox <jfox at mcmaster.ca> wrote:
> > Dear sstewart,
> >
> > The model appears to reflect the path diagram, assuming that you intend to
> > allow the exogenous variables to be correlated and want the errors to be
> > uncorrelated.
> >
> > This is one way to model the binary variable reuse. An alternative would be
> > to fit the equation for intent by least-squares regression (assuming that
> > the relationships are linear, etc.), and the equation of reuse by, e.g.,
> > logistic regression (again assuming that the model is correctly specified).
> > If you're right that the effects of the exogenous variables are entirely
> > mediated by intent, then if you put these variables in the equation for
> > reuse, their coefficients should be small.
> >
> > I hope this helps,
> >  John
> >
> > --------------------------------
> > John Fox
> > Senator William McMaster
> >  Professor of Social Statistics
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario, Canada
> > web: socserv.mcmaster.ca/jfox
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> >> Behalf Of R Help
> >> Sent: May-24-10 11:18 AM
> >> To: r-help
> >> Subject: [R] Path Analysis
> >>
> >> Hello list,
> >>
> >> I'm trying to make sure that I'm performing a path analysis correctly
> >> using the sem package.  The figure at
> >> http://flame.cs.dal.ca/~sstewart/regressDiag.png shows the model in
> >> detail.
> >>
> >> The challenge I'm having is that reuse is an indicator (0/1) variable.
> >>
> >> Here's the code I'm using:
> >>
> >> corr = hetcor(dat[,c('intent','exposure','benefit','norms','childBarrier',
> >>                      'parentBarrier','knowBenefit','recuse')],
> >>               use="pairwise.complete.obs")$correlations
> >> modMat = matrix(c(
> >>   'exposure -> intent', 'gam11',NA,
> >>   'benefit -> intent', 'gam12',NA,
> >>   'norms -> intent', 'gam13',NA,
> >>   'childBarrier -> intent', 'gam14',NA,
> >>   'parentBarrier -> intent', 'gam15',NA,
> >>   'knowBenefit -> intent', 'gam16',NA,
> >>   'intent<->intent','psi11',NA,
> >>   'intent->recuse','gam21',NA,
> >>   'recuse<->recuse','psi22',NA),
> >>   ncol=3,byrow=TRUE)
> >> model4 = sem(modMat,corr,N=1520,
> >>              fixed.x=c('exposure','benefit','norms','childBarrier',
> >>                        'parentBarrier','knowBenefit'))
> >>
> >> Is this correctly modeling my diagram?  I'm not sure if a) I'm dealing
> >> with the categorical variable correctly, or b) whether fixed.x is
> >> accurately modeling the correlations for me.
> >>
> >> Any help would be appreciated.  I'm also looking into creating a plot
> >> function within R (similar to the path.diagram function, but using R
> >> plots).  If I get something useful I'll try to post it back.
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.