[R] factors in probit regression
David Winsemius
dwinsemius at comcast.net
Fri Oct 7 14:26:29 CEST 2011
On Oct 7, 2011, at 1:32 AM, Daniel Malter wrote:
> Note that the whole model screams at you that it is wrongly modeled. You are
> running a fully interacted model with factor variables. Thus, you have 19
> regressors plus the baseline for 150 observations. Note that all your
> coefficients are insignificant with a z-value of 0 and a p-value of 1. This
> indicates that something is severely wrong with your model. And it is not
> difficult to tell what. If you look at the residual deviance, it is
> effectively zero. This means that you are overfitting the model. Your model
> explains fully (with no error), whether the dependent variable is a zero or
> a one. This may be meaningful in a descriptive but not in an inferential
> sense.
That may be true, but it does not mean that Pablo cannot get
predictions from the model, which was what was requested. I'm not yet
convinced that nothing can be done with this model. It may serve a
useful purpose as a "saturated model" from which efforts at
simplification might be attempted and from which deviations in the
model and the predictions could be usefully considered.
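
A minimal sketch of that idea, on simulated data since the real dataset is not available here. The data frame `dat`, the level label "luz", the effect sizes, and the reduction to two factors are all assumptions made for illustration. It fits the full factorial model, compares it with a main-effects simplification by a deviance test, and checks that the full model simply reproduces the observed cell proportions:

```r
# Simulated stand-in for the experiment; labels and effects are hypothetical.
set.seed(2)
dat <- expand.grid(CE  = c("control", "experimental", "NO"),
                   Luz = c("luz", "oscuridad"),
                   rep = 1:20)
eta <- -0.3 + 0.5 * (dat$CE == "experimental") + 0.3 * (dat$Luz == "oscuridad")
dat$Sube <- rbinom(nrow(dat), 1, pnorm(eta))   # probit link: P(Sube = 1) = pnorm(eta)

# Full factorial ("saturated" with respect to the design cells) and a simpler model
full <- glm(Sube ~ CE * Luz, family = binomial(link = "probit"), data = dat)
main <- glm(Sube ~ CE + Luz, family = binomial(link = "probit"), data = dat)

# Deviance (likelihood-ratio) test of the simplification
print(anova(main, full, test = "Chisq"))

# The full model's fitted probabilities match the observed proportion in each cell
cell <- aggregate(Sube ~ CE + Luz, data = dat, FUN = mean)
cell$fitted <- as.numeric(predict(full, newdata = cell, type = "response"))
print(cell)
```

When every cell proportion is exactly 0 or 1, as the near-zero residual deviance in Pablo's output suggests, the same comparison can still be run, but the individual coefficients and their standard errors should not be interpreted.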
>
> Also, there are no "Control" coefficients or interactions because modeling
> three factor levels only requires two dummy variables. The other one becomes
> the omitted baseline that is absorbed in the intercept. That is, the
> intercept and the "plain" interaction terms capture that group. Please pick
> up an introductory econometrics book before continuing.
>
> Best,
> Daniel
>
>
> garciap wrote:
>>
snipped duplicate output
>>
>>
>> Well, there are too many levels of the original factors lacking in this
>> table. As an example, the factor CE has three levels (Undefined, Control,
>> Experimental), but in the table there are only two of them (NO=undefined,
>> Experimental=Experimental). I need to check the complete result, how can I
>> obtain the effects for the remaining levels of the factors?
The predict function will produce estimates for any actual or
hypothetical case when you supply a newdata argument with a data frame
that includes the same column names as the variables on the RHS of the
model formula. In regression with discrete variables there is always
one level that needs to be considered as part of the Intercept. In R
(with the default treatment contrasts) that level is chosen as the
first factor level. The Estimate offered for (Intercept) is actually
the estimate for a case with CE, CEBO, and Luz all at their lowest
factor level, "lowest" being determined by the sort order of their
labels. You can make changes in that assignment. For advice about
specific methods to do that in R, please first read the Posting Guide
and include a much more complete description of the dataset such as
produced by str(experimento).
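
In the meantime, here is a rough sketch of both points on simulated data. The data frame `dat`, the level labels other than those visible in the output, the cell sizes, and the additive model are assumptions for illustration, not the structure of experimento. It shows predict() with a newdata data frame covering every level combination, and relevel() to change which level is absorbed into the Intercept:

```r
# Simulated stand-in for 'experimento'; level labels are partly hypothetical.
set.seed(1)
dat <- expand.grid(CE   = c("control", "experimental", "NO"),
                   CEBO = c("combinado", "olor", "visual"),
                   Luz  = c("luz", "oscuridad"),
                   rep  = 1:10)
dat$Sube <- rbinom(nrow(dat), 1, 0.5)

fit <- glm(Sube ~ CE + CEBO + Luz,
           family = binomial(link = "probit"), data = dat)

# The first level of each factor is the baseline absorbed into (Intercept)
print(levels(dat$CE))

# Predicted probabilities for every combination, baseline levels included
nd <- expand.grid(CE   = levels(dat$CE),
                  CEBO = levels(dat$CEBO),
                  Luz  = levels(dat$Luz))
nd$p_hat <- as.numeric(predict(fit, newdata = nd, type = "response"))
print(head(nd))

# Change the baseline of CE to "NO" and refit; a "CEcontrol" coefficient now appears
dat2 <- transform(dat, CE = relevel(CE, ref = "NO"))
fit2 <- update(fit, data = dat2)
print(names(coef(fit2)))

# The predictions themselves do not depend on the choice of baseline
p_hat2 <- as.numeric(predict(fit2, newdata = nd, type = "response"))
```

Changing the reference level only re-parameterizes the model: the coefficients change meaning, while the fitted probabilities for each case stay the same.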
--
David.
>>
>> Thanks,
>>
>> Pablo
>>
> Hi to all of you,
>
> I'm fitting a full factorial probit model from an experiment, and I have
> the independent variables as factors. The model is as follows:
>
>
> fit16<-glm(Sube ~ as.factor(CE)*as.factor(CEBO)*as.factor(Luz),
> family=binomial(link="probit"), data=experimento)
>
> but when I looked at the results, I obtained the following:
>
> glm(formula = Sube ~ CE * CEBO * Luz, family = binomial(link = "probit"),
>     data = experimento)
>
> Deviance Residuals:
>        Min          1Q      Median          3Q         Max
> -1.651e-06  -1.651e-06   1.651e-06   1.651e-06   1.651e-06
>
> Coefficients: (3 not defined because of singularities)
>                                             Estimate Std. Error z value Pr(>|z|)
> (Intercept)                                6.991e+00  3.699e+04       0        1
> CEexperimental                             5.357e-09  4.775e+04       0        1
> CENO                                      -1.398e+01  4.320e+04       0        1
> CEBOcombinado                              4.948e-26  4.637e+04       0        1
> CEBOolor                                   1.183e-25  4.446e+04       0        1
> CEBOvisual                                 7.842e-26  5.650e+04       0        1
> Luzoscuridad                               3.383e-26  4.637e+04       0        1
> CEexperimental:CEBOcombinado              -6.227e-26  6.656e+04       0        1
> CENO:CEBOcombinado                        -3.758e-26  5.540e+04       0        1
> CEexperimental:CEBOolor                   -2.611e-25  6.865e+04       0        1
> CENO:CEBOolor                             -5.252e-26  5.620e+04       0        1
> CEexperimental:CEBOvisual                 -2.786e-09  7.700e+04       0        1
> CENO:CEBOvisual                            8.169e-15  6.334e+04       0        1
> CEexperimental:Luzoscuridad               -1.703e-25  6.304e+04       0        1
> CENO:Luzoscuridad                         -1.672e-28  6.117e+04       0        1
> CEBOcombinado:Luzoscuridad                 1.028e-26  5.950e+04       0        1
> CEBOolor:Luzoscuridad                      9.212e-27  6.207e+04       0        1
> CEBOvisual:Luzoscuridad                           NA         NA      NA       NA
> CEexperimental:CEBOcombinado:Luzoscuridad  9.783e-26  8.744e+04       0        1
> CENO:CEBOcombinado:Luzoscuridad           -2.948e-26  7.959e+04       0        1
> CEexperimental:CEBOolor:Luzoscuridad       1.573e-25  9.005e+04       0        1
> CENO:CEBOolor:Luzoscuridad                -2.111e-26  8.208e+04       0        1
> CEexperimental:CEBOvisual:Luzoscuridad            NA         NA      NA       NA
> CENO:CEBOvisual:Luzoscuridad                      NA         NA      NA       NA
>
> (Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 2.0853e+02 on 150 degrees of freedom
> Residual deviance: 4.1146e-10 on 130 degrees of freedom
> AIC: 42
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/factors-in-probit-regression-tp3879176p3881041.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT