[R] Highly significant intercept and large standard error
Daniel Nordlund
djnordlund at frontier.com
Wed Oct 6 17:16:59 CEST 2010
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Chris Mcowen
> Sent: Wednesday, October 06, 2010 7:38 AM
> To: Viechtbauer Wolfgang (STAT)
> Cc: r-help at r-project.org
> Subject: Re: [R] Highly significant intercept and large standard error
>
> Hi Wolfgang,
>
> Thanks for this, it makes sense.
>
> I should have been more detailed when I described my model; it is in fact
> binomial - sell or not.
>
> > remove the Mag factor from the model, you get a model with just an
> intercept, reflecting the overall mean
>
> This is true, but what I was trying to say (not very well!) was that I have
> other factors such as price (High, Mid, Low), condition (Best, Average, Poor),
> etc., and all models that have Mag in them have a much better AIC than
> models without Mag. I was unsure whether this was an artefact of the high SE
> for MagNew rather than Mag being a key factor.
>
> > Maybe the data have been entered incorrectly
>
> I have checked this and all is fine; they are categorical variables, not
> continuous, so Mag is either New, Old or Mid.
>
> Sam
>
>
>
> On 6 Oct 2010, at 15:05, Viechtbauer Wolfgang (STAT) wrote:
>
> I do not know about the details of the model, but the results are not all
> that strange. I'll assume that you are using family=gaussian(), so you are
> essentially running a model where (Intercept) reflects the mean of the
> dependent variable for that third category (MagMid) of the Mag factor and
> MagNew and MagOld are the mean differences between MagMid and those two
> other categories.
>
> If you remove the Mag factor from the model, you get a model with just an
> intercept, reflecting the overall mean. Two things will happen. That
> overall mean is essentially a weighted average of the three level-specific
> means. MagMid and MagOld are the most frequent categories and both these
> means are close to zero, so the overall mean will be pulled close to zero.
> Moreover, the amount of variability around the overall mean will be larger
> than the amount of variability around the level-specific means. This will
> lead to a larger standard error for the overall mean. Hence, it could very
> well happen that the intercept is no longer significant when you remove
> that factor.
>
> Given that MagNew only occurred a few times and given its very different
> mean and huge standard error, I suspect that some value(s) within that
> level are "screwy". Maybe the data have been entered incorrectly. One
> thing I have seen happen a few times is that missing data were coded, for
> example, as a -9999 in the dataset created with, for example, SPSS, but
> were then accidentally treated as observed values when analyzed with some
> other software, such as R. That could cause such a low mean for that
> category and the huge SE.
>
> It's just a hunch. Could be anything, but I would certainly take another
> good look at the values within that level.
>
> Best,
>
> --
> Wolfgang Viechtbauer http://www.wvbauer.com/
> Department of Methodology and Statistics Tel: +31 (0)43 388-2277
> School for Public Health and Primary Care Office Location:
> Maastricht University, P.O. Box 616 Room B2.01 (second floor)
> 6200 MD Maastricht, The Netherlands Debyeplein 1 (Randwyck)
>
>
> ----Original Message----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Sam
> Sent: Wednesday, October 06, 2010 14:03
> To: r-help at r-project.org
> Subject: [R] Highly significant intercept and large standard error
>
> > Dear list,
> >
> > I am running an lmer model and have a question.
> >
> > Whenever I put a factor (Mag) in my model it lowers the AIC of the
> > model; however, the intercept is the only term with a significant
> > p-value. I have looked at the coefficients and the standard errors and
> > something jumps out at me.
> >
> >
> >               Estimate  Std. Error  z value  Pr(>|z|)
> > (Intercept)   -1.35778     0.30917   -4.392  1.12e-05 ***
> > MagNew       -15.76939  1255.06372   -0.013     0.990
> > MagOld         0.14250     0.25246    0.564     0.572
> >
> > MagNew relates to a categorical factor (Mag) that has 3 levels, of
> > which New is one and Old is another (the third is not displayed).
> >
> > It appears MagNew has a huge Std. Error; what could cause this?
> >
> > When I do str(Mag) you will see that New is relatively rare (29 out
> > of 871); I presume it is this that is raising the Std. Error value.
> > However, I am not sure why this is causing the intercept to have a
> > highly significant p-value. Furthermore, how do I interpret it? I am
> > using AIC values as my basis for model selection and I am unsure whether
> > this really is the most likely model or not.
> >
> > Thanks
> >
> > Sam
> >
> > [1] Old Old Old Old Old Old Old Old Old Old Old
> > [12] Old Old Old Old Old Old Old Old Old Old Old
> > [23] Old Old Old Mid Old Old Old Mid Old Old Old
> > [34] Old Old Old Old Mid Old Old Old Old Old Old
> > [45] Mid Mid Mid Old Old Old Mid Mid Mid
> > Mid Old [56] Old Old Old Old Old Old Old Old Old Old Old
> > [67] Old Old Old Old Old Old Old Old Old Old Old
> > [78] Old Old Old Old Old Old Old Old Old Old Old
> > [89] Old Old Old Old Old Old Old Old Old Old Old
> > [100] Old Old Old Old Old Old Old Old Old New New
> > [111] Old Old Old Old Old Old Old Old Old Old Mid
> > [122] Mid Mid Mid Mid Old Old Old Old Mid Mid
> > Mid [133] Mid Mid Mid Mid Mid Mid Mid Mid
> > Mid Mid Mid [144] Mid Mid Mid Mid Old Old
> > Old Mid Mid Mid Mid [155] Mid Mid Mid Mid
> > Mid Mid Mid Old Old Old Old [166] Old Old Old Mid
> > Mid Mid Mid Mid Mid Mid Mid [177] Mid Mid
> > Mid Mid Mid Mid Mid Mid Mid Old Mid
> > [188] Mid Mid Mid Mid Old Mid Mid Mid
> > Mid Mid Mid [199] Mid Mid Old Old Old Old Old
> > Old Old Old Old [210] Old Old Old Old Old Old Old Old Old
> > Old Old [221] Old Old Old Old Old Old Old Old Old Old Old
> > [232] Old Old Old Old Old Old Old Old Old Old Old [243] Old
> > Old Old Old Old Old Old Old Old Old Old [254] Old Old Old
> > Old Old Old Old Old Old Old Old [265] Old Old Old Old Old
> > Old Old Old Old Old Old [276] Old Old Old Old Old Old Old
> > Old Old Old Old [287] Old Old Old Old Old Old Old Old Old
> > Old Old [298] Old Old Old Old Old Old Old Old Old Old Old
> > [309] Old Old Old Old Old Old Old Old Old Old Old
> > [320] Old Old Old Old Old Old Old Old Old Old Old
> > [331] Old Old Old Old Old Old Old Old Old Old Mid
> > [342] Old Old Old Old Old Old Old New New New New
> > [353] New New New New New Old Old Old Old Old Old
> > [364] Old New Old Old Old Old Old Old Old Old Old
> > [375] Old Old Old Old Old Old Old Old Old Old Old
> > [386] Old Old Old Old Old Old Old Old Mid Mid Mid
> > [397] Mid Mid Mid Old Old Mid Old Old Mid Mid
> > Mid [408] Mid Mid Mid Mid Mid Mid Mid Mid
> > Mid Mid Mid [419] Old Old Old Old Mid Mid Mid
> > Mid Mid Old Mid [430] Mid Mid Mid Mid Mid
> > Mid Mid Mid Mid Mid Mid [441] Mid Mid Mid
> > Mid Mid Mid Old Old Old Old Old [452] Old Old Old
> > Old Old Old Old Mid Mid Old Old [463] Mid Mid
> > Old Old Mid Mid Mid Mid Mid Old Mid [474] Mid
> > Mid Old Mid Old Old Old Old Old Old Old [485] Mid
> > Mid Mid Mid Mid Mid Mid Mid Mid Mid
> > Old [496] Old Old Old Old Old Old Mid Old Mid Old Old
> > [507] Old Old Old Old Old Old Old Old Old Old Old [518] Mid
> > Mid Mid Mid Old Mid Old Mid Old Mid Mid
> > [529] Old Old Mid Mid Mid Mid Mid Mid Old
> > Mid Mid [540] Mid Mid Mid Mid Mid Mid Old
> > Old Old Old Mid [551] Mid Mid Old Old Mid Mid
> > Old Mid Old Old Old [562] Old Mid Old Old Old Mid
> > Old Old Old Old Mid [573] Mid Mid Old Old Mid Mid
> > Mid Mid Old Old Old [584] Mid Old Old Old Old Old
> > Old Mid Mid Mid Old [595] Mid Mid Mid Old
> > Old New Mid Mid Old Mid Mid [606] Mid Old Mid
> > Old Old Mid Mid Mid Mid Mid Old [617] Mid
> > Old Old Old Old Old Old Old Old Old Old [628] Old Old Mid
> > Old Old Old Old Old Old Old Old [639] Old Old Old Old Old
> > Old Old Old Old Old New [650] Old Mid Old Old Old Old
> > Old Old Old Old Old [661] Old Old Old Old Old Old Old Old
> > Old Old Old [672] Old Old New Old Old Old Old Old Old Old
> > Old [683] New Old Old Old Old Old Old Old Old Old Old [694]
> > Old Old Old Old Old Old Old Old Old Old Old [705] Old Old
> > Old New Old Old New Old Old Old Old [716] New New New New New
> > Old Old Old New Old Old [727] Old Old Old Old Old Old Mid
> > Old Old Old New [738] Old Old Old Old Old Old Old Old Old
> > Old Old [749] Old Old Old Old Old Old Old Old Old Old Old
> > [760] New Old Old Old Old Old Old Old Old Old New
> > [771] Old Old Old Old Old Old Mid Old Old New Old
> > [782] Old Old Old Old Old Old Old Old Old Old Old
> > [793] Old Old Old Old Old Old Old Old Old Old Old
> > [804] Old Old Old Old Old Old Old Old Old Old Old
> > [815] Old Old Old Old Old Old Old Old Old Old Old
> > [826] Old Old Old Old Old Old Old Old Old Old Old
> > [837] Old Old Old Old Old Old Old Old Old Old Old
> > [848] Old Old Old Old Old Old Old Old Old Old Old
> > [859] Old Old Old Old Old Old Mid Mid Old Old Old
> > [870] Old Old
> > Levels: Mid New Old
> >
The fact that MagNew has such a large coefficient and such a large SE suggests that your data exhibit what some refer to as "complete separation" or "quasi-complete separation", in which case there is no finite maximum likelihood estimate for that coefficient. What does a cross-tabulation of Mag with your DV look like? If one of the cells for New is empty (for example, if none of the 29 New items sold), that would produce exactly this pattern. You might want to read up on quasi-complete separation and the suggestions for dealing with it.
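
To illustrate the kind of check I mean, here is a minimal sketch on simulated data (not your data). The names dat and sold, the cell probabilities, the counts for Mid and Old, and the use of plain glm() rather than lmer are all assumptions made for the example; only the 29-out-of-871 count for New comes from your post.

```r
# Simulated stand-in for the real data: every name and number here is
# illustrative, except that New has 29 of 871 observations as in the post.
set.seed(1)
n_mid <- 200; n_new <- 29; n_old <- 642
Mag  <- factor(rep(c("Mid", "New", "Old"), times = c(n_mid, n_new, n_old)))
sold <- c(rbinom(n_mid, 1, 0.20),   # Mid: some items sell
          rep(0L, n_new),           # New: none sell -> quasi-complete separation
          rbinom(n_old, 1, 0.23))   # Old: some items sell
dat <- data.frame(Mag, sold)

# 1. Cross-tabulate the factor against the response and look for empty cells.
tab <- table(dat$Mag, dat$sold)
print(tab)

# 2. Fit the logistic regression. With an empty cell the MagNew estimate
#    drifts toward -Inf and its Wald standard error becomes very large.
fit <- glm(sold ~ Mag, family = binomial, data = dat)
print(coef(summary(fit)))

# 3. Compare the nested models through the likelihood instead of relying on
#    the Wald z value for MagNew.
fit0 <- glm(sold ~ 1, family = binomial, data = dat)
print(anova(fit0, fit, test = "Chisq"))
```

If the New row of your own table has a zero cell, the large estimate and SE for MagNew are a symptom of separation rather than of anything wrong with the data entry, and a likelihood-based comparison of the models with and without Mag is a safer guide than the Wald z value.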
Hope this is helpful,
Dan
Daniel Nordlund
Bothell, WA USA