[R] Highly significant intercept and large standard error

Wed Oct 6 17:07:15 CEST 2010

Ah, okay, binomial. Then it seems that the few responses you have within MagNew are all or mostly no sells (assuming no sell = 0 and sell = 1). At least, that would explain the really low coefficient for that level.
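
To illustrate what I mean, here is a minimal sketch with made-up data. Everything in it is an assumption on my part: the response name `sell`, the sell rates, the split of the 842 remaining observations between Mid and Old, and the use of a plain glm() without random effects in place of your lmer model. It only shows that a level with no sells at all produces a large negative estimate with a huge SE.

```r
# Hypothetical data: 871 observations, 29 of them in level "New",
# and no sells (all zeros) within "New".
set.seed(1)
n_mid <- 421; n_old <- 421; n_new <- 29
Mag  <- factor(rep(c("Mid", "New", "Old"), times = c(n_mid, n_new, n_old)))
sell <- c(rbinom(n_mid, 1, 0.20),   # Mid: some sells
          rep(0, n_new),            # New: no sells at all
          rbinom(n_old, 1, 0.23))   # Old: some sells

# Cross-tabulation: a zero cell in the "New" row signals the problem.
tab <- table(Mag, sell)
print(tab)

# Ordinary logistic regression; "Mid" is the reference level.
fit <- glm(sell ~ Mag, family = binomial)
print(coef(summary(fit)))
```

Running table() on your own factor and response would show directly whether the New level contains any sells.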

When I said that maybe the data have been entered incorrectly, I meant the response variable (not the Mag factor). But since you are dealing with a binomial response, this doesn't seem to be the issue here anyway.

Now, regarding the AIC and why it would be lower if you include the Mag factor: I do not think the large SE itself has anything to do with it. Judging from the results you showed, the Mag factor itself does not seem to be a good predictor of sell/no sell. Another hunch: maybe there are suppression effects going on, so that other predictors in the model do a better job of predicting sell/no sell when the Mag factor is included.
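
A rough way to look at this is to compare the fits with and without the factor directly, since the AIC is computed from the log-likelihood and the number of parameters and not from the standard errors. The sketch below again uses made-up data and a plain glm() as a stand-in for your model (the name `sell` and all numbers are assumptions), so treat it as an illustration only.

```r
# Hypothetical data, same setup as one might assume from the description.
set.seed(1)
Mag  <- factor(rep(c("Mid", "New", "Old"), times = c(421, 29, 421)))
sell <- c(rbinom(421, 1, 0.20), rep(0, 29), rbinom(421, 1, 0.23))

fit0 <- glm(sell ~ 1,   family = binomial)  # without Mag
fit1 <- glm(sell ~ Mag, family = binomial)  # with Mag

# AIC = -2 * logLik + 2 * (number of parameters); the SEs do not enter.
print(AIC(fit0, fit1))

# Likelihood ratio test for the Mag factor as a whole (2 df).
print(anova(fit0, fit1, test = "Chisq"))
```

With your real data you would fit the two models with all other predictors kept the same and compare them in the same way.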

Best,

--
Wolfgang Viechtbauer                        http://www.wvbauer.com/
Department of Methodology and Statistics    Tel: +31 (0)43 388-2277
School for Public Health and Primary Care   Office Location:
Maastricht University, P.O. Box 616         Room B2.01 (second floor)
6200 MD Maastricht, The Netherlands         Debyeplein 1 (Randwyck)

----Original Message----
From: Chris Mcowen [mailto:sam_smith at me.com]
Sent: Wednesday, October 06, 2010 16:38
To: Viechtbauer Wolfgang (STAT)
Cc: r-help at r-project.org
Subject: Re: [R] Highly significant intercept and large standard error

> Hi Wolfgang,
>
> Thanks for this, it makes sense.
>
> I should have been more detailed when I described my model; it is in
> fact binomial - sell or not.
>
>> remove the Mag factor from the model, you get a model with just an
>> intercept, reflecting the overall mean
>
> This is true, but what I was trying to say (not very well!) was that I
> have other factors such as price (High, Mid, Low), condition (Best,
> Average, Poor), etc., and all models that have Mag in them have a much
> better AIC than models without Mag. I was unsure whether this is an
> artefact of the high SE for MagNew rather than Mag being a key factor.
>
>> Maybe the data have been entered incorrectly
>
> I have checked this and all is fine. They are categorical variables,
> not continuous, so Mag is either New, Old or Mid.
>
> Sam
>
>
>
> On 6 Oct 2010, at 15:05, Viechtbauer Wolfgang (STAT) wrote:
>
> I do not know about the details of the model, but the results are not
> all that strange. I'll assume that you are using family=gaussian(),
> so you are essentially running a model where (Intercept) reflects the
> mean of the dependent variable for that third category (MagMid) of
> the Mag factor and MagNew and MagOld are the mean differences between
> MagMid and those two other categories.
>
> If you remove the Mag factor from the model, you get a model with
> just an intercept, reflecting the overall mean. Two things will
> happen. That overall mean is essentially a weighted average of the
> three level-specific means. MagMid and MagOld are the most frequent
> categories and both these means are close to zero, so the overall
> mean will be pulled close to zero. Moreover, the amount of
> variability around the overall mean will be larger than the amount of
> variability around the level-specific means. This will lead to a
> larger standard error for the overall mean. Hence, it could very well
> happen that the intercept is no longer significant when you remove
> that factor.
>
> Given that MagNew only occurred a few times and given its very
> different mean and huge standard error, I suspect that some value(s)
> within that level are "screwy". Maybe the data have been entered
> incorrectly. One thing I have seen happen a few times is that missing
> data were coded, for example, as a -9999 in the dataset created with,
> for example, SPSS, but were then accidentally treated as observed
> values when analyzed with some other software, such as R. That could
> cause such a low mean for that category and the huge SE.
>
> It's just a hunch. Could be anything, but I would certainly take
> another good look at the values within that level.
>
> Best,
>
>
>> Dear list,
>>
>> I am running an lmer model and have a question.
>>
>> Whenever I put a factor (Mag) in my model it lowers the AIC of the
>> model; however, the intercept is the only term with a significant
>> p-value. I have looked at the coefficients and the standard errors
>> and something jumps out at me.
>>
>>
>>              Estimate Std. Error z value Pr(>|z|)
>> (Intercept)  -1.35778    0.30917  -4.392 1.12e-05 ***
>> MagNew      -15.76939 1255.06372  -0.013    0.990
>> MagOld        0.14250    0.25246   0.564    0.572
>>
>> MagNew relates to a categorical factor (Mag) that has 3 levels, of
>> which New is one and Old is another (the third is not displayed).
>>
>> It appears MagNew has a huge Std. Error. What could cause this?
>>
>> When I do str(Mag) I can see that New is relatively rare (29 out
>> of 871); I presume it is this that is raising the Std. Error value.
>> However, I am not sure why this is causing the intercept to have a
>> highly significant p-value. Furthermore, how do I interpret it? I am
>> using AIC values as my basis for model selection and I am unsure
>> whether this really is the most likely model or not.
>>
>> Thanks
>>
>> Sam