[R] factor() in lm
Bert Gunter
gunter.berton at gene.com
Sun Dec 1 19:27:54 CET 2013
You may wish to talk to a local statistician or read up on linear
models, as you appear not to understand some basics. Anyway, either
1. You have other covariates in your model that you haven't shown, and
your model is overparameterized (some columns of the design matrix are
linear combinations of others); or
2. You have NAs in your data that cause 1) to occur.
As an example of the above:
# y is 1 exactly when x is "a", so the dummy columns for x and y are collinear
x <- rep(letters[1:3], each = 5)
y <- factor(rep(1:3, c(5, 8, 2)))
summary(lm(rnorm(15) ~ x + y))

Call:
lm(formula = rnorm(15) ~ x + y)

Residuals:
    Min      1Q  Median      3Q     Max
-1.6768 -0.3865 -0.1108  0.3090  1.9632

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.04138    0.47160   0.088    0.932
xb           1.59259    1.17111   1.360    0.201
xc           0.36822    0.88228   0.417    0.684
y2          -1.58517    0.96264  -1.647    0.128
y3                NA         NA      NA       NA

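If you want to confirm that collinearity is what produces the NA, something along these lines should do it. This is only a sketch on the toy data from this message, not on your data: the seed and the object names fit, X and aliased are my own choices for illustration. It compares the rank of the design matrix with its number of columns and then asks alias() how the dropped term depends on the others.

```r
# Toy design from this message; the response is arbitrary, so fix a seed.
set.seed(1)
x <- rep(letters[1:3], each = 5)
y <- factor(rep(1:3, c(5, 8, 2)))
fit <- lm(rnorm(15) ~ x + y)

# Compare the number of columns in the design matrix with its rank.
X <- model.matrix(fit)
ncol(X)       # 5 columns: (Intercept), xb, xc, y2, y3
qr(X)$rank    # 4, so one column is a linear combination of the others

# Name the coefficient lm() could not estimate ...
aliased <- names(coef(fit))[is.na(coef(fit))]
aliased

# ... and show how that column is built from the estimable ones.
alias(fit)$Complete
```

On real data the same two checks apply to the fitted model object: a rank smaller than the column count means some terms are redundant, and alias() indicates which covariates they are confounded with.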
Incidentally, I was surprised to find in R 3.0.2 that if some levels of
a factor are missing, either due to NAs in the response or otherwise,
R estimates the coefficients for the remaining factor levels quite
nicely. I expected it to complain, but it did not. Maybe it has always
been so nicely behaved; I don't fit overparameterized models, and I take
care that my factor levels are actually present, so I don't run into
trouble. But if this is newish behavior and you are using an oldish
version, you might try upgrading to the current version. Or (more
likely) both clauses of this conditional are false and should be
ignored, and I should preemptively apologize for my foolishness.
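For what it's worth, here is a small sketch of the behavior I mean. The data frame d and its columns resp and f are made up for illustration. As far as I can tell, lm() builds its model frame with drop.unused.levels = TRUE, so a level that loses all of its rows to NAs is dropped from the fit altogether rather than reported as an NA coefficient.

```r
# Made-up data: three levels, three observations each.
set.seed(1)
d <- data.frame(
  resp = rnorm(9),
  f    = factor(rep(c("a", "b", "c"), each = 3))
)

# Make every observation in level "c" incomplete.
d$resp[d$f == "c"] <- NA

fit <- lm(resp ~ f, data = d)

names(coef(fit))   # "(Intercept)" "fb"; there is no "fc" term at all
nobs(fit)          # 6 rows were actually used
fit$xlevels$f      # "a" "b"; level "c" was dropped before fitting
```

So a level that is entirely absent does not by itself yield NA rows in the coefficient table; NA rows point to collinearity among the terms that remain.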
Cheers,
Bert
On Sun, Dec 1, 2013 at 9:48 AM, Gary Dong <pdxgary163 at gmail.com> wrote:
> Dear R users,
>
> I am running a linear regression in R. My observations are Census Tracts in
> several metropolitan areas (MSAs). In my data set, each MSA has at least 50
> observations. I use factor(msa_code) in the lm formula to control for
> metropolitan fixed effects. But I kept getting something like this:
>
> .....
> factor(msa_code)12420  4.910e-01  1.517e-01   3.237 0.001221 **
> factor(msa_code)12580  1.966e-01  6.861e-02   2.865 0.004194 **
> factor(msa_code)14460 -3.892e-02  1.653e-02  -2.355 0.018601 *
> factor(msa_code)16980 -2.873e-01  3.278e-02  -8.764  < 2e-16 ***
> factor(msa_code)17140  1.088e-01  6.771e-02   1.607 0.108127
> factor(msa_code)17460 -1.173e-01  4.380e-02  -2.678 0.007441 **
> factor(msa_code)19100  1.368e-01  5.550e-02   2.465 0.013753 *
> factor(msa_code)19740  5.819e-01  1.173e-01   4.962 7.33e-07 ***
> factor(msa_code)19820 -4.214e-01  6.641e-02  -6.346 2.51e-10 ***
> factor(msa_code)26420  1.258e-01  7.541e-02   1.668 0.095486 .
> factor(msa_code)28140  2.010e-01  3.847e-02   5.224 1.85e-07 ***
> factor(msa_code)29820  7.102e-02  6.593e-02   1.077 0.281435
> factor(msa_code)31100 -4.832e-01  1.088e-01  -4.440 9.28e-06 ***
> factor(msa_code)33100 -2.534e-01  6.391e-02  -3.965 7.49e-05 ***
> factor(msa_code)33460  5.229e-02  7.891e-02   0.663 0.507609
> factor(msa_code)35620 -3.197e-01  7.565e-02  -4.225 2.45e-05 ***
> factor(msa_code)36740  1.269e-01  6.948e-02   1.826 0.067868 .
> factor(msa_code)37980  1.394e-01  4.388e-02   3.178 0.001497 **
> factor(msa_code)38060 -6.935e-02  6.124e-02  -1.132 0.257540
> factor(msa_code)38300  1.647e-01  3.986e-02   4.133 3.67e-05 ***
> factor(msa_code)38900  2.605e-01  1.420e-01   1.835 0.066664 .
> factor(msa_code)39300 -9.612e-02  4.704e-02  -2.043 0.041103 *
> factor(msa_code)40140 -2.353e-01  3.562e-02  -6.605 4.59e-11 ***
> factor(msa_code)40900         NA         NA      NA       NA
> factor(msa_code)41740         NA         NA      NA       NA
> factor(msa_code)41860         NA         NA      NA       NA
> factor(msa_code)42660         NA         NA      NA       NA
> factor(msa_code)45300         NA         NA      NA       NA
> factor(msa_code)47900         NA         NA      NA       NA
>
> I wonder why I kept getting those "NAs". Thank you!
>
> Gary
>
--
Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374