[R] Dummy variables or factors?
andrew
andrewjohnroyal at gmail.com
Wed Oct 21 05:51:27 CEST 2009
Oh dear, that doesn't look right at all. I shall have a think about
what I did wrong and maybe follow my own advice and consult the doco
myself!
On Oct 21, 2:45 pm, andrew <andrewjohnro... at gmail.com> wrote:
> The following is *significantly* easier than trying to add in dummy
> variables by hand, although the dummy-variable approach will give you
> exactly the same answer as the factor method, possibly with a
> different baseline.
>
> Basically, you might want to search the lm help and possibly consult a
> stats book for information on how the design matrix is constructed in
> both cases.
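A minimal sketch (not from the original post) of the two design matrices being compared: `model.matrix()` shows the columns R generates from a factor under its default treatment contrasts, while the `diag()` indexing trick used in the transcript builds the same kind of 0/1 columns by hand. The small factor `f` is a hypothetical example. Note that the baselines differ: R drops the first level, whereas `diag(k)[, 1:(k - 1)]` drops the last.

```r
# Hypothetical small example: a 4-level factor observed 8 times.
f <- factor(c(1, 2, 3, 4, 1, 2, 3, 4))

# Design matrix R builds from the factor (default treatment contrasts):
# an intercept column plus one 0/1 column per non-baseline level.
# Level "1" is the baseline here.
X_factor <- model.matrix(~ f)

# Hand-built dummies in the style of the diag() trick: indexing rows by a
# factor uses its integer codes, and dropping the LAST column makes level
# "4" the baseline instead.
k <- nlevels(f)
D <- diag(k)[, 1:(k - 1)][f, ]
X_manual <- cbind("(Intercept)" = 1, D)

print(X_factor)
print(X_manual)
```

Both matrices span the same column space, so a least-squares fit on either gives identical fitted values; only the meaning of the individual coefficients changes with the baseline.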
>
> > xF <- factor(1:10)
> > N <- 1000
> > xFs <- sample(x = xF, N, replace = TRUE)
> > yFs <- rnorm(N, mean = as.numeric(xFs))
> > lm(yFs ~ xFs)
>
> Call:
> lm(formula = yFs ~ xFs)
>
> Coefficients:
> (Intercept)         xFs2         xFs3         xFs4         xFs5
>      0.7845       1.1620       2.1474       3.1391       4.2183
>        xFs6         xFs7         xFs8         xFs9        xFs10
>      5.2621       6.0814       7.4170       8.2193       9.2987
>
> > lm(yFs ~ diag(10)[,1:9][xFs,])
>
> Call:
> lm(formula = yFs ~ diag(10)[, 1:9][xFs, ])
>
> Coefficients:
>             (Intercept)  diag(10)[, 1:9][xFs, ]1  diag(10)[, 1:9][xFs, ]2
>                  10.083                   -9.299                   -8.137
> diag(10)[, 1:9][xFs, ]3  diag(10)[, 1:9][xFs, ]4  diag(10)[, 1:9][xFs, ]5
>                  -7.151                   -6.160                   -5.080
> diag(10)[, 1:9][xFs, ]6  diag(10)[, 1:9][xFs, ]7  diag(10)[, 1:9][xFs, ]8
>                  -4.037                   -3.217                   -1.882
> diag(10)[, 1:9][xFs, ]9
>                  -1.079
>
> On Oct 21, 9:44 am, David Winsemius <dwinsem... at comcast.net> wrote:
>
> > On Oct 20, 2009, at 4:00 PM, Luciano La Sala wrote:
>
> > > Dear R-people,
>
> > > I am analyzing epidemiological data with GLMMs fitted using lmer
> > > (lme4 package). I usually explore the assumption of linearity of
> > > continuous variables in the logit of the outcome by creating 4
> > > categories of the variable, performing a bivariate logistic
> > > regression, and then plotting the coefficient of each category
> > > against its midpoint. That gives me a pretty good idea about the
> > > linearity assumption and possible departures from it.
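The quartile check described above might look roughly like this in R. This is a sketch on simulated data (the covariate `x`, the outcome `y` and the quadratic true logit are all hypothetical), and it uses a plain `glm()` logistic regression rather than a mixed model fitted with lmer, to keep it self-contained.

```r
set.seed(42)
n <- 2000
x <- runif(n, 0, 10)              # hypothetical continuous covariate
p <- plogis(-3 + 0.08 * x^2)      # hypothetical true logit, quadratic in x
y <- rbinom(n, 1, p)

# 1. Cut x into 4 categories at its quartiles.
brks <- quantile(x, probs = seq(0, 1, by = 0.25))
xq   <- cut(x, breaks = brks, include.lowest = TRUE)

# 2. Logistic regression on the categorised variable (a factor).
fit_q <- glm(y ~ xq, family = binomial)

# 3. Log-odds of each category relative to the first (first category = 0).
logit_q <- c(0, coef(fit_q)[-1])

# 4. Midpoint of each category.
mids <- (head(brks, -1) + tail(brks, -1)) / 2

# Roughly equal vertical steps between equally spaced midpoints would
# suggest linearity in the logit; here the steps should grow with x.
plot(mids, logit_q, type = "b",
     xlab = "Category midpoint", ylab = "Log-odds relative to first quartile")
```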
>
> > > I know of people who create 0/1 dummy variables in order to relax
> > > the linearity assumption. However, I've read that dummy variables
> > > are never needed (nor desirable) in R! Instead, one should use
> > > factor variables, which are much easier to work with than dummy
> > > variables, and the model itself will create the necessary dummy
> > > variables.
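As an illustration of the model creating the dummy variables itself: when a factor appears in a model formula, R codes it using the matrix returned by `contrasts()`, which by default is treatment coding (one 0/1 column per non-baseline level). The factor `grp` below is a hypothetical example; `relevel()` changes the baseline without any manual dummy construction.

```r
# Hypothetical 4-category variable, e.g. a covariate cut into quartile groups.
grp <- factor(c("Q1", "Q2", "Q3", "Q4", "Q2", "Q1"))

# The coding R will use when grp enters a model formula:
# treatment contrasts, with "Q1" as the baseline.
contrasts(grp)

# Change the baseline to "Q4"; the dummy columns are rebuilt automatically.
grp2 <- relevel(grp, ref = "Q4")
contrasts(grp2)
```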
>
> > > Having said that, if my data violates the linearity assumption, does
> > > using a factor for the variable in question help overcome the lack
> > > of linearity?
>
> > No. If done by dividing into small numbers of categories after
> > looking at the data, it merely creates other (and probably more
> > severe) problems. If you are in the unusual (although desirable)
> > position of having a large number of events across the range of the
> > covariates in your data, you may be able to cut your variable into
> > quintiles or deciles and analyze the resulting factor, but the
> > preferred approach would be to fit a regression spline of sufficient
> > complexity.
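A rough sketch of the two options mentioned above, on simulated data (the variables and the non-linear logit are hypothetical): cutting the covariate into quintiles and fitting the resulting factor, versus a natural cubic regression spline from `ns()` in the splines package that ships with R. The `df = 4` spline complexity is an assumed choice for illustration, and comparing by AIC is one possible way to judge the fits, not the only one.

```r
library(splines)   # provides ns() for natural cubic regression splines

set.seed(42)
n <- 5000
x <- runif(n, 0, 10)                          # hypothetical covariate
y <- rbinom(n, 1, plogis(-3 + 0.08 * x^2))    # hypothetical non-linear logit

# Option A: cut x into quintiles and fit the resulting factor
# (only sensible with many events across the range of x).
xq5 <- cut(x, breaks = quantile(x, probs = seq(0, 1, by = 0.2)),
           include.lowest = TRUE)
fit_quint <- glm(y ~ xq5, family = binomial)

# Option B: natural cubic spline; df = 4 is an assumed level of complexity.
fit_spline <- glm(y ~ ns(x, df = 4), family = binomial)

# Linear-in-x fit for comparison.
fit_linear <- glm(y ~ x, family = binomial)

AIC(fit_linear, fit_quint, fit_spline)
```

The spline uses the covariate continuously, so it avoids the step-function shape and arbitrary cut points of the quintile factor.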
>
> > > Thanks in advance.
>
> > --
>
> > David Winsemius, MD
> > Heritage Laboratories
> > West Hartford, CT
>
> > ______________________________________________
> > R-h... at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.