[R] Dummy variables or factors?
andrew
andrewjohnroyal at gmail.com
Wed Oct 21 05:58:05 CEST 2009
Sorry for this third posting - the second method is the same as the
first after all: the coefficients of the first linear model *are* a
linear transformation of those of the second. I just got confused by
the pasted output, that's all.
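
A minimal sketch of that equivalence, re-running the two fits quoted below on freshly simulated data. The seed and the helper names (D, fit_factor, fit_dummy and so on) are my own additions for illustration, so the numbers will not match the quoted output. In that output the same relation is visible: 0.7845 + 9.2987 = 10.0832, which is the intercept (10.083) of the second fit.

```r
set.seed(1)                                  # arbitrary seed, only for reproducibility
xF  <- factor(1:10)
N   <- 1000
xFs <- sample(xF, N, replace = TRUE)
yFs <- rnorm(N, mean = as.numeric(xFs))

fit_factor <- lm(yFs ~ xFs)                  # treatment contrasts, baseline = level 1
D <- diag(10)[as.integer(xFs), 1:9]          # hand-made dummies, baseline = level 10
fit_dummy  <- lm(yFs ~ D)

# Identical fitted values: the two fits are the same model
same_fit <- isTRUE(all.equal(unname(fitted(fit_factor)),
                             unname(fitted(fit_dummy))))

# Group means recovered from each parameterisation
b <- unname(coef(fit_factor))
g <- unname(coef(fit_dummy))
means_factor <- c(b[1], b[1] + b[-1])        # levels 1..10
means_dummy  <- c(g[1] + g[-1], g[1])        # levels 1..9, then level 10
same_means <- isTRUE(all.equal(means_factor, means_dummy))

print(c(same_fit = same_fit, same_means = same_means))
```

Only the baseline differs: the factor fit measures every level against level 1, the hand-made dummies against level 10.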
On Oct 21, 2:51 pm, andrew <andrewjohnro... at gmail.com> wrote:
> Oh dear, that doesn't look right at all. I shall have a think about
> what I did wrong and maybe follow my own advice and consult the doco
> myself!
>
> On Oct 21, 2:45 pm, andrew <andrewjohnro... at gmail.com> wrote:
>
> > The following is *significantly* easier than adding dummy variables
> > by hand. The dummy variable approach gives exactly the same fit as
> > the factor method, but possibly with a different baseline.
>
> > Basically, you might want to read the lm help page and possibly
> > consult a stats book on how the design matrix is constructed in both
> > cases.
>
> > > xF <- factor(1:10)
> > > N <- 1000
> > > xFs <- sample(x = xF, N, replace = TRUE)
> > > yFs <- rnorm(N, mean = as.numeric(xFs))
> > > lm(yFs ~ xFs)
>
> > Call:
> > lm(formula = yFs ~ xFs)
>
> > Coefficients:
> > (Intercept)         xFs2         xFs3         xFs4         xFs5
> >      0.7845       1.1620       2.1474       3.1391       4.2183
> >        xFs6         xFs7         xFs8         xFs9        xFs10
> >      5.2621       6.0814       7.4170       8.2193       9.2987
>
> > > lm(yFs ~ diag(10)[,1:9][xFs,])
>
> > Call:
> > lm(formula = yFs ~ diag(10)[, 1:9][xFs, ])
>
> > Coefficients:
> >             (Intercept)  diag(10)[, 1:9][xFs, ]1  diag(10)[, 1:9][xFs, ]2
> >                  10.083                   -9.299                   -8.137
> > diag(10)[, 1:9][xFs, ]3  diag(10)[, 1:9][xFs, ]4  diag(10)[, 1:9][xFs, ]5
> >                  -7.151                   -6.160                   -5.080
> > diag(10)[, 1:9][xFs, ]6  diag(10)[, 1:9][xFs, ]7  diag(10)[, 1:9][xFs, ]8
> >                  -4.037                   -3.217                   -1.882
> > diag(10)[, 1:9][xFs, ]9
> >                  -1.079
>
> > On Oct 21, 9:44 am, David Winsemius <dwinsem... at comcast.net> wrote:
>
> > > On Oct 20, 2009, at 4:00 PM, Luciano La Sala wrote:
>
> > > > Dear R-people,
>
> > > > I am analyzing epidemiological data with GLMMs fitted using lmer
> > > > (from the lme4 package). I usually explore the assumption of
> > > > linearity of continuous
> > > > variables in the logit of the outcome by creating 4 categories of
> > > > the variable, performing a bivariate logistic regression, and then
> > > > plotting the coefficients of each category against their mid points.
> > > > That gives me a pretty good idea about the linearity assumption and
> > > > possible departures from it.
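
A rough sketch of the check described in the paragraph above, on simulated data. The variable names, the quartile cut points and the use of a plain glm (rather than a mixed model) are all assumptions made for illustration; nothing here comes from the original poster's data.

```r
set.seed(2)                                   # simulated data, purely illustrative
n <- 2000
x <- runif(n, 0, 10)                          # hypothetical continuous exposure
y <- rbinom(n, 1, plogis(-2 + 0.4 * x))       # outcome that really is linear in the logit

brks <- quantile(x, probs = seq(0, 1, 0.25))  # quartile cut points -> 4 categories
xcat <- cut(x, breaks = brks, include.lowest = TRUE)
mid  <- unname((head(brks, -1) + tail(brks, -1)) / 2)  # midpoint of each category

fit <- glm(y ~ xcat, family = binomial)
logit_by_cat <- unname(c(0, coef(fit)[-1]))   # log odds ratio vs the first category

# Roughly straight line => little evidence against linearity in the logit
plot(mid, logit_by_cat, type = "b",
     xlab = "category midpoint",
     ylab = "log odds ratio vs first category")
```

Because the simulated outcome is linear in the logit, the four points should fall close to a straight line; curvature in such a plot is what would hint at a departure.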
>
> > > > I know of people who create 0/1 dummy variables in order to relax
> > > > the linearity assumption. However, I've read that dummy variables
> > > > are never needed (nor are they desirable) in R! Instead, one should
> > > > use factor variables, which are much easier to work with: the model
> > > > itself creates the necessary dummy variables.
>
> > > > Having said that, if my data violate the linearity assumption, does
> > > > the use of a factor for the variable in question help overcome the
> > > > lack of linearity?
>
> > > No. If done by dividing into a small number of categories after
> > > looking at the data, it merely creates other (and probably more
> > > severe) problems. If you are in the unusual (although desirable)
> > > position of having a large number of events across the range of the
> > > covariates in your data, you may be able to cut your variable into
> > > quintiles or deciles and analyze the resulting factor, but the
> > > preferred approach would be to fit a regression spline of sufficient
> > > complexity.
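
One way the regression spline suggestion above might look, sketched on simulated data. The choice of a natural cubic spline via splines::ns, the 4 degrees of freedom and the plain glm are my assumptions, not something stated in the reply; the data are made up and deliberately non-linear in the logit.

```r
library(splines)                                  # ships with base R

set.seed(3)                                       # simulated data, purely illustrative
n <- 2000
x <- runif(n, 0, 10)                              # hypothetical continuous covariate
y <- rbinom(n, 1, plogis(-3 + 0.15 * (x - 3)^2))  # curved relationship on the logit scale

fit_lin    <- glm(y ~ x,             family = binomial)
fit_spline <- glm(y ~ ns(x, df = 4), family = binomial)  # natural cubic spline, 4 df

# Likelihood-ratio test; the linear fit is nested in the natural spline fit
lrt <- anova(fit_lin, fit_spline, test = "Chisq")
print(lrt)

# Fitted log odds over a grid, to inspect the shape of the relationship
grid <- data.frame(x = seq(0, 10, length.out = 50))
eta  <- predict(fit_spline, newdata = grid)       # type = "link" is the default
plot(grid$x, eta, type = "l", xlab = "x", ylab = "fitted log odds")
```

Unlike cutting into categories, the spline keeps the covariate continuous, and the df argument sets how much flexibility the curve is allowed.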
>
> > > > Thanks in advance.
>
> > > --
>
> > > David Winsemius, MD
> > > Heritage Laboratories
> > > West Hartford, CT
>
> > > ______________________________________________
> > > R-h... at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.