[R] lm function in R
Bert Gunter
gunter.berton at gene.com
Sat Feb 13 22:30:19 CET 2010
?formula
Bert Gunter
Genentech Nonclinical Statistics
Thanks Dan. Yes, that was very helpful. I didn't notice the change from '*' to
'+'.
It seems that when I use *, the model includes an interaction, and when I use +
it does not.
Is it correct to assume, then, that when I use + R fits the following equation:
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$
but when I use * R fits the following equation:
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k + b_{12} X_1 X_2 + b_{13} X_1 X_3 + \cdots$
Is this correct? If it is, can someone point me to sources that explain how
the coefficients (such as b0 ... bk, b12 ..., b123 ...) are calculated? I
guess one source is the R source code :), but is there any other
documentation?
Please let me know. Thanks.
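For reference, here is a minimal sketch (not from the thread) of how the coefficients can be computed by ordinary least squares. It types the six rows in directly instead of reading students.csv. The helper names ols_normal() and ols_qr() are hypothetical and exist only for illustration. The normal equations b = (X'X)^(-1) X'y are the textbook formula. lm() itself works through a QR decomposition of the design matrix X (see ?lm.fit), which gives the same estimates with better numerical behaviour. model.matrix() shows the columns each formula produces, including the extra x1:x2 product column that * adds.

```r
# The six rows from the thread: memory span (y), age (x1), speech rate (x2)
dat <- data.frame(
  y  = c(14, 23, 30, 50, 39, 67),
  x1 = c(4, 4, 7, 7, 10, 10),
  x2 = c(1, 2, 2, 4, 3, 6)
)

# Design matrices: "+" gives intercept and main effects only,
# "*" adds the x1:x2 product column
X_add <- model.matrix(y ~ x1 + x2, data = dat)
X_int <- model.matrix(y ~ x1 * x2, data = dat)

# Hypothetical helper: OLS via the normal equations, b = (X'X)^{-1} X'y
ols_normal <- function(X, y) drop(solve(crossprod(X), crossprod(X, y)))

# Hypothetical helper: the same estimates via a QR decomposition of X
ols_qr <- function(X, y) qr.coef(qr(X), y)

b_add <- ols_normal(X_add, dat$y)
b_int <- ols_normal(X_int, dat$y)

print(colnames(X_int))
print(b_add)
print(b_int)

# Compare with lm(), which is built on the QR approach
print(coef(lm(y ~ x1 * x2, data = dat)))
```

With + the three coefficients come out as 5/3, 1 and 9.5. With * there is a fourth coefficient for x1:x2, and the other three change as well.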
> >
> > Thanks for the replies, everyone. Greatly appreciate it. Some progress,
> > but now I am getting the following values when I don't use "as.factor":
> >
> > 13.14167 25.11667 28.34167 49.14167 40.39167 66.86667
> >
> > Is that what you guys get?
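Here is a quick way to check those numbers. This sketch assumes the same six rows entered by hand, and fit_int is only an illustrative name. Fitting Y ~ X1*X2 with numeric predictors still includes the X1:X2 product term. That model reproduces the six values above rather than the ones from the PDF.

```r
dat <- data.frame(
  y  = c(14, 23, 30, 50, 39, 67),
  x1 = c(4, 4, 7, 7, 10, 10),
  x2 = c(1, 2, 2, 4, 3, 6)
)

# Numeric predictors, but "*" still adds the x1:x2 interaction term
# (fit_int is an illustrative name)
fit_int <- lm(y ~ x1 * x2, data = dat)

# Fitted values rounded to the precision shown in the message above
print(round(fitted(fit_int), 5))

# The interaction coefficient is not zero, so these differ from the PDF's values
print(coef(fit_int))
```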
>
>
> If you look at Phil's response below, no, that is not what he got. The
> difference is that you are specifying an interaction, whereas Phil did not
> (because the equation you initially specified did not include an
> interaction). Use Y ~ X1 + X2 instead of Y ~ X1*X2 for your formula.
>
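You can see the difference between the two formulas without fitting anything. This sketch uses terms() to list the model terms each formula expands to, and labels_of() is a hypothetical helper. X1*X2 is shorthand for X1 + X2 + X1:X2. With three variables, * also adds every two-way term and the three-way term.

```r
# Hypothetical helper: the term labels a model formula expands to.
# terms() works on the formula alone, so no data are needed.
labels_of <- function(f) attr(terms(f), "term.labels")

print(labels_of(y ~ x1 + x2))          # "x1" "x2"
print(labels_of(y ~ x1 * x2))          # "x1" "x2" "x1:x2"
print(labels_of(y ~ x1 + x2 + x1:x2))  # same terms as x1 * x2
print(labels_of(y ~ x1 * x2 * x3))     # main effects, all two-way terms, x1:x2:x3
```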
> >
> >
> > On Fri, Feb 12, 2010 at 5:00 PM, Phil Spector
> > <spector at stat.berkeley.edu> wrote:
> >
> > > By converting the two variables to factors, you are fitting
> > > an entirely different model. Leave out the as.factor stuff
> > > and it will work exactly as you want it to.
> > >
> > >> dat
> > >   V1 V2 V3 V4
> > > 1 s1 14  4  1
> > > 2 s2 23  4  2
> > > 3 s3 30  7  2
> > > 4 s4 50  7  4
> > > 5 s5 39 10  3
> > > 6 s6 67 10  6
> > >
> > >> names(dat) = c('id','y','x1','x2')
> > >> z = lm(y~x1+x2,dat)
> > >> predict(z)
> > >        1        2        3        4        5        6
> > > 15.16667 24.66667 27.66667 46.66667 40.16667 68.66667
> > >
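This sketch, on the same six rows, shows why the as.factor() version is "an entirely different model". fit_fac is an illustrative name. Treating X1 and X2 as factors gives one dummy column per level beyond the first, and * adds a column for every level combination. Every row here has its own X1/X2 combination, so the model can fit each observation exactly. lm() reports the columns it cannot estimate as NA, and the fitted values are just Y again, which matches the output in the original post.

```r
dat <- data.frame(
  y  = c(14, 23, 30, 50, 39, 67),
  x1 = c(4, 4, 7, 7, 10, 10),
  x2 = c(1, 2, 2, 4, 3, 6)
)

# Both predictors as factors, crossed with "*" as in the original post
# (fit_fac is an illustrative name)
fit_fac <- lm(y ~ as.factor(x1) * as.factor(x2), data = dat)

# Candidate columns: intercept, 2 age dummies, 4 speech-rate dummies,
# and 2 * 4 interaction dummies
print(ncol(model.matrix(fit_fac)))

# Only as many columns can be estimated as there are distinct
# x1/x2 combinations in the data (one per row here)
print(fit_fac$rank)
print(sum(is.na(coef(fit_fac))))   # aliased coefficients are reported as NA

# With one parameter per observation the fit is exact: fitted values equal y
print(fitted(fit_fac))
```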
> > > On Fri, 12 Feb 2010, Something Something wrote:
> > >
> > > Hello,
> > >>
> > >> I am trying to learn how to perform multiple regression analysis in R. I
> > >> decided to take a simple example given in this PDF:
> > >> http://www.utdallas.edu/~herve/abdi-prc-pretty.pdf
> > >>
> > >> I created a small CSV file called students.csv that contains the
> > >> following data:
> > >>
> > >> s1 14 4 1
> > >> s2 23 4 2
> > >> s3 30 7 2
> > >> s4 50 7 4
> > >> s5 39 10 3
> > >> s6 67 10 6
> > >>
> > >> Column headers: student ID, memory span (Y), age (X1), speech rate (X2)
> > >>
> > >> Now the expected results are:
> > >>
> > >> yHat[0]:15.166666666666668
> > >> yHat[1]:24.666666666666668
> > >> yHat[2]:27.666666666666664
> > >> yHat[3]:46.666666666666664
> > >> yHat[4]:40.166666666666664
> > >> yHat[5]:68.66666666666667
> > >>
> > >> This is based on the following equation (given in the PDF):
> > >> $\hat{Y} = 1.67 + X_1 + 9.50\,X_2$
> > >>
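The 1.67 in that equation is rounded. As a quick arithmetic check (a sketch, not from the thread), the expected yHat values listed above are consistent with an intercept of exactly 5/3. Using 1.67 instead shifts every prediction by the same small amount.

```r
x1 <- c(4, 4, 7, 7, 10, 10)   # age
x2 <- c(1, 2, 2, 4, 3, 6)     # speech rate

# Equation as printed in the PDF (intercept rounded to 1.67)
yhat_rounded <- 1.67 + x1 + 9.50 * x2

# Same equation with the unrounded intercept 5/3
yhat_exact <- 5/3 + x1 + 9.50 * x2

print(yhat_exact)                  # matches the expected yHat values listed above
print(yhat_rounded - yhat_exact)   # constant offset 1.67 - 5/3, about 0.0033
```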
> > >> I ran the following commands in R:
> > >>
> > >> data = read.table("students.csv", head=F, as.is=T, na.string=".", row.nam=NULL)
> > >> X1 = as.factor(data[[3]])
> > >> X2 = as.factor(data[[4]])
> > >> Y = data[[2]]
> > >> mod = lm(Y ~ X1*X2, na.action = na.exclude)
> > >> Y.hat = fitted(mod)
> > >> Y.hat
> > >>
> > >> This gives me the following output:
> > >>
> > >> Y.hat
> > >>  1  2  3  4  5  6
> > >> 14 23 30 50 39 67
> > >>
> > >> Obviously I am doing something wrong. Please help. Thanks.
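Here is a minimal corrected version of the script above, as a sketch. The only substantive changes are dropping as.factor() and using + instead of *. To keep the example self-contained, the file contents are read from a string via read.table(text = ...) and given column names with col.names. With the real file you would pass "students.csv" instead, assuming it is whitespace-separated like the rows shown.

```r
# Stand-in for students.csv (assumed whitespace-separated, no header row),
# read from a string so the example runs on its own
txt <- "s1 14 4 1
s2 23 4 2
s3 30 7 2
s4 50 7 4
s5 39 10 3
s6 67 10 6"

data <- read.table(text = txt, header = FALSE,
                   col.names = c("id", "Y", "X1", "X2"))

# Keep X1 and X2 numeric (no as.factor) and use "+" for main effects only
mod <- lm(Y ~ X1 + X2, data = data, na.action = na.exclude)
Y.hat <- fitted(mod)

print(coef(mod))   # intercept 5/3 (printed as 1.667), slopes 1 and 9.5
print(Y.hat)
```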
> > >>
>
> Hope this is helpful,
>
> Dan
