# [R] lm function in R

Daniel Nordlund djnordlund at verizon.net
Sat Feb 13 02:54:00 CET 2010

```
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Something Something
> Sent: Friday, February 12, 2010 5:28 PM
> To: Phil Spector; r-help at r-project.org
> Subject: Re: [R] lm function in R
>
> Thanks for the replies everyone.  Greatly appreciate it.  Some progress, but
> now I am getting the following values when I don't use "as.factor"
>
> 13.14167 25.11667 28.34167 49.14167 40.39167 66.86667
>
> Is that what you guys get?

If you look at Phil's response below, no, that is not what he got.  The difference is that you are specifying an interaction, whereas Phil did not (because the equation you initially specified did not include an interaction).  In an R formula, X1*X2 expands to X1 + X2 + X1:X2, so it adds an interaction term to the model.  Use Y ~ X1 + X2 instead of Y ~ X1*X2 for your formula.

>
>
> On Fri, Feb 12, 2010 at 5:00 PM, Phil Spector
> <spector at stat.berkeley.edu>wrote:
>
> > By converting the two variables to factors, you are fitting
> > an entirely different model.  Leave out the as.factor stuff
> > and it will work exactly as you want it to.
> >
> > >> dat
> >   V1 V2 V3 V4
> > 1 s1 14  4  1
> > 2 s2 23  4  2
> > 3 s3 30  7  2
> > 4 s4 50  7  4
> > 5 s5 39 10  3
> > 6 s6 67 10  6
> >
> >> names(dat) = c('id','y','x1','x2')
> >> z = lm(y~x1+x2,dat)
> > >> predict(z)
> > >        1        2        3        4        5        6
> > > 15.16667 24.66667 27.66667 46.66667 40.16667 68.66667
> >
> >
> >                                        - Phil Spector
> >                                         Statistical Computing Facility
> >                                         Department of Statistics
> >                                         UC Berkeley
> >                                         spector at stat.berkeley.edu
> >
> >
> >
> > On Fri, 12 Feb 2010, Something Something wrote:
> >
> >  Hello,
> >>
> >> I am trying to learn how to perform Multiple Regression Analysis in R.  I
> >> decided to take a simple example given in this PDF:
> >> http://www.utdallas.edu/~herve/abdi-prc-pretty.pdf
> >>
> >> I created a small CSV called, students.csv that contains the following
> >> data:
> >>
> >> s1 14 4 1
> >> s2 23 4 2
> >> s3 30 7 2
> >> s4 50 7 4
> >> s5 39 10 3
> >> s6 67 10 6
> >>
> >> Col headers:  Student id, Memory span(Y), age(X1), speech rate(X2)
> >>
> >> Now the expected results are:
> >>
> >> yHat[0]:15.166666666666668
> >> yHat[1]:24.666666666666668
> >> yHat[2]:27.666666666666664
> >> yHat[3]:46.666666666666664
> >> yHat[4]:40.166666666666664
> >> yHat[5]:68.66666666666667
> >>
> >> This is based on the following equation (given in the PDF):
> >> Y = 1.67 + X1 + 9.50 X2
> >>
> >> I ran the following commands in R:
> >>
> >> data = read.table("students.csv", head=F, as.is=T, na.string=".",
> >> row.nam=NULL)
> >> X1 = as.factor(data[[3]])
> >> X2 = as.factor(data[[4]])
> >> Y = data[[2]]
> >> mod = lm(Y ~ X1*X2, na.action = na.exclude)
> >> Y.hat = fitted(mod)
> >> Y.hat
> >>
> >> This gives me the following output:
> >>
> >>  Y.hat
> >>>
> >> 1  2  3  4  5  6
> >> 14 23 30 50 39 67
> >>
> >> Obviously I am doing something wrong.  Please help.  Thanks.
> >>

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA

```
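
The three sets of fitted values in this thread come from three different models fitted to the same six rows. A minimal sketch that puts them side by side is given here, assuming the data are typed in directly rather than read from students.csv. The object names `fit_add`, `fit_int` and `fit_fac` are made up for illustration and do not appear in the original messages.

```r
# Data from the thread: memory span (y), age (x1), speech rate (x2)
dat <- data.frame(
  id = paste0("s", 1:6),
  y  = c(14, 23, 30, 50, 39, 67),
  x1 = c(4, 4, 7, 7, 10, 10),
  x2 = c(1, 2, 2, 4, 3, 6)
)

# Additive model, matching the equation in the PDF: y = b0 + b1*x1 + b2*x2
fit_add <- lm(y ~ x1 + x2, data = dat)

# Interaction model: x1 * x2 expands to x1 + x2 + x1:x2
fit_int <- lm(y ~ x1 * x2, data = dat)

# Factor model: every distinct (x1, x2) pair gets its own cell, so with
# six distinct cells and six rows the fit reproduces y exactly
fit_fac <- lm(y ~ factor(x1) * factor(x2), data = dat)

round(coef(fit_add), 2)  # intercept and slopes of the additive model
fitted(fit_add)          # Phil's predict(z) values
fitted(fit_int)          # values obtained with Y ~ X1*X2 on numeric columns
fitted(fit_fac)          # identical to y, as in the original question
```

Only the additive fit corresponds to the equation Y = 1.67 + X1 + 9.50 X2. The other two are valid models, but they answer a different question.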