[R] Odp: Creating Dummy Variables in R
Rolf Turner
r.turner at auckland.ac.nz
Thu Dec 17 00:10:57 CET 2009
On 17/12/2009, at 11:14 AM, whitaker m. (mw1006) wrote:
> I have a much larger dataset than in my original email (attached;
> price dependent upon weight, Clarity (different levels IF-SI2),
> colour (levels D-L) and Cut (ideal-fair)) and tried the regression
> command:
>
>> diamond.lm<-lm(price~weight+IF+VVS1+VVS2+VS1+VS2+SI1+SI2+I1+I2+D+E
>> +F+G+H+I+J+K+L+ideal+excellent+very.good+good+fair,
>> data="Diamonds2.txt")
>
> Error in eval(predvars, data, env) : invalid 'envir' argument
>
> Which led to the error message below the command. I have tried
> searching for this, and assumed this was down to having categorical
> variables within the data. Is this a correct assumption, or am I
> doing something else wrong? Apologies if this is a bit of a basic
> question!
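(a) You don't want the quote marks around the data argument. With them,
data is just a character string, which eval() cannot use as the place
in which to look up price, weight, and so on. That is the source of the
"invalid 'envir' argument" error.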
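A minimal sketch of point (a), using a small made-up data frame in place of the real diamond data (the name toy and its values are hypothetical). The quoted name fails inside eval(); the bare object works. The exact wording of the error message may differ between R versions.

```r
# Hypothetical toy data standing in for the real diamond file
toy <- data.frame(price  = c(10, 12, 15, 11, 20, 18),
                  weight = c(0.30, 0.40, 0.50, 0.35, 0.70, 0.60))

# Quoted: data is just the character string "toy", which eval() rejects
# as an environment, so lm() stops with an "invalid 'envir'" error
bad <- tryCatch(lm(price ~ weight, data = "toy"),
                error = function(e) conditionMessage(e))
print(bad)

# Unquoted: data is the data frame itself, and the fit succeeds
good <- lm(price ~ weight, data = toy)
print(coef(good))
```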
(b) You are not using the power of R. ***Don't*** create your own
dummy variables; let lm() do it for you. Learn something about how R
works, for crying out loud.
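To illustrate why hand-made dummies are more trouble than they are worth, here is a sketch on hypothetical data with a three-level Cut factor. Coding a 0/1 column for every level alongside the intercept, as the formula in the question does, makes those columns linearly dependent, so lm() reports NA for one of them. Giving lm() the factor drops a baseline level automatically and yields the same fitted values.

```r
# Hypothetical data: price, weight, and a three-level cut factor
toy <- data.frame(
  price  = c(326, 334, 327, 335, 336, 338, 339, 340),
  weight = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26),
  Cut    = factor(c("ideal", "good", "fair", "ideal",
                    "good", "fair", "ideal", "good"),
                  levels = c("fair", "good", "ideal"))
)

# Hand-made 0/1 dummies, one per level (the approach to avoid)
toy$fair  <- as.numeric(toy$Cut == "fair")
toy$good  <- as.numeric(toy$Cut == "good")
toy$ideal <- as.numeric(toy$Cut == "ideal")

# With an intercept, fair + good + ideal == 1 in every row, so one
# dummy is redundant and its coefficient comes back as NA
fit_manual <- lm(price ~ weight + fair + good + ideal, data = toy)
print(coef(fit_manual))

# Letting lm() build the dummies from the factor avoids the problem
fit_factor <- lm(price ~ weight + Cut, data = toy)
print(coef(fit_factor))

# Both models span the same column space, so the fitted values agree
print(all.equal(fitted(fit_manual), fitted(fit_factor)))
```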
Essentially you should be doing something like

diamond.lm <- lm(price ~ weight + Clarity + colour + Cut, data = Diamond.txt)
where price, weight, Clarity, colour, and Cut are columns of the data
frame Diamond.txt. The columns price and weight should be numeric
vectors; Clarity, colour, and Cut should be ***factors***.
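The sketch below (hypothetical data again) shows what lm() does with a factor column. model.matrix() expands Cut into 0/1 indicator columns, one for each level except the first, which serves as the baseline.

```r
# Hypothetical data with a numeric weight and a factor Cut
toy <- data.frame(
  price  = c(326, 334, 327, 335, 336, 338),
  weight = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24),
  Cut    = factor(c("ideal", "good", "fair", "ideal", "good", "fair"),
                  levels = c("fair", "good", "ideal"))
)

fit <- lm(price ~ weight + Cut, data = toy)

# The design matrix lm() built: "fair" is the baseline, so only
# Cutgood and Cutideal indicator columns appear
X <- model.matrix(fit)
print(colnames(X))   # "(Intercept)" "weight" "Cutgood" "Cutideal"
print(X)
```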
It is slightly worrying that you refer to ``Diamond.txt''. That
``.txt'' suffix would lead me to believe that ``Diamond.txt'' is a
(text) file containing your data. If that is the case, this won't work.
The ``data'' argument to lm() must be an ***R object*** (typically a
data frame). You have to read the data file into an R object before
trying to use the data in a call to lm(). Something like
Diamond <- read.table("Diamond.txt", header = TRUE) # Note that you ***do*** want to quote the file name.
# (header = TRUE assumes the first line of the file holds the column names.)
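A self-contained sketch of the reading step. It writes a few made-up rows to a temporary file standing in for Diamond.txt, assuming the real file is whitespace-separated with the column names on its first line, and reads them back. Since R 4.0.0, read.table() no longer turns text columns into factors by default, so stringsAsFactors = TRUE is passed explicitly (lm() would in any case treat character columns as factors).

```r
# Write a few hypothetical rows to a temporary file standing in for Diamond.txt
path <- tempfile(fileext = ".txt")
writeLines(c("price weight Clarity colour Cut",
             "326 0.23 SI2 E ideal",
             "334 0.29 VS2 D good",
             "327 0.23 VS1 E fair",
             "335 0.31 SI2 H good",
             "336 0.24 VVS2 G ideal",
             "338 0.26 VS1 H fair"),
           path)

# header = TRUE: the first line holds the column names.
# stringsAsFactors = TRUE: text columns become factors (the default
# changed to FALSE in R 4.0.0).
Diamond <- read.table(path, header = TRUE, stringsAsFactors = TRUE)
str(Diamond)
```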
Then

diamond.lm <- lm(price ~ weight + Clarity + colour + Cut, data = Diamond)
should do what you want. The dummy-variable encoding used will be
determined by the first element of options()$contrasts (the one that
applies to unordered factors), which by default is contr.treatment.
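A short sketch of the contrasts machinery, using a hypothetical three-level factor. getOption("contrasts") is the same value as options()$contrasts: the first entry applies to unordered factors, the second to ordered ones. contrasts() shows the coding matrix, and options() can switch the coding (here to contr.sum) for subsequent fits.

```r
# Default coding: unordered factors use contr.treatment, ordered use contr.poly
default_opts <- getOption("contrasts")
print(default_opts)

# Hypothetical three-level factor
cut_levels <- factor(c("fair", "good", "ideal"))

# Treatment coding: the first level ("fair") is the baseline (all zeros)
ct_treat <- contrasts(cut_levels)
print(ct_treat)

# Temporarily switch unordered factors to sum-to-zero coding
old <- options(contrasts = c("contr.sum", "contr.poly"))
ct_sum <- contrasts(cut_levels)
print(ct_sum)
options(old)  # restore the previous setting
```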
Read up on factors and contrasts.
cheers,
Rolf Turner
More information about the R-help mailing list