[R] Odp: Creating Dummy Variables in R

Rolf Turner r.turner at auckland.ac.nz
Thu Dec 17 00:10:57 CET 2009


On 17/12/2009, at 11:14 AM, whitaker m. (mw1006) wrote:

> I have a much larger dataset than in my original email (attached):
> price dependent upon weight, Clarity (levels IF-SI2),
> colour (levels D-L) and Cut (ideal-fair). I tried the regression
> command:
>
>> diamond.lm<-lm(price~weight+IF+VVS1+VVS2+VS1+VS2+SI1+SI2+I1+I2+D+E 
>> +F+G+H+I+J+K+L+ideal+excellent+very.good+good+fair,  
>> data="Diamonds2.txt")
>
> Error in eval(predvars, data, env) : invalid 'envir' argument
>
> This led to the error message shown below the command. I have tried
> searching for this, and assumed it was down to having categorical
> variables within the data. Is this a correct assumption, or am I
> doing something else wrong? Apologies if this is a bit of a basic
> question!

(a) You don't want the quote marks around the data argument. That is
the source of the "invalid 'envir' argument" error.
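A minimal sketch of point (a), using a small made-up data frame called
toy (a hypothetical stand-in for the diamonds data): passing the
object's name as a quoted string reproduces the error, while passing
the object itself works.

```r
# Hypothetical toy data frame; the values are made up for illustration.
toy <- data.frame(price  = c(500, 750, 1200, 900),
                  weight = c(0.3, 0.5, 0.9, 0.7))

# A character string is not a data frame, so lm() cannot evaluate the
# formula in it; capture the error message instead of stopping.
res <- tryCatch(lm(price ~ weight, data = "toy"),
                error = function(e) conditionMessage(e))
print(res)

# Passing the data frame object itself (no quotes) works.
fit <- lm(price ~ weight, data = toy)
print(coef(fit))
```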

(b) You are not using the power of R. ***Don't*** create your own
dummy variables; let lm() do it for you. Learn something about how R
works, for crying out loud.
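To see what lm() does with a factor, here is a sketch on made-up data
(the data frame toy and its values are hypothetical). The single term
Cut expands into dummy columns automatically; model.matrix() shows the
design matrix that was actually used.

```r
# Hypothetical toy data; 'Cut' is a factor with levels fair, good, ideal.
toy <- data.frame(
  price  = c(500, 650, 900, 1100, 1500, 1800),
  weight = c(0.30, 0.35, 0.50, 0.60, 0.80, 0.90),
  Cut    = factor(c("fair", "good", "ideal", "fair", "good", "ideal"))
)

# One term for the whole factor: lm() builds the dummy columns itself.
fit <- lm(price ~ weight + Cut, data = toy)

# Inspect the design matrix lm() actually used.
mm <- model.matrix(fit)
print(colnames(mm))
```

With the default treatment contrasts the columns should be
(Intercept), weight, Cutgood and Cutideal: the first level, fair, is
absorbed into the intercept. Hand-coding a dummy for every level
alongside the intercept, as in the original command, makes the columns
perfectly collinear (assuming each row falls in exactly one level), and
lm() reports NA for the redundant coefficients.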

Essentially you should be doing something like

	diamond.lm <- lm(price ~ weight + Clarity + colour + Cut, data = Diamond.txt)

where price, weight, Clarity, colour, and Cut are columns of the data
frame Diamond.txt. The columns price and weight should be numeric
vectors; Clarity, colour, and Cut should be ***factors***.
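If the categorical columns came in as character strings (the default
for read.table() and data.frame() since R 4.0.0), they can be converted
by hand. A sketch on a made-up data frame; relevel() is optional and
only changes which level serves as the baseline.

```r
# Hypothetical data where the categorical column arrived as character.
toy <- data.frame(
  price  = c(500, 650, 900, 1100),
  weight = c(0.30, 0.35, 0.50, 0.60),
  colour = c("D", "E", "D", "F"),
  stringsAsFactors = FALSE
)

# Convert to a factor; levels are sorted, so "D" would be the baseline.
toy$colour <- factor(toy$colour)

# Optionally make "F" the baseline level instead.
toy$colour <- relevel(toy$colour, ref = "F")

str(toy)
```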

It is slightly worrying that you refer to ``Diamond.txt''. That
``.txt'' suffix would lead me to believe that ``Diamond.txt'' is a
(text) file containing your data. If that is the case, this won't
work. The ``data'' argument to lm() must be an ***R object***. You
have to read the data file into an R object before trying to use the
data in a call to lm(). Something like

	Diamond <- read.table("Diamond.txt", header = TRUE)
	# Note that you ***do*** want to quote the file name; header = TRUE
	# assumes the first line of the file holds the column names.

Then

	diamond.lm <- lm(price ~ weight + Clarity + colour + Cut, data = Diamond)

should do what you want. The dummy variable encoding used for
(unordered) factors will be determined by the first value of
options()$contrasts, which by default is contr.treatment.
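Putting the steps together, a self-contained sketch: it writes a small
made-up whitespace-delimited file to a temporary path (a stand-in for
Diamond.txt, whose real layout is an assumption here), reads it with
read.table(), fits the model and lists the dummy columns that the
default contrasts produce. stringsAsFactors = TRUE is given explicitly
because its default differs between R versions.

```r
# Write a small, made-up data file so the example is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "price weight Clarity colour Cut",
  "500  0.30 SI1 H good",
  "650  0.35 VS2 G fair",
  "900  0.50 VS2 G ideal",
  "1100 0.60 SI1 H fair",
  "1500 0.80 VS2 H ideal",
  "1800 0.90 SI1 G good",
  "700  0.40 VS2 H good",
  "1300 0.70 SI1 G ideal"
), path)

# Read the file into a data frame (an R object); text columns -> factors.
Diamond <- read.table(path, header = TRUE, stringsAsFactors = TRUE)

# Unordered factors are coded with the first entry of this option.
print(getOption("contrasts"))

diamond.lm <- lm(price ~ weight + Clarity + colour + Cut, data = Diamond)
print(colnames(model.matrix(diamond.lm)))
```

Each factor should lose its first (alphabetical) level to the
intercept, leaving the columns (Intercept), weight, ClarityVS2,
colourH, Cutgood and Cutideal.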

Read up on factors and contrasts.
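As a pointer for that reading: the coding can also be changed for a
single model through the contrasts argument of lm(), without touching
options(). A sketch with sum-to-zero coding (contr.sum) on made-up
data; the data frame toy is hypothetical.

```r
# Hypothetical toy data with a three-level factor.
toy <- data.frame(
  price = c(500, 650, 900, 1100, 1500, 1800),
  Cut   = factor(c("fair", "good", "ideal", "fair", "good", "ideal"))
)

# Default treatment coding: 'fair' is the baseline level.
fit_trt <- lm(price ~ Cut, data = toy)

# Same model with sum-to-zero coding for Cut only.
fit_sum <- lm(price ~ Cut, data = toy, contrasts = list(Cut = "contr.sum"))

# Different parameterisations of the same model: identical fitted values.
print(all.equal(fitted(fit_trt), fitted(fit_sum)))
```

The coefficients change meaning (differences from a baseline versus
deviations from the mean of the level means), but the fit itself does
not.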

	cheers,

		Rolf Turner

More information about the R-help mailing list