[R] Dealing with data

David Winsemius dwinsemius at comcast.net
Fri Aug 13 19:45:00 CEST 2010


On Aug 13, 2010, at 1:22 PM, TGS wrote:

> To clarify, I'd like to create a column of indicators for the  
> respective letters so that I could maybe do regression on  
> indicators, etc.

You can just enter that column name in a regression formula. No need  
to create a separate variable. Try:

lm(count ~ spray, data=InsectSprays)
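
If you want to see the indicator columns that lm() builds from the factor, here is a minimal sketch. It assumes the default treatment contrasts, and the names `fit` and `X` are just ones I picked for illustration:

```r
# InsectSprays ships with base R (package 'datasets').
fit <- lm(count ~ spray, data = InsectSprays)

# The design matrix R constructed behind the scenes: an intercept
# plus one 0/1 indicator column for each non-reference level of spray.
X <- model.matrix(fit)

colnames(X)   # intercept, then one column per spray level other than "A"
head(X)       # first few rows of the 0/1 coding
```

Level "A" is the reference level here, so it gets no column of its own; its effect is absorbed into the intercept.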

>
> For instance, "A" gets "1", "B" gets "2", and so on.

That happens to be exactly the manner in which factor variables are  
stored internally. Try this:

str(InsectSprays)

If, for some better reason than what you have stated so far, you  
still need to get at the internal values of the factor  
variables, you can just use:

as.numeric(InsectSprays$spray)
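
To convince yourself that the codes line up with the letters, something along these lines should do. It is only a sketch, and `spray` and `codes` are names I made up for it:

```r
spray <- InsectSprays$spray

levels(spray)                # the labels attached to the factor
codes <- as.integer(spray)   # the integer codes stored internally

# Cross-tabulate labels against codes to see the one-to-one mapping.
table(label = spray, code = codes)

# Indexing the levels by the codes recovers the original labels.
identical(levels(spray)[codes], as.character(spray))
```

The codes follow the order of levels(spray), which for this data set happens to be alphabetical.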


This question is making me think you have not yet worked through much  
of "Introduction to R".

http://cran.r-project.org/doc/manuals/R-intro.pdf

Admittedly it is long, but I think you said you were strong on CS and  
weaker in statistics? If you are in a real hurry and have a solid stats  
background, you could look at other contributed introductions. One  
that kept me up at night when I was starting R (about 5 years ago) was  
Faraway's "Practical Regression and ANOVA Using R":

http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf

I also thought that Kuhnert and Venables' offering was scintillating:
http://cran.r-project.org/doc/contrib/Kuhnert+Venables-R_Course_Notes.zip

Others:
http://cran.r-project.org/other-docs.html

Faraway gets to factor object types by page 11, whereas you would need  
to be several chapters into the "Introduction to R" to get that  
information.

-- 
David.

>
> On Aug 13, 2010, at 10:19 AM, David Winsemius wrote:
>
>
> On Aug 13, 2010, at 1:03 PM, TGS wrote:
>
>> # how would I code in R to look at the letter of the alphabet
>> # in the second column and create a indicator column for the
>> # corresponding letter?
>>
>> data(InsectSprays)
>> InsectSprays$spray
>
> It's already what most people mean when they say "indicator column",  
> i.e., a factor variable (and not a character vector) .... so,  what  
> do _you_ mean?
--

David Winsemius, MD
West Hartford, CT


