[R] Factors? I think?

peter dalgaard pdalgd at gmail.com
Fri Sep 9 11:57:19 CEST 2011


On Sep 9, 2011, at 09:13 , Petr PIKAL wrote:

> Hi
> 
> Isn't this what merge is designed for?

Sort of. (You'd need to think carefully about what happens with non-matched codes.)
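A minimal sketch of the non-matched-code issue. It assumes the toy data from further down this thread, stored as plain character columns, plus a hypothetical doctor "Tom Wilson" whose code "1234" has no entry in DeptCodes. By default merge() keeps only rows whose key matches, so that doctor silently disappears. all.x = TRUE keeps him with an NA department, and selecting columns afterwards drops the key column.

```r
# Toy data from the thread, stored as character; "Tom Wilson" with code
# "1234" is a hypothetical doctor whose code is not in DeptCodes.
Doctors <- data.frame(
  Docs     = c("Greg Jones", "Bob Smith", "Al Franklin", "Tom Wilson"),
  DocDepts = c("9999", "5555", "9999", "1234"),
  stringsAsFactors = FALSE
)
DeptCodes <- data.frame(
  Depts     = as.character(1:9 * 1111),
  DeptNames = c("Heart", "Kidney", "Feet", "Teeth", "Brain",
                "Digestive", "Diagnostic", "Surgery", "Anesthesia"),
  stringsAsFactors = FALSE
)

# Default merge() keeps only matched rows, so Tom Wilson is dropped.
inner <- merge(Doctors, DeptCodes, by.x = "DocDepts", by.y = "Depts")

# all.x = TRUE keeps every doctor; unmatched codes get DeptNames = NA.
outer <- merge(Doctors, DeptCodes, by.x = "DocDepts", by.y = "Depts",
               all.x = TRUE)

# Drop the key column, keeping only doctor and department name.
outer <- outer[, c("Docs", "DeptNames")]
print(outer)
```

Rows of the merged result come back sorted by the key column, not in the original order of Doctors.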

Wouldn't this do the trick as well?

codes <- as.character(DeptCodes$Depts)
labs  <- as.character(DeptCodes$DeptNames)
Doctors <- within(Doctors, DeptNames <- factor(DocDepts, levels=codes, labels=labs))
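A quick check of the factor() relabelling on the toy data. It assumes factors throughout, as in the thread, plus a hypothetical unmatched doctor "Tom Wilson" (code "1234"). factor() converts DocDepts to character, matches it against levels, and renames each level with the corresponding entry of labels. A code missing from levels becomes NA rather than dropping the row. The names codes and labs are arbitrary.

```r
# Toy data as in the thread (factors throughout), plus a hypothetical
# doctor "Tom Wilson" whose code 1234 is not in the lookup table.
Doctors <- data.frame(
  Docs     = factor(c("Greg Jones", "Bob Smith", "Al Franklin", "Tom Wilson")),
  DocDepts = factor(c("9999", "5555", "9999", "1234"))
)
DeptCodes <- data.frame(
  Depts     = factor(1:9 * 1111),
  DeptNames = factor(c("Heart", "Kidney", "Feet", "Teeth", "Brain",
                       "Digestive", "Diagnostic", "Surgery", "Anesthesia"))
)

codes <- as.character(DeptCodes$Depts)      # "1111", ..., "9999" in row order
labs  <- as.character(DeptCodes$DeptNames)  # department names in the same order

# Each code in DocDepts is matched against 'codes' and relabelled with
# the corresponding entry of 'labs'; unmatched codes become NA.
Doctors <- within(Doctors,
                  DeptNames <- factor(DocDepts, levels = codes, labels = labs))
print(Doctors)
```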

> 
>> merge(Doctors, DeptCodes, by.x="DocDepts", by.y="Depts")
>   DocDepts                   Docs  DeptNames
> 1     1111 Christian Christianson      Heart
> 2     5555              Bob Smith      Brain
> 3     9999             Greg Jones Anesthesia
> 4     9999            Al Franklin Anesthesia
> 
> It is easy to get rid of the first column.
> 
> Regards
> Petr
> 
> 
>> Re: [R] Factors? I think?
>> 
>> It's probably easiest to think of this as a compound map (doctor -> dept
>> code -> factor -> character -> integer -> dept code -> dept name as
>> character) and to treat the code as such: if you already have R objects with
>> the codes in them, it shouldn't be hard to do the transformation.
>> 
>> Consider the following toy setup:
>> 
>> Docs = factor(c("Greg Jones","Bob Smith","Al Franklin","Christian Christianson"))
>> DocDepts = factor(c("9999","5555","9999","1111"))
>> Doctors = data.frame(Docs, DocDepts)
>> 
>> Depts = factor(1:9 * 1111)
>> DeptNames = factor(c("Heart","Kidney","Feet","Teeth","Brain","Digestive",
>>                      "Diagnostic","Surgery","Anesthesia"))
>> DeptCodes = data.frame(Depts,DeptNames)
>> # Everything in our data frames is now some sort of factor, so we can't
>> # match things up in the "normal" ways.
>> 
>> # Now, you have to do some unpleasantly long but pretty straightforward
>> # code to convert the factors in a way that makes the match work properly:
>> 
>> ## Will extract the "9999" as a real 9999, rather than the internal factor code
>> Doctors$numbers <- as.numeric(as.character(Doctors[,2]))
>> DeptCodes$values <- as.numeric(as.character(DeptCodes[,1]))
>> 
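The as.character() step matters because the numeric representation of a factor is the position of each value among the sorted levels, not the value itself. A small sketch (the vector f is illustrative, not from the thread). The last line is the idiom given in the R FAQ entry on converting factors to numeric, which converts each level once rather than every element.

```r
# Department codes stored as a factor, as in the toy data frames.
f <- factor(c("9999", "5555", "9999", "1111"))

levels(f)                                     # sorted: "1111" "5555" "9999"
codes_internal <- as.numeric(f)               # level positions: 3 2 3 1
codes_actual   <- as.numeric(as.character(f)) # the codes: 9999 5555 9999 1111
codes_actual2  <- as.numeric(levels(f))[f]    # same codes, converting each level once
print(codes_internal)
print(codes_actual)
```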
>> ## Will map the department numbers onto the correct rows of the DeptCodes df
>> match(Doctors$numbers, DeptCodes$values)
>> 
>> # Now we get the correct names using those row numbers
>> DeptAssignments = as.character(DeptCodes[match(Doctors$numbers,
>> DeptCodes$values),2])
>> 
>> # Combine with doctor names to finish
>> NamesandTitles = cbind(as.character(Doctors[,1]),DeptAssignments)
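For comparison, a sketch of the same lookup without the numeric detour, assuming the toy data above. match() converts factor arguments to character itself (as documented in ?match), so the as.character() calls here are only for clarity. Note also that cbind() on two character vectors returns a character matrix; building a data.frame instead keeps named columns. The column names Doctor and Department are hypothetical.

```r
# Rebuild the toy data from the thread (factors throughout).
Doctors <- data.frame(
  Docs     = factor(c("Greg Jones", "Bob Smith", "Al Franklin",
                      "Christian Christianson")),
  DocDepts = factor(c("9999", "5555", "9999", "1111"))
)
DeptCodes <- data.frame(
  Depts     = factor(1:9 * 1111),
  DeptNames = factor(c("Heart", "Kidney", "Feet", "Teeth", "Brain",
                       "Digestive", "Diagnostic", "Surgery", "Anesthesia"))
)

# Row of DeptCodes for each doctor, matching the code strings directly.
idx <- match(as.character(Doctors$DocDepts), as.character(DeptCodes$Depts))

# One column per variable; cbind() of two character vectors would give
# a character matrix instead.
NamesandTitles <- data.frame(
  Doctor     = as.character(Doctors$Docs),
  Department = as.character(DeptCodes$DeptNames)[idx],
  stringsAsFactors = FALSE
)
print(NamesandTitles)
```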
>> 
>> It's not the most elegant way of doing it, but hopefully it gives some
>> insight into how to work with factors. If you can send a little more
>> information about how your data is currently stored, we can optimize this
>> into something easily repeatable; without specifics, I have to work in
>> generalities.
>> 
>> Hope this helps,
>> 
>> Michael Weylandt
>> 
>> On Thu, Sep 8, 2011 at 6:36 PM, Totally Inept <kramer877 at gmail.com> wrote:
>> 
>>> First of all, let me apologize, as this is probably an absurdly basic
>>> question. I did search before asking, but perhaps my ineptitude didn't
>>> allow me to apply what I read to what I'm doing. I'm totally new to R, and
>>> haven't written any code in any language in a long time.
>>> 
>>> Basically, I've got categories. They're department codes for doctors (say,
>>> 9999 for radiology or 5555 for endocrinology), and there are a good number
>>> of them, so it's not practical for me to write them all out as I usually
>>> see in examples of categorical variables (factors).
>>> 
>>> And then I've got a list of doctors that I'm actually interested in. I have
>>> the department codes associated with each, but I need to map the department
>>> name to the doctor name. So I might have Greg Jones, Bob Smith, Tom Wilson,
>>> etc. to go with 1234, 9999, 2222, etc.
>>> 
>>> I need to turn Greg Jones, Bob Smith, ... and 1234, 9999, ... into
>>> Greg Jones, Bob Smith, ... and Cardiology, Radiology, ...
>>> 
>>> Obviously I could just search and replace within the CSV files, but I need
>>> something durable that I can run things through repeatedly.
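One way to get something durable is to wrap the lookup in a function that can be rerun on fresh data. This is a sketch only: add_dept_names() and its column-name defaults are hypothetical, and it uses a named character vector as the lookup table, a swapped-in technique rather than one proposed in the thread. Codes with no entry come back as NA. The toy values follow the examples in the question (1234 as Cardiology, 9999 as Radiology, 5555 as Endocrinology; 2222 deliberately has no entry).

```r
# Hypothetical helper: attach department names to a table of doctors.
# The column-name defaults follow the toy data used in this thread.
add_dept_names <- function(doctors, depts,
                           code_col = "DocDepts", key_col = "Depts",
                           name_col = "DeptNames") {
  # Named character vector: names are the codes, values are the dept names.
  lookup <- setNames(as.character(depts[[name_col]]),
                     as.character(depts[[key_col]]))
  # Index by name; a code with no entry in the lookup yields NA.
  doctors[[name_col]] <- unname(lookup[as.character(doctors[[code_col]])])
  doctors
}

# Toy inputs based on the examples in the question.
depts <- data.frame(
  Depts     = c("1234", "9999", "5555"),
  DeptNames = c("Cardiology", "Radiology", "Endocrinology"),
  stringsAsFactors = FALSE
)
doctors <- data.frame(
  Docs     = c("Greg Jones", "Bob Smith", "Tom Wilson"),
  DocDepts = c("1234", "9999", "2222"),
  stringsAsFactors = FALSE
)

res <- add_dept_names(doctors, depts)
print(res)
```

When reading the real CSV files, read.csv(..., colClasses = "character") keeps codes such as 0123 from being converted to numbers; the file names would be your own.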
>>> 
>>> Anyhow, thanks to anyone willing to humor me with an answer.
>>> 
>>> --
>>> View this message in context:
>>> http://r.789695.n4.nabble.com/Factors-I-think-tp3800413p3800413.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>> 

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


