[R] Discretize factors?
Noah Silverman
noah at smartmediacorp.com
Sun May 16 20:24:55 CEST 2010
I could, but with close to 100 columns, it's messy.
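
If the worry is renaming many columns by hand, the names can be generated from the column name and the factor levels. The following is only a sketch, assuming the data frame holds some numeric columns and some factor columns; `expand_factor`, `is_fac`, `parts` and `out` are hypothetical names introduced for illustration and are not part of base R.

```r
# Example data, built directly so that a and b stay numeric.
d <- data.frame(a = c(1, 4, 3, 4, 5, 6),
                b = c(5, 4, 5, 3, 4, 5),
                group = factor(c("A", "B", "B", "C", "C", "C")))

# Hypothetical helper: expand one factor column into 0/1 indicator
# columns named <column><level>, e.g. groupA, groupB, groupC.
expand_factor <- function(df, col) {
  f <- factor(df[[col]])            # re-factoring drops unused levels
  mm <- model.matrix(~ 0 + f)       # one indicator column per level
  colnames(mm) <- paste0(col, levels(f))
  mm
}

# Apply the helper to every factor column, then bind the indicator
# columns next to the non-factor columns.
is_fac <- vapply(d, is.factor, logical(1))
parts  <- lapply(names(d)[is_fac], function(nm) expand_factor(d, nm))
out    <- cbind(d[!is_fac], do.call(cbind, parts))

print(names(out))
```

Because each factor is expanded on its own, every factor keeps an indicator for all of its levels, and no names have to be typed manually.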
On 5/16/10 11:22 AM, Peter Ehlers wrote:
> On 2010-05-16 11:06, Noah Silverman wrote:
>> Update,
>>
>> I have it working, but now it's producing really ugly labels. Must be a
>> small adjustment to the code. Any ideas?
>>
>> ##Create example data.frame
>> group<- c("A", "B","B","C","C","C")
>> a<- c(1,4,3,4,5,6)
>> b<- c(5,4,5,3,4,5)
>> d<- data.frame(a, b, group)  # no cbind(): it would coerce a and b to character
>>
>> #create new frame with discretized group
>> > cbind(d[,1:2], model.matrix(~0+d[,3]) )
>>   a b d[, 3]A d[, 3]B d[, 3]C
>> 1 1 5       1       0       0
>> 2 4 4       0       1       0
>> 3 3 5       0       1       0
>> 4 4 3       0       0       1
>> 5 5 4       0       0       1
>> 6 6 5       0       0       1
>>
>>
>> So, as you can see, it works, but the labels for the groups don't
>>
>> I then tried using the column name instead of number and still got ugly
>> results:
>>
>> > cbind(d[,1:2], model.matrix(~0+d[,"group"]) )
>>   a b d[, "group"]A d[, "group"]B d[, "group"]C
>> 1 1 5             1             0             0
>> 2 4 4             0             1             0
>> 3 3 5             0             1             0
>> 4 4 3             0             0             1
>> 5 5 4             0             0             1
>> 6 6 5             0             0             1
>>
>>
>>
>> Any ideas?
>>
>
> Can't you just use names(...) <- c() on your final dataframe?
>
> -Peter Ehlers
>
>> -N
>>
>>
>>
>> On 5/15/10 11:02 AM, Noah Silverman wrote:
>>> Hi,
>>>
>>> I'm looking for an easy way to discretize factors in R.
>>>
>>> I've noticed that the lm function does this automatically with a nice
>>> result.
>>>
>>> If I have
>>>
>>> group<- c("A", "B","B","C","C","C")
>>>
>>> and run:
>>>
>>> lm(result ~ x1 + group)
>>>
>>> The lm function has split the group into separate binary variables {0,1}
>>> before performing the regression. I now have:
>>> groupA
>>> groupB
>>> groupC
>>>
>>> Some of the other models that I want to try won't accept factors, so
>>> they need to be discretized this way.
>>>
>>> Is there a command in R for this, or some easy shortcut? (I tried
>>> digging into the lm code, but couldn't find where this is being done.)
>>>
>>> Thanks!
>>>
>>> -N
>>>
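
The labels in the thread come from writing the term as `d[, 3]` or `d[, "group"]` inside the formula, since `model.matrix` builds column names from the term as written. A minimal sketch of one way to get `groupA`, `groupB`, `groupC` directly, assuming the factor is referred to by name and the data frame is passed through the `data` argument (`dummies` and `result` are hypothetical names):

```r
# Example data from the thread, built so that a and b stay numeric.
group <- c("A", "B", "B", "C", "C", "C")
d <- data.frame(a = c(1, 4, 3, 4, 5, 6),
                b = c(5, 4, 5, 3, 4, 5),
                group = factor(group))

# Refer to the column by name and supply the data frame via `data`.
# `0 +` removes the intercept so every level gets its own indicator.
dummies <- model.matrix(~ 0 + group, data = d)

# Combine the numeric columns with the indicator columns.
result <- cbind(d[, c("a", "b")], dummies)
print(result)
```

With an intercept in the formula, the default treatment contrasts drop the first level, which is why the sketch uses `~ 0 + group`. If several factors appear in one formula, only the first keeps all of its levels under the default contrasts.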