[R] recode categorial vars into binary data

David Winsemius dwinsemius at comcast.net
Tue May 7 19:00:03 CEST 2013


On May 7, 2013, at 9:20 AM, D. Alain wrote:

> Dear R-List, 
> 
> I would like to recode categorial variables into binary data, so that all values above median are coded 1 and all values below 0, separating each var into two equally large groups (e.g. good performers = 0 vs. bad performers =1).
> 
> I have not succeeded so far in finding a nice solution to do that in R. I thought there might be a better way than ordering each column and recoding the first 50% into 0 and the second into 1. If I use ifelse I have a problem with cases that share the same rank being all median. 
> 
> e.g.
> df<-as.data.frame(cbind(snr=c(1,2,3,4,5,6,7,8,9,10),k1=c(1,1,4,2,3,2,2,5,2,2),k2=c(1,2,3,2,1,2,1,3,3,2),result=c(4,3,5,4,2,6,4,4,2,3)))

First off, stop using cbind() when it is not needed. You will not see the reason when the columns are all numeric but you will start experiencing pain and puzzlement when the arguments are of mixed classes. The data.frame function will do what you want. (Where do people pick up this practice anyway?)



df[,2] <- as.numeric( order(df[,2]) >= length(df[,2])/2 )




> 
> now I want to recode k1 and k2 so that I have half of the values recoded 0 and half recoded 1, split around the median point. The median of k1 is 2 which would lead to unequal groupsize if used 2 as cutoff, so all values k1=2 should be recoded 1 or 0 randomly until both categories have the same length.
> 
> something like
> 
> df.rec<-as.data.frame(cbind(snr=c(1,2,3,4,5,6,7,8,9,10),k1=c(0,0,1,0,1,1,0,1,0,1),k2=c(0,1,1,0,0,1,0,1,1,0),result=c(4,3,5,4,2,6,4,4,2,3)))
> 
> Can anyone help?
> 
> Thank you in advance.
> 
> Best wishes.
> Alain  
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA



More information about the R-help mailing list