[R] suggestions/improvements for recoding strategy

Peter Ehlers ehlers at ucalgary.ca
Mon May 17 22:32:11 CEST 2010


On 2010-05-17 12:54, Henrique Dallazuanna wrote:
> Try this:
>
> newData<- sapply(numdat, function(x)lapply(strsplit(as.character(x), '-'),
> function(.x)mean(as.numeric(.x))))

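There's a potential problem if numdat contains negative numbers:
splitting on '-' also splits off the minus sign, so -2 becomes
c("", "2") and its mean comes out as NA. It would be better to
restrict the recoding to character or factor columns.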

# find the character and factor columns
cl <- sapply(numdat, class)
idx <- which(cl %in% c('character','factor'))

# split each entry on "-" and average the pieces
g <- function(x){
    sapply(strsplit(as.character(x), "-"),
           function(.x) mean(as.numeric(.x), na.rm=TRUE))
}

newData <- numdat
for(i in idx) newData[,i] <- g(newData[,i])
newData
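
A possible further safeguard, sketched here rather than taken from the
thread: because g() splits on every "-" and drops NAs, a text entry such
as "-2" would split into "" and "2" and come back as 2. The hypothetical
helper recode_ranges() below averages only entries that look like two
unsigned numbers joined by a dash and leaves everything else to
as.numeric(). The helper name, the regular expression, and the inline
copy of the example data are assumptions made for illustration.

```r
# Hypothetical helper (not from the thread): average "a-b" ranges only.
# trimws() needs R >= 3.2.0.
recode_ranges <- function(x) {
  x <- trimws(as.character(x))
  # plain numbers (including negatives) convert directly; "" and ranges give NA
  out <- suppressWarnings(as.numeric(x))
  # two unsigned numbers joined by a single dash, e.g. "1-2" or "2.5-3"
  pat <- "^([0-9]+\\.?[0-9]*)-([0-9]+\\.?[0-9]*)$"
  is_range <- !is.na(x) & grepl(pat, x)
  lo <- as.numeric(sub(pat, "\\1", x[is_range]))
  hi <- as.numeric(sub(pat, "\\2", x[is_range]))
  out[is_range] <- (lo + hi) / 2
  out
}

recode_ranges(c("1", "1-2", "", "-2", NA))
# expected: 1, 1.5, NA, -2, NA

# inline stand-in for the example data (assumed, without the id column)
numdat <- data.frame(v1 = c("1", "1-2", ""),
                     v2 = c(2, NA, 3),
                     v3 = c("3", "3-4", "4"),
                     stringsAsFactors = TRUE)

# recode only the character and factor columns
is_text <- vapply(numdat,
                  function(col) is.character(col) || is.factor(col),
                  logical(1))
newData <- numdat
newData[is_text] <- lapply(numdat[is_text], recode_ranges)
newData
```

With this version empty cells come back as NA rather than NaN, and
numeric columns are left untouched.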

  -Peter Ehlers

>
> On Mon, May 17, 2010 at 3:29 PM, Juliet Hannah <juliet.hannah at gmail.com> wrote:
>
>> I am recoding some data. Many values that should be 1.5 are recorded
>> as 1-2. Some example data and my solution are below. I am curious about
>> better approaches or any other suggestions. Thanks!
>>
>> # example input data
>>
>> myData<- read.table(textConnection("id, v1, v2, v3
>> a,1,2,3
>> b,1-2,,3-4
>> c,,3,4"),header=TRUE,sep=",")
>> closeAllConnections()
>>
>> # the first column is IDs so remove that
>>
>> numdat<- myData[,-1]
>>
>> # function to change dashes: 1-2 to 1.5
>>
>> myrecode <- function(mycol)
>> {
>>    newcol <- mycol
>>    newcol <- gsub("1-2","1.5",newcol)
>>    newcol <- gsub("2-3","2.5",newcol)
>>    newcol <- gsub("3-4","3.5",newcol)
>>    newcol <- as.numeric(newcol)
>>    newcol
>> }
>>
>> newData<- data.frame(do.call(cbind,lapply(numdat,myrecode)))
>>


