[R] Efficient passing through big data.frame and modifying select fields
Johannes Graumann
johannes_graumann at web.de
Tue Nov 25 15:16:01 CET 2008
Hi all,
I have relatively big data frames (> 10000 rows by 80 columns) that need to be passed to "merge". This works marvelously well in general, but some fields of the data frames contain multiple ";"-separated values encoded as a single character string in no defined order, so fields holding the same values do not necessarily match each other.
Example:
> frame1[1,1]
[1] "some;thing"
> frame2[2,1]
[1] "thing;some"
In order to enable merging/duplicate identification of columns containing these strings, I wrote the following function, which passes through the rows one by one, identifies ";"-containing cells, splits them and re-sorts the parts.
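ResortCombinedFields <- function(dframe){
  if(!is.data.frame(dframe)){
    stop("\"ResortCombinedFields\" input needs to be a data frame.")
  }
  # Visit every row; seq_len() also copes with zero-row data frames
  for(row in seq_len(nrow(dframe))){
    # Indices of the cells in this row that contain a ";"
    for(mef in grep(";",dframe[row,])){
      # Split the cell, sort the parts and glue them back together
      dframe[row,mef] <- paste(sort(unlist(strsplit(dframe[row,mef],";"))),collapse=";")
    }
  }
  return(dframe)
}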
This works fine, but is horribly inefficient. How might this be tackled more elegantly?
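One direction that might help is to work column by column instead of cell by cell, since "grepl" and "strsplit" are vectorized over a whole character vector. The following is only a minimal sketch under some assumptions: the names "SortFields" and "ResortCombinedFieldsByColumn" are made up for illustration, only columns of type character are touched (factor columns would have to be converted with as.character() first), and I have not timed it against the real data.

```r
# Hypothetical helper: sort the sep-separated parts of every element
# of a character vector; elements without sep (and NAs) are left alone.
SortFields <- function(x, sep = ";") {
  hit <- grepl(sep, x, fixed = TRUE)
  x[hit] <- vapply(
    strsplit(x[hit], sep, fixed = TRUE),
    function(parts) paste(sort(parts), collapse = sep),
    character(1)
  )
  x
}

# Hypothetical column-wise variant of ResortCombinedFields:
# applies SortFields to each character column of the data frame.
ResortCombinedFieldsByColumn <- function(dframe, sep = ";") {
  if (!is.data.frame(dframe)) {
    stop("input needs to be a data frame.")
  }
  is_chr <- vapply(dframe, is.character, logical(1))
  if (any(is_chr)) {
    dframe[is_chr] <- lapply(dframe[is_chr], SortFields, sep = sep)
  }
  dframe
}
```

Both data frames would be normalized this way before the call to "merge", e.g. merge(ResortCombinedFieldsByColumn(frame1), ResortCombinedFieldsByColumn(frame2)). The loop then runs once per column (80 iterations) rather than once per row and cell, which is where I would expect most of the gain.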
Thanks for any input, Joh