[R] Conditional data frame manipulation

Marc Schwartz marc_schwartz at comcast.net
Sat Feb 17 19:52:27 CET 2007


On Sat, 2007-02-17 at 17:34 +0100, Johannes Graumann wrote:
> Hi all,
> 
> My current project brought forth the snippet below, which modifies in each
> row of a data frame a certain field depending on another field in the same
> row. Dealing with data of some 30000 entries this works, but is horribly
> slow. Can anyone show this newbie how to do this properly (faster ;0)?
> 
> for (i in 1:nrow(dataframe)){
>   if (any(grep('^yes$',dataframe[i,][['Field1']]))){
>     dataframe[i,]['Field1'] <- dataframe[i,]['Field2']
>   } else {
>     dataframe[i,]['Field1'] <- NA
>   }
> }
> 
> Thanks for your insights, Joh

Beyond the for() loop issue, you are doing a lot of unnecessary
subsetting.

For example:

  dataframe[i,][['Field1']]

can be replaced, when you work on the whole column at once, with:

  dataframe[['Field1']]

or if you have to loop:

  dataframe[i, 'Field1']


See ?Extract
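
As a minimal sketch of the point above, the following compares the
subsetting forms on a small made-up data frame (the data values are
hypothetical; only the column names come from the original post, and
character columns are assumed rather than factors):

```r
# Hypothetical toy data; column names follow the original post.
dataframe <- data.frame(Field1 = c("yes", "no", "yes"),
                        Field2 = c("a", "b", "c"),
                        stringsAsFactors = FALSE)

i <- 2

# Row first, then column: builds a one-row data frame before extracting.
via_row <- dataframe[i, ][["Field1"]]

# Direct element access: no intermediate one-row data frame.
direct <- dataframe[i, "Field1"]

# Whole column as a vector, then index into it.
via_col <- dataframe[["Field1"]][i]

# All three forms return the same single value.
stopifnot(identical(via_row, direct), identical(direct, via_col))
```

The first form does the most work per iteration, which adds up over
some 30000 rows.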

One clarification question on your use of grep(): do you have entries
that merely contain 'yes' somewhere in the field, or are you just
looking for a field entry == 'yes'?  Your anchored pattern '^yes$'
matches only the exact string, so if the latter, you don't need to
use grep() of course.

One potential approach is the following:

  dataframe[["Field1"]] <- with(dataframe,
                                ifelse(grepl("^yes$", Field1),
                                       Field2, NA))

If you are just looking for an entry == "yes", then:

  dataframe[["Field1"]] <- with(dataframe,
                                ifelse(Field1 == "yes",
                                       Field2, NA))
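
To make the vectorized idea concrete, here is a small sketch that
reproduces what the original loop does (copy Field2 where Field1 is
exactly "yes", NA otherwise). The data values are made up, and
character columns are assumed; with factor columns ifelse() would
hand back the integer codes rather than the labels. Note that the
test passed to ifelse() has to be elementwise: wrapping it in any()
collapses it to a single TRUE/FALSE and yields a single value.

```r
# Hypothetical toy data; column names follow the original post.
dataframe <- data.frame(Field1 = c("yes", "no", "yes", "maybe"),
                        Field2 = c("a", "b", "c", "d"),
                        stringsAsFactors = FALSE)

# Elementwise test: one TRUE/FALSE per row.
is_yes <- dataframe[["Field1"]] == "yes"

# A scalar test gives a length-1 result, not one value per row.
collapsed <- ifelse(any(is_yes), dataframe[["Field2"]], NA)

# The elementwise test gives one result per row.
dataframe[["Field1"]] <- ifelse(is_yes, dataframe[["Field2"]], NA)

print(dataframe)
```

Rows 1 and 3 end up with the Field2 value, and rows 2 and 4 become NA.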

See ?ifelse and ?with.  Also look at ?replace for an alternative way to
replace() values based upon conditions.
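
A possible replace() version, again only a sketch on made-up data
with character columns assumed: start from Field2 and blank out the
rows where Field1 is not "yes".

```r
# Hypothetical toy data; column names follow the original post.
dataframe <- data.frame(Field1 = c("yes", "no", "yes"),
                        Field2 = c("a", "b", "c"),
                        stringsAsFactors = FALSE)

# Elementwise test: one TRUE/FALSE per row.
is_yes <- dataframe[["Field1"]] == "yes"

# replace(x, list, values): take Field2 and set the non-"yes" rows to NA.
dataframe[["Field1"]] <- replace(dataframe[["Field2"]], !is_yes, NA)

print(dataframe)
```

This avoids ifelse() altogether and keeps the type of Field2 as is.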

HTH,

Marc Schwartz


