[R] Replace any values in a data frame based on another data frame

Bert Gunter bgunter.4567 at gmail.com
Sun Jul 31 08:15:37 CEST 2016


Marine:

Thanks for the reproducible example. I would not have fooled with this
otherwise!

1. First note that your specification that you wish to replace only
those values in col3 and col4 of df1 that don't match with those of
df2 is irrelevant: if you replace those that do match you don't change
them. This means that you want to replace the rows of df1 for which
the first 2 columns of df1 match those of df2 with the corresponding
rows of df2.

2. You need to understand and deal with factors to do this. By
default, non-numeric columns will be read into a data frame as
factors, which is what happens in your example. You need to first
convert all such columns to character to make things work properly (or
at least, I had to doing it as shown below).

Also note that there was a typo in your code: "df2< -"  should be
"df2<-" (you inserted an extra space).

Anyway, here is one way to go about it, but I suggest that you wait
for other solutions before accepting it, as I would not be surprised
if there were better ways. However, this should be reasonably fast, as
everything is vectorized.

> df1 <- data.frame(col1 = c(1:6), col2 = c(rep("a", 3), rep("b", 3)), col3 = c(rep("AA", 2), rep("BB", 2), rep("CC", 2)), col4=c(1,8,6,9,7,6))
>
> df2<- data.frame(col1 = c(1,3,5,6), col2 = c(rep("a", 2), rep("b", 2)), col3 = c(rep("AA", 1), rep("EE", 1), rep("FF", 1), rep("CC", 1)), col4=c(1,8,5,9))
>
> ## convert factors to character vectors
> df1[,3] <- as.character(df1[,3])
> df2[,3] <- as.character(df2[,3])
>
> ## concatenate first 2 columns of both df's to identify rows for matching
> first <- with(df1,paste0(col1,col2))
> second <- with(df2,paste0(col1,col2))
>
> ## now find the rows of the df1 that match those of df2
> wh <-match(first,second)
>
> ## Use indexing to substitute
> df1[!is.na(wh),] <- df2[na.omit(wh),]
>
> df1
  col1 col2 col3 col4
1    1    a   AA    1
2    2    a   AA    8
3    3    a   EE    8
4    4    b   BB    9
5    5    b   FF    5
6    6    b   CC    9

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sat, Jul 30, 2016 at 8:45 PM, Marine Regis <marine.regis at hotmail.fr> wrote:
> Hello,
>
> I have two data frames with different sizes but with the same number of columns.
>
>> df1 <- data.frame(col1 = c(1:6), col2 = c(rep("a", 3), rep("b", 3)), col3 = c(rep("AA", 2), rep("BB", 2), rep("CC", 2)), col4=c(1,8,6,9,7,6))
>
>> df1
>
>   col1 col2 col3 col4
>
> 1    1    a   AA    1
>
> 2    2    a   AA    8
>
> 3    3    a   BB    6
>
> 4    4    b   BB    9
>
> 5    5    b   CC    7
>
> 6    6    b   CC    6
>
>
>
>> df2< - data.frame(col1 = c(1,3,5,6), col2 = c(rep("a", 2), rep("b", 2)), col3 = c(rep("AA", 1), rep("EE", 1), rep("FF", 1), rep("CC", 1)), col4=c(1,8,5,9))
>
>
>
>> df2
>
>   col1 col2 col3 col4
>
> 1    1    a   AA    1
>
> 2    3    a   EE    8
>
> 3    5    b   FF    5
>
> 4    6    b   CC    9
>
>
>
>
>
> Based on col1 and col2, how can I replace any values in col3 and col4 of df1 which don't match with those of df2 by the values of df2
>
>
>
> In this example, the result will be:
>
>
>
>> df1
>
>    col1 col2 col3 col4
>
> 1    1    a   AA    1
>
> 2    2    a   AA    8
>
> 3    3    a   EE    8
>
> 4    4    b   BB    9
>
> 5    5    b   FF    5
>
> 6    6    b   CC    9
>
>
>
> Thanks a lot for your help.
>
> Have a nice day
>
> Marine
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list