[R] Matching values between 2 data.frame.

Bert Gunter bgunter.4567 at gmail.com
Sat May 20 22:13:37 CEST 2017


Like this?  (use indexing to avoid explicit loops whenever possible):

## first convert factor columns to character, as David W. suggested
i <- sapply(od,is.factor)
od[i]<- lapply(od[i],as.character)
i <- sapply(tv, is.factor)
tv[i]<- lapply(tv[i],as.character)


## Now use ?match
wh  <-   match(od[,"AA1"], tv[,"AA"])
matched <- !is.na(wh)  ## only needed if not all AA1's match in AA
od[matched,c("AA1", "Value")] <- tv[wh[matched],c("BB","Value")]

> od
   Value1 Value2 AA1 Value
1       1     11   B     7
2       2     12   B    25
3       3     13   B     7
4       4     14   B     5
5       5     15   B    10
6       6     16   B     5
7       7     17  CC    21
8       8     18  CC    21
9       9     19   B    25
10     10     20   B    25
11     11     21   B     7
12     12     22   B    25


Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sat, May 20, 2017 at 11:53 AM, David Winsemius
<dwinsemius at comcast.net> wrote:
>
>> On May 20, 2017, at 11:23 AM, Christofer Bogaso <bogaso.christofer at gmail.com> wrote:
>>
>> Hi again,
>>
>> Let say I have below 2 data frames.
>>
>> OriginalData = data.frame('Value1' = 1:12, 'Value2' = 11:22, 'AA1' =
>> c('AA4', 'AA3', 'AA4', 'AA1', 'AA2', 'AA1', 'AA6', 'AA6', 'AA3',
>> 'AA3', 'AA4', 'AA3'), 'Value' = NA)
>>
>> TargetValue = data.frame('AA' = c('AA1', 'AA2', 'AA3', 'AA4', 'AA5',
>> 'AA6'), 'BB' = c('B', 'B', 'B', 'B', 'CC', 'CC'), 'Value' = c(5, 10,
>> 25, 7, 35, 21))
>>
>> OriginalData
>> TargetValue
>>
>> Now I need to replace OriginalData's 'AA1' column with TargetValue's
>> 'BB' column, based on matched values between 'AA1' & 'AA' columns of
>> OriginalData & TargetValue respectively. With this same law, I need to
>> update 'Value' column of  OriginalData with that of TargetValue.
>>
>> As an example, after replacement by above rule, 1st row of
>> OriginalData should look like :
>>
>>> OriginalData
>>
>>   Value1 Value2 AA1 Value
>>
>> 1       1     11 B    7
>>
>> Values of TargetValue's 'AA' column are unique i.e. no duplication
>>
>> Previously I have implemented a 'for' loop to implement above, however
>> since both of my data.frames are quite big, it is taking long time to
>> execute. Is there any 'R' way to implement this quickly.
>>
>> Appreciate for any pointer.
>
> It's going to have a greater chance of delivering the desired result if you convert the factor columns into character.
>
>>
>> Thanks,
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list