[R] help with merging two dataframes function of "egrep"-like formulas
Jeff Newmiller
jdnewmil using dcn.davis.ca.us
Thu Jul 19 04:51:27 CEST 2018
The traditional (SQL) way to attack this problem is to make the data
structure simpler so that faster comparisons can be utilized:
################
A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
Bx <- (   B
      %>% mutate( z_B = as.character( z ) )
      %>% rename( t_B = t )
      %>% separate_rows( z, sep="::" )
      )
Bx
#>     z t_B      z_B
#> 1 a*b   1 a*b::x*y
#> 2 x*y   1 a*b::x*y
#> 3   c   2        c
#> 4       3
#> 5 g*h   4      g*h
result <- (   A
          %>% mutate( z = as.character( z ) )
          %>% rename( t_A = t )
          %>% inner_join( Bx, by="z" )
          )
result
#>     z t_A t_B      z_B
#> 1 a*b   1   1 a*b::x*y
#' Created on 2018-07-18 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
################
Note that it is preferable to avoid ever creating the complex data z in B
in the first place, but failing that, Bx is much more flexible and less
error-prone than B. (This is especially true if you don't have to create
B$z_B at all, but have some other unique identifier(s) for the groupings
represented by each row in B.)
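As a rough sketch of that last point, here is what the data might look like
if B were kept in "long" form from the start. The names id_B, B_groups and
B_members are made up for illustration and are not from the original data;
id_B is an assumed integer key standing in for each original row of B. Only
base R merge() is used, so every comparison is an exact match.

```r
# Hypothetical layout: one table of groups, one table of group members.
A <- data.frame(z = c("a*b", "c*d", "d*e", "e*f"),
                t_A = c(1, 2, 3, 4),
                stringsAsFactors = FALSE)
# One row per original row of B, keyed by the assumed identifier id_B.
B_groups <- data.frame(id_B = 1:4, t_B = c(1, 2, 3, 4))
# One row per (group, element) pair; the empty group 3 has no members.
B_members <- data.frame(id_B = c(1L, 1L, 2L, 4L),
                        z = c("a*b", "x*y", "c", "g*h"),
                        stringsAsFactors = FALSE)

# Exact-match joins only; no pattern matching is involved.
hits <- merge(A, B_members, by = "z")
result <- merge(hits, B_groups, by = "id_B")
result
```

With this layout the "::"-delimited string never has to exist, and the
group a match belongs to can be recovered from id_B.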
On Wed, 18 Jul 2018, Bogdan Tanasa wrote:
> Thanks a lot! It looks like I am getting the same results with:
>
> B %>% regex_left_join(A, by = c(z = 'z'))
>
> On Wed, Jul 18, 2018 at 3:57 PM, Riley Finn <rileyfinn3 using gmail.com> wrote:
>
>>
>>
>> This may be what you are looking for:
>>
>> library(dplyr)      # loaded here so that %>% is available
>> library(fuzzyjoin)
>>
>> The inner join returns just the one row where the string matches.
>> B %>%
>> regex_inner_join(A, by = c(z = 'z'))
>>
>> While the full join returns NA's where the string does not match.
>> B %>%
>> regex_full_join(A, by = c(z = 'z'))
>>
>> On Wed, Jul 18, 2018 at 5:20 PM Bogdan Tanasa <tanasa using gmail.com> wrote:
>>
>>> Dear all,
>>>
>>> please may I ask for a piece of advice regarding merging two dataframes:
>>>
>>> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
>>>
>>> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>>>
>>> as a function of the following criterion:
>>>
>>> "the elements in the 1st column of A can be found among the elements
>>> of the 1st column of B", i.e.
>>>
>>> for the example above, the result should combine only the row with
>>> "a*b" of A with the row with "a*b::x*y" of B.
>>>
>>> thank you,
>>>
>>> bogdan
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil using dcn.davis.ca.us>  Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k