[R] merge dataframes with conditions formulated as logical expressions
Wolfram Fischer
wolfram at fischer-zim.ch
Thu Jun 15 07:06:09 CEST 2006
--- Reply to: ---
>Date: 14.06.06 16:17 (+0000)
>From: Adaikalavan Ramasamy <ramasamy at cancer.org.uk>
>Subject: Re: [R] merge dataframes with conditions formulated as logical expressions
>
> You have discontinuity between your MIN.VAL and MAX.VAL for a given
> group. If this is true in practise, then you may want to check and
> report when VAL is in the discontinuous region.
Your solution without concerning discontinuity is better
because it is more general.
> Here is my solution that ignores that (and only uses MIN.VAL and
> completely disrespecting MAX.VAL). Not very elegant but should do
> the trick.
>
>
> df <- data.frame( GRP=c( "A", "A", "B" ), VAL=c( 10, 100, 200 ) )
> dp <- data.frame( GRP=c( "A", "A", "B", "B" ), MIN.VAL=c( 1, 50, 1,
> 70 ), MAX.VAL=c( 49, 999, 59, 999 ), VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )
>
> x <- split(df, df$GRP)
> y <- split(dp, dp$GRP)
>
> out <- NULL
> for(g in names(x)){
>
> xx <- x[[g]]
> yy <- y[[g]]
>
> w <- cut(xx$VAL, breaks=c(yy$MIN.VAL, Inf), labels=F)
> tmp <- cbind(xx, yy[w, "VAL2"])
> colnames(tmp) <- c("GRP", "VAL", "VAL2")
> out <- rbind(out, tmp)
> }
> out
>
> Regards, Adai
Thanks for this solution.
I did not yet try to program a conventional solution
because I thought there would be a nice shortcut in R
to solve the problem comparably elegantly as in SQL:
select df.*, dp.VAL2
from df, dp
where df.GRP = dp.GRP
and df.VAL > dp.MIN_VAL
and df.VAL <= dp.MAX_VAL
Wolfram
> On Wed, 2006-06-14 at 16:55 +0200, Wolfram Fischer wrote:
> > I have a data.frame df containing two variables:
> > GRP: Factor
> > VAL: num
> >
> > I have a data.frame dp containing:
> > GRP: Factor
> > MIN.VAL: num
> > MAX.VAL: num
> > VAL2: num
> > with several rows per "GRP"
> > where dp[i-1, "MAX.VAL"] < dp[i, "MIN.VAL"]
> > within the same "GRP".
> >
> > I want to create df[i, "VAL2"] <- dpp[z, "VAL2"]
> > with i along df
> > and dpp <- subset( dp, GRP = df[i, "GRP"] )
> > so that it is true for each i:
> > df[i, "VAL"] > dpp[z, "MIN.VAL"]
> > and df[i, "VAL"] <= dpp[z, "MAX.VAL"]
> >
> > Is there an easy/efficient way to do that?
> >
> > Example:
> > df <- data.frame( GRP=c( "A", "A", "B" ), VAL=c( 10, 100, 200 ) )
> > dp <- data.frame( GRP=c( "A", "A", "B", "B" ),
> > MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ),
> > VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )
> >
> > The result should be:
> > df$VAL2 <- c( 1.1, 2.2, 4.4 )
> >
> > Thanks - Wolfram
> >
> > ______________________________________________
More information about the R-help
mailing list