[R] combining unequal dataframes based on a common grouping factor

Chel Hee Lee chl948 at mail.usask.ca
Thu Dec 4 03:17:48 CET 2014


 > frame1
   ID GROUP PROP_AREA
1  1     A      0.33
2  2     A      0.33
3  3     A      0.33
4  4     B      0.50
5  5     B      0.50
6  6     C      1.00
7  7     D      1.00
 > frame2
   GROUP VALUE1 VALUE2
1     A     10      5
2     B     20     10
3     C     30     15
4     D     40     20
 >
 > obj1 <- merge(x=frame1, y=frame2, by="GROUP")
 > obj1$rval1 <- obj1$PROP_AREA * obj1$VALUE1
 > obj1$rval2 <- obj1$PROP_AREA * obj1$VALUE2
 > obj1
   GROUP ID PROP_AREA VALUE1 VALUE2 rval1 rval2
1     A  1      0.33     10      5   3.3  1.65
2     A  2      0.33     10      5   3.3  1.65
3     A  3      0.33     10      5   3.3  1.65
4     B  4      0.50     20     10  10.0  5.00
5     B  5      0.50     20     10  10.0  5.00
6     C  6      1.00     30     15  30.0 15.00
7     D  7      1.00     40     20  40.0 20.00
 >
 > idx <- match(x=frame1$GROUP, table=frame2$GROUP)
 > rval1 <- frame1["PROP_AREA"] * frame2[idx, "VALUE1"]
 > rval2 <- frame1["PROP_AREA"] * frame2[idx, "VALUE2"]
 > cbind("ID"=frame1[idx, "ID"], rval1, rval2)
   ID PROP_AREA PROP_AREA
1  1       3.3      1.65
2  1       3.3      1.65
3  1       3.3      1.65
4  2      10.0      5.00
5  2      10.0      5.00
6  3      30.0     15.00
7  4      40.0     20.00
 >

Is this what you are looking for?  I hope this helps.

Chel Hee Lee

On 12/03/2014 03:14 PM, Brock Huntsman wrote:
> I apologize if this is a relatively easy problem, but have been stuck on
> this issue for a few days. I am attempting to combine values from 2
> separate dataframes. Each dataframe contains a shared identifier (GROUP).
> Dataframe 1 (3272 rows x 3 columns) further divides this shared grouping
> factor into unique identifiers (ID), as well as contains the proportion of
> the GROUP area of which the unique identifier consists (PROP_AREA).
> Dataframe 2 (291 x 14976) in addition to consisting of the shared
> identifier, also has numerous columns consisting of values (VALUE1,
> VALUE2). I would like to multiply the PROP_AREA in dataframe 1 by each
> value in dataframe 2 (VALUE1 through VALUE14976) based on the GROUP factor,
> constructing a final dataframe of size 3272 x 14976. An example of the data
> frames are as follows:
>
>
> frame1:
>
> ID
>
> GROUP
>
> PROP_AREA
>
> 1
>
> A
>
> 0.33
>
> 2
>
> A
>
> 0.33
>
> 3
>
> A
>
> 0.33
>
> 4
>
> B
>
> 0.50
>
> 5
>
> B
>
> 0.50
>
> 6
>
> C
>
> 1.00
>
> 7
>
> D
>
> 1.00
>
>
>
> frame2:
>
> GROUP
>
> VALUE1
>
> VALUE2
>
> A
>
> 10
>
> 5
>
> B
>
> 20
>
> 10
>
> C
>
> 30
>
> 15
>
> D
>
> 40
>
> 20
>
>
>
>   Desired dataframe
>
> frame3:
>
> ID
>
> VALUE1
>
> VALUE2
>
> 1
>
> 3.3
>
> 1.65
>
> 2
>
> 3.3
>
> 1.65
>
> 3
>
> 3.3
>
> 1.65
>
> 4
>
> 10
>
> 5
>
> 5
>
> 10
>
> 5
>
> 6
>
> 30
>
> 15
>
> 7
>
> 40
>
> 20
>
>
>
>
>
> I assume I would need to use the %in% function or if statements, but am
> unsure how to write the code. I have attempted to construct a for loop with
> an if statement, but have not been successful as of yet.
>
>
> for(i in 1:nrow(frame1)) {
>
>    for(j in 2:ncol(frame2)) {
>
>      if (frame1$GROUP[i] == frame2$GROUP[i]) {
>
>        frame3[i,j+1] <- frame1$PROP_AREA[i]*frame2[i,j+1]
>
>      }
>
>    }
>
> }
>
>
> Any advice on suggested code or packages to read up on would be much
> appreciated.
>
> Brock
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list