[R] Matched pairs with two data frames

David Winsemius dwinsemius at comcast.net
Fri Apr 18 15:29:16 CEST 2008

Udo <ukoenig at med.uni-marburg.de> wrote in
news:1208462659.4807ad43cea9d at webmail.med.uni-marburg.de: 

> Daniel,
> thank you!
> I want to perform the simplest way of matching:
> a one-to-one exact match (by age and school):
> for every case in "treat" find ONE case (if there is one) in
> "control" . The cases in "control" that could be matched, should be
> tagged as not available or taken away (deleted) from the control
> pool (thus, the used ones are not replaced).
> #treatment group
> treat <- data.frame(age=c(1,1,2,2,2,4),
>                     school=c(10,10,20,20,20,11),
>                     out1=c(9.5,2.3,3.3,4.1,5.9,4.6))
> #control group
> control <- data.frame(age=c(1,1,1,1,3,2),
>                       school=c(10,10,10,10,33,20),
>                       out2=c(1.1,2,3.5,4.9,5.2,6.5))
> #one-to-one exact matching algorithm ????
> matched.data.frame <- ?????
> In my example I matched the cases "by hand" to make things clear.
> Case 1 from "treat" was matched with case 1 from "control",
> 2 with 2 and 3 with 6. Case 4, 5 and 6 could not be matched,
> because there is no "partner" in "control" .
> Thus my matched example data frame has 3 cases.

Is it really the case that SPSS would give the output that you describe 
without any warnings about non-uniqueness? How could they live with 
themselves after such arbitrary behavior? This link is evidence that 
SPSS may not behave as you allege.

If you really want to persist in what cannot possibly be called "one-
to-one exact matching", but instead "arbitrary convenience matching", 
then you need to construct a function that sequentially marches through 
"treat", grabs the first match (perhaps with something like):

> matched.first <- merge(treat[1,],control, by= c("age","school"))[1,]
> matched.first
  age school out1 out2
1   1     10  9.5  1.1

... except that the "1"s would be replaced with an index variable. Then 
mark that control as "taken", perhaps by using all of its variables as 
identifiers, and attempt the same match-and-mark step for each 
successive case among the controls that still have taken == FALSE.
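The sequential procedure described above could be sketched roughly as 
follows. This is only one possible reading of it: match_first() is a 
hypothetical helper name (not from any package), and instead of 
identifying used controls by their variable values it keeps a logical 
vector "taken" indexed by control row, which is an assumption made for 
simplicity. Ties are resolved by row order, so the result remains 
arbitrary in the sense discussed above.

```r
# Example data from the original question
treat <- data.frame(age    = c(1, 1, 2, 2, 2, 4),
                    school = c(10, 10, 20, 20, 20, 11),
                    out1   = c(9.5, 2.3, 3.3, 4.1, 5.9, 4.6))
control <- data.frame(age    = c(1, 1, 1, 1, 3, 2),
                      school = c(10, 10, 10, 10, 33, 20),
                      out2   = c(1.1, 2, 3.5, 4.9, 5.2, 6.5))

# match_first() is a hypothetical helper: for each row of 'treat', in order,
# it takes the first not-yet-used row of 'control' that agrees on all 'by'
# variables (matching without replacement).
match_first <- function(treat, control, by = c("age", "school")) {
  taken    <- rep(FALSE, nrow(control))        # TRUE once a control is used
  ctrl_idx <- rep(NA_integer_, nrow(treat))    # matched control row, or NA
  for (i in seq_len(nrow(treat))) {
    # controls that agree with treated case i on every matching variable
    same <- rep(TRUE, nrow(control))
    for (v in by) same <- same & control[[v]] == treat[[v]][i]
    j <- which(same & !taken)[1]               # first free candidate, NA if none
    if (!is.na(j)) {
      taken[j]    <- TRUE
      ctrl_idx[i] <- j
    }
  }
  hit <- !is.na(ctrl_idx)
  # keep matched treated rows and append the non-key columns of their controls
  cbind(treat[hit, , drop = FALSE],
        control[ctrl_idx[hit], setdiff(names(control), by), drop = FALSE])
}

matched.data.frame <- match_first(treat, control)
matched.data.frame
```

On the example data this pairs treated cases 1, 2 and 3 with controls 
1, 2 and 6, the same three pairs that were matched "by hand" in the 
question. Reordering the rows of "control" would change which controls 
are used.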

David Winsemius
