[R] Matched pairs with two data frames
David Winsemius
dwinsemius at comcast.net
Fri Apr 18 15:29:16 CEST 2008
Udo <ukoenig at med.uni-marburg.de> wrote in
news:1208462659.4807ad43cea9d at webmail.med.uni-marburg.de:
> Daniel,
> thank you!
>
> I want to perform the simplest way of matching:
> a one-to-one exact match (by age and school):
> for every case in "treat" find ONE case (if there is one) in
> "control" . The cases in "control" that could be matched, should be
> tagged as not available or taken away (deleted) from the control
> pool (thus, the used ones are not replaced).
>
> #treatment group
> treat <- data.frame(age=c(1,1,2,2,2,4),
> school=c(10,10,20,20,20,11),
> out1=c(9.5,2.3,3.3,4.1,5.9,4.6))
>
> #control group
> control <- data.frame(age=c(1,1,1,1,3,2),
> school=c(10,10,10,10,33,20),
> out2=c(1.1,2,3.5,4.9,5.2,6.5))
>
> #one-to-one exact matching algorithm ????
>
> matched.data.frame <- ?????
>
> In my example I matched the cases "by hand" to make things clear.
> Case 1 from "treat" was matched with case 1 from "control",
> 2 with 2 and 3 with 6. Cases 4, 5 and 6 could not be matched,
> because there is no "partner" in "control" .
> Thus my matched example data frame has 3 cases.
Is it really the case that SPSS would give the output that you describe
without any warnings about non-uniqueness? How could they live with
themselves after such arbitrary behavior? This link is evidence that
SPSS may not behave as you allege.
<http://kb.iu.edu/data/afit.html>
If you really want to persist in what cannot possibly be called "one-
to-one exact matching", but instead "arbitrary convenience matching",
then you need to construct a function that sequentially marches through
"treat", grabs the first match (perhaps with something like):
> matched.first <- merge(treat[1,], control, by = c("age", "school"))[1,]
> matched.first
  age school out1 out2
1   1     10  9.5  1.1
... except that the "1" in treat[1,] would be replaced with an index
variable. You would then mark that control as "taken", perhaps by using
all of the variables as identifiers, and attempt the match/marking for
each successive case among the controls for which taken == FALSE.
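A minimal sketch of that sequential approach, using the example data
from the question. The function name greedy_match and the logical
vector "taken" are illustrative choices, not part of any package, and
the loop compares the matching variables directly rather than calling
merge() on each pass. Treat it as one way to do it under those
assumptions, not as a definitive implementation:

```r
# Example data from the original question
treat <- data.frame(age    = c(1, 1, 2, 2, 2, 4),
                    school = c(10, 10, 20, 20, 20, 11),
                    out1   = c(9.5, 2.3, 3.3, 4.1, 5.9, 4.6))
control <- data.frame(age    = c(1, 1, 1, 1, 3, 2),
                      school = c(10, 10, 10, 10, 33, 20),
                      out2   = c(1.1, 2, 3.5, 4.9, 5.2, 6.5))

# greedy_match() is a hypothetical helper: for each treated case, in row
# order, take the first unused control that agrees on every variable in
# 'by'. Used controls are not returned to the pool.
greedy_match <- function(treat, control, by = c("age", "school")) {
  taken <- rep(FALSE, nrow(control))  # TRUE once a control has been used
  ti <- integer(0)                    # rows of 'treat' that found a partner
  ci <- integer(0)                    # rows of 'control' paired with them
  for (i in seq_len(nrow(treat))) {
    same <- rep(TRUE, nrow(control))
    for (v in by) same <- same & control[[v]] == treat[[v]][i]
    j <- which(same & !taken)[1]      # first free match; NA if there is none
    if (!is.na(j)) {
      taken[j] <- TRUE
      ti <- c(ti, i)
      ci <- c(ci, j)
    }
  }
  # Put each matched treated row next to the non-key columns of its control
  out <- cbind(treat[ti, , drop = FALSE],
               control[ci, setdiff(names(control), by), drop = FALSE],
               treat.row = ti, control.row = ci)
  rownames(out) <- NULL
  out
}

matched.data.frame <- greedy_match(treat, control)
matched.data.frame
```

On the example data this pairs treat rows 1, 2 and 3 with control rows
1, 2 and 6, the same three pairs that were matched by hand. The result
depends entirely on the row order of both data frames, which is why the
choice among tied controls is arbitrary.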
--
David Winsemius