[R] Propensity score matching with MatchIt
Suparna Mitra
suparna.mitra.sm at gmail.com
Mon Jan 11 11:00:33 CET 2016
Hello R experts,
I am trying to do propensity score matching on a medical data set comparing
two types of surgery.
However, the summary of balance for all data and the summary for the matched
data come out exactly identical, so the Percent Balance Improvement is zero
for every covariate.
> surgery.data <- read.csv(file.choose(), header = TRUE)
> surgery.data
Sample Surgerytype Age ASAgrade BMI FIGOstage PreviousAbdoSurgery
1 2 1 41 1 22.3 3 0
2 4 1 49 2 19.5 3 0
3 5 1 58 2 28.8 3 0
4 8 1 34 1 29.1 3 0
5 9 1 49 1 25.1 3 0
6 13 1 30 2 29.0 3 0
7 14 1 31 1 23.6 3 0
8 15 1 29 1 33.7 3 2
9 20 1 25 1 24.6 3 0
10 28 1 28 1 21.0 3 0
11 29 1 29 2 21.4 3 0
12 30 1 61 1 25.2 3 3
13 32 1 48 1 22.7 3 0
14 33 1 24 1 26.1 3 3
15 34 1 39 1 23.7 3 0
16 36 1 39 2 34.6 3 1
17 37 1 68 2 27.0 3 0
18 49 1 71 2 30.8 3 3
19 50 1 73 2 25.8 3 0
20 54 1 30 2 23.1 3 0
21 65 1 45 2 34.6 3 0
22 77 1 41 1 29.8 3 3
23 82 1 41 2 33.8 3 0
24 86 1 34 1 34.7 3 0
25 87 1 28 2 21.4 3 0
26 88 1 35 1 25.5 3 2
27 89 1 46 1 31.9 3 1
28 91 1 48 2 20.7 3 0
29 92 1 28 2 22.4 3 2
30 96 1 45 1 22.7 3 1
31 97 1 39 2 19.7 3 1
32 98 1 34 1 27.6 3 2
33 101 1 41 1 22.5 3 0
34 107 1 31 2 31.0 3 0
35 113 1 51 2 33.2 3 0
36 114 1 43 2 22.5 3 2
37 6 0 50 1 22.9 3 0
38 7 0 43 2 25.6 3 0
39 11 0 43 1 23.8 3 2
40 12 0 31 1 22.0 3 0
41 16 0 31 1 27.2 3 2
42 17 0 34 1 19.6 3 0
43 18 0 56 3 25.2 3 0
44 21 0 39 1 26.6 3 0
45 25 0 64 2 24.5 3 0
46 45 0 61 1 21.9 3 0
47 47 0 64 1 28.5 3 0
48 53 0 54 2 26.8 5 0
49 55 0 40 1 23.1 3 0
50 57 0 46 1 26.2 3 3
51 59 0 34 1 21.5 3 0
52 62 0 25 2 23.8 3 0
53 63 0 56 2 24.6 3 0
54 64 0 45 1 24.2 3 0
55 66 0 42 1 30.4 3 0
56 67 0 49 2 35.8 2 0
57 69 0 63 1 24.7 3 0
58 70 0 29 1 29.7 5 0
59 71 0 39 1 19.9 3 3
60 73 0 62 1 28.0 3 0
61 74 0 24 1 26.7 3 0
62 75 0 70 2 31.2 3 4
63 76 0 42 2 23.0 3 0
64 79 0 56 1 34.9 3 0
65 81 0 40 1 25.0 3 0
66 83 0 39 2 29.6 3 4
67 84 0 58 1 22.1 1 0
68 104 0 36 1 28.6 3 0
69 105 0 37 1 31.2 3 0
70 109 0 33 1 25.0 3 0
71 110 0 37 1 25.8 3 0
72 111 0 34 1 21.0 3 2
> m.out1 <- matchit(Surgerytype ~ Age + ASAgrade + BMI + FIGOstage +
      PreviousAbdoSurgery, data = surgery.data, method = "nearest",
      distance = "logit")
> summary(m.out1) # check balance
Call:
matchit(formula = Surgerytype ~ Age + ASAgrade + BMI + FIGOstage +
PreviousAbdoSurgery, data = surgery.data, method = "nearest",
distance = "logit")
Summary of balance for all data:
                    Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance                   0.5426        0.4574     0.1429    0.0853  0.0913   0.0867  0.1686
Age                       41.2778       44.6111    12.2528   -3.3333  4.0000   4.1111 10.0000
ASAgrade                   1.5000        1.3056     0.5248    0.1944  0.0000   0.2500  1.0000
BMI                       26.4194       25.8500     3.8345    0.5694  0.8500   1.1472  3.5000
FIGOstage                  3.0000        3.0278     0.6088   -0.0278  0.0000   0.1944  2.0000
PreviousAbdoSurgery        0.7222        0.5556     1.2058    0.1667  0.0000   0.2778  2.0000
Summary of balance for matched data:
                    Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance                   0.5426        0.4574     0.1429    0.0853  0.0913   0.0867  0.1686
Age                       41.2778       44.6111    12.2528   -3.3333  4.0000   4.1111 10.0000
ASAgrade                   1.5000        1.3056     0.5248    0.1944  0.0000   0.2500  1.0000
BMI                       26.4194       25.8500     3.8345    0.5694  0.8500   1.1472  3.5000
FIGOstage                  3.0000        3.0278     0.6088   -0.0278  0.0000   0.1944  2.0000
PreviousAbdoSurgery        0.7222        0.5556     1.2058    0.1667  0.0000   0.2778  2.0000
Percent Balance Improvement:
Mean Diff. eQQ Med eQQ Mean eQQ Max
distance 0 0 0 0
Age 0 0 0 0
ASAgrade 0 0 0 0
BMI 0 0 0 0
FIGOstage 0 0 0 0
PreviousAbdoSurgery 0 0 0 0
Sample sizes:
Control Treated
All 36 36
Matched 36 36
Unmatched 0 0
Discarded 0 0
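One possible reading of the sample-size table above, offered as a hedged sketch rather than a diagnosis: with 36 treated and 36 control units, 1:1 nearest-neighbour matching without replacement and without a caliper pairs every treated unit with some control, so no unit is dropped and the group summaries cannot change. The base-R code below illustrates this on simulated data. The `toy` data frame, its `treat` and `age` columns, and the greedy loop are assumptions made for illustration; this is not MatchIt's implementation and not the surgery data.

```r
# Toy illustration only: simulated data, not the surgery data from this post.
set.seed(1)
n <- 36
toy <- data.frame(
  treat = rep(c(1, 0), each = n),
  age   = c(rnorm(n, mean = 41, sd = 12), rnorm(n, mean = 45, sd = 12))
)

# Propensity score from a logistic regression of treatment on the covariate.
ps <- fitted(glm(treat ~ age, data = toy, family = binomial))

treated  <- which(toy$treat == 1)
controls <- which(toy$treat == 0)

# Greedy 1:1 nearest-neighbour matching without replacement, no caliper.
# Treated units are processed in data order (a simplification).
matched_controls <- integer(0)
for (i in treated) {
  available <- setdiff(controls, matched_controls)
  if (length(available) == 0) break
  nearest <- available[which.min(abs(ps[available] - ps[i]))]
  matched_controls <- c(matched_controls, nearest)
}

# Every control ends up matched, so the "matched" control group is the full
# control group and its covariate mean is unchanged.
all_used  <- setequal(matched_controls, controls)
same_mean <- isTRUE(all.equal(mean(toy$age[matched_controls]),
                              mean(toy$age[controls])))
print(c(all_used = all_used, same_mean = same_mean))
```

If this is what is happening, the pairing itself may still be meaningful, but balance can only change when some units are dropped or reused, for example with a larger control pool or a caliper.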
However, when I test Age and BMI separately, the group means do differ
(although not significantly), as the results below show:
> summary(lm(Age~Surgerytype,data=surgery.data))
Call:
lm(formula = Age ~ Surgerytype, data = surgery.data)
Residuals:
Min 1Q Median 3Q Max
-20.611 -10.361 -2.278 7.722 31.722
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.944 4.663 8.138 1.02e-11 ***
Surgerytype 3.333 2.949 1.130 0.262
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 12.51 on 70 degrees of freedom
Multiple R-squared: 0.01792, Adjusted R-squared: 0.003895
F-statistic: 1.278 on 1 and 70 DF, p-value: 0.2622
######
> summary(lm(BMI~Surgerytype,data=surgery.data))
Call:
lm(formula = BMI ~ Surgerytype, data = surgery.data)
Residuals:
Min 1Q Median 3Q Max
-6.919 -3.719 -0.850 2.698 9.950
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26.9889 1.6089 16.77 <2e-16 ***
Surgerytype -0.5694 1.0176 -0.56 0.578
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.317 on 70 degrees of freedom
Multiple R-squared: 0.004454, Adjusted R-squared: -0.009768
F-statistic: 0.3132 on 1 and 70 DF, p-value: 0.5775
## Or a t-test for Age
> t.test(surgery.data$Age[surgery.data$Surgerytype == 1],
         surgery.data$Age[surgery.data$Surgerytype == 2], paired = FALSE)
Welch Two Sample t-test
data: surgery.data$Age[surgery.data$Surgerytype == 1] and
surgery.data$Age[surgery.data$Surgerytype == 2]
t = -1.1303, df = 69.883, p-value = 0.2622
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-9.215118 2.548451
sample estimates:
mean of x mean of y
41.27778 44.61111
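The data listing shows Surgerytype coded as 1 and 0, while the t-test call subsets on 1 and 2, so it may be worth confirming which coding the loaded data frame actually uses. A minimal sketch of such a check follows, using a few (Surgerytype, Age) rows copied from the listing; the names `check` and `age_2` are hypothetical and only for illustration.

```r
# A few rows copied from the data listing (rows 1-3 and 37-39).
check <- data.frame(
  Surgerytype = c(1, 1, 1, 0, 0, 0),
  Age         = c(41, 49, 58, 50, 43, 43)
)

# Which codes are present, and how many rows carry each one?
print(table(check$Surgerytype))

# With 0/1 coding, subsetting on == 2 selects no rows at all.
age_2 <- check$Age[check$Surgerytype == 2]
print(length(age_2))

# Group means per code, for comparison with the matchit and t.test output.
print(tapply(check$Age, check$Surgerytype, mean))
```

Running `table(surgery.data$Surgerytype)` on the full data would show the same information for the real data set.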
======
Maybe I am making a silly mistake. Can anybody please help me?
Thanks a lot,
Mitra