[R] Problems with weight

Tue Nov 27 22:54:48 CET 2012

On Tuesday 27 November 2012 at 18:33 -0300, Pablo Menese wrote:
> I can't ... I don't know why but I can't
> 
> When I use it:
> 
> logit <- glm(bach ~ egp4 + programa, weights=wst7,
> family=quasibinomial(link="logit"))
You were advised to use svyglm(), not glm(). It's usually considered
polite to read the answers you get to your questions carefully...
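
A minimal sketch of what that advice amounts to, assuming the survey
package is installed and that the design matches the Stata output quoted
below (one stratum, every observation its own PSU, wst7 as the sampling
weight). The data frame d is simulated, since the real data were not
posted; only the variable names come from the thread.

```r
library(survey)  # assumption: installed via install.packages("survey")

set.seed(1)
n <- 200

# Simulated stand-in data: names follow the thread, values are made up.
d <- data.frame(
  egp4     = factor(sample(1:4, n, replace = TRUE)),
  programa = rbinom(n, 1, 0.5),
  wst7     = runif(n, 5, 40)
)
d$bach <- rbinom(n, 1, plogis(0.5 - 0.3 * as.integer(d$egp4) + 0.8 * d$programa))

# Declare the design once: no clustering (ids = ~1), sampling weights in wst7.
des <- svydesign(ids = ~1, weights = ~wst7, data = d)

# Design-based fit: no weights argument, the design object carries them.
fit_svy <- svyglm(bach ~ egp4 + programa, design = des,
                  family = quasibinomial(link = "logit"))

# Model-based fit, as in the original post.
fit_glm <- glm(bach ~ egp4 + programa, data = d, weights = wst7,
               family = quasibinomial(link = "logit"))

# Standard errors from each fit.
se_svy <- sqrt(diag(vcov(fit_svy)))
se_glm <- sqrt(diag(vcov(fit_glm)))

# Same point estimates, different standard errors.
print(cbind(svyglm = coef(fit_svy), glm = coef(fit_glm)))
print(cbind(svyglm = se_svy, glm = se_glm))
```

The coefficients agree because both calls solve the same weighted
estimating equations. The standard errors differ because svyglm() uses a
design-based (linearized) variance estimate, which is the kind Stata's
svy: prefix reports, whereas glm() treats the weights as part of the
model.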


Regards

> I get the same betas as in Stata, but the hypothesis tests, the t values,
> and the std. errors are different.
> 
> I think the solution can't be far from this...
> 
> 
> On Fri, Nov 23, 2012 at 9:49 PM, Anthony Damico <ajdamico at gmail.com> wrote:
> 
> > from your stata output, it looks like you need to use the survey package
> > in R
> >
> > for step-by-step instructions about how to do this (and comparisons to
> > stata), see
> >
> > http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Damico.pdf
> >
> > once you're ready to run the regression, use svyglm() instead of glm() and
> > drop the weights argument (since it will already be part of the survey
> > design)   :)
> >
> >
> >
> > On Fri, Nov 23, 2012 at 3:13 PM, Pablo Menese <pmenese at gmail.com> wrote:
> >
> >> Until a week ago I used Stata for everything.
> >> Now I'm learning R and trying to move over. At this stage I'm testing R
> >> by trying to reproduce what I used to do in Stata, with the same
> >> outputs.
> >> I have a problem with the logit when applying weights.
> >>
> >> in stata I have this output
> >> . svy: logit bach job2 mujer i.egp4 programa delay mdeo i.str evprivate
> >> (running logit on estimation sample)
> >>
> >> Survey: Logistic regression
> >>
> >> Number of strata   =         1                  Number of obs      =       248
> >> Number of PSUs     =       248                  Population size    = 5290.1639
> >> Design df          =       247
> >> F(  11,    237)    =      4.39
> >> Prob > F           =    0.0000
> >>
> >>
> >>                       Linearized
> >> bach            Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
> >>
> >> job2        -.4437446   .4385934   -1.01   0.313    -1.307605    .4201154
> >> mujer        1.070595   .4169919    2.57   0.011     .2492812    1.891908
> >>
> >> egp4
> >>   2         -.4839342    .539808   -0.90   0.371    -1.547148    .5792796
> >>   3         -1.288947   .5347344   -2.41   0.017    -2.342168   -.2357263
> >>   4         -.8569793   .5106425   -1.68   0.095    -1.862748    .1487898
> >>
> >> programa     .9694352   .5677642    1.71   0.089    -.1488415    2.087712
> >> delay       -1.552582   .5714967   -2.72   0.007    -2.678211    -.426954
> >> mdeo        -.7938904   .3727571   -2.13   0.034    -1.528078   -.0597025
> >>
> >> str
> >>   2         -1.122691   .5731879   -1.96   0.051     -2.25165    .0062682
> >>   3         -2.056682   .6350485   -3.24   0.001    -3.307483   -.8058812
> >>
> >> evprivate   -1.962431   .5674143   -3.46   0.001    -3.080018   -.8448431
> >> _cons        2.308699   .7274924    3.17   0.002     .8758187    3.741578
> >>
> >>
> >> the best that i get in R was:
> >>
> >> glm(formula = bach ~ job2 + mujer + egp4 + programa + delay +
> >>     mdeo + str + evprivate, family = quasibinomial(link = "logit"),
> >>     weights = wst7)
> >>
> >> Deviance Residuals:
> >>      Min        1Q    Median        3Q       Max
> >> -12.5951   -3.9034   -0.9412    3.8268   11.2750
> >>
> >> Coefficients:
> >>                            Estimate Std. Error t value Pr(>|t|)
> >> (Intercept)                  2.3087     0.7173   3.218  0.00147 **
> >> job2                        -0.4437     0.4355  -1.019  0.30926
> >> mujer                        1.0706     0.3558   3.009  0.00290 **
> >> egp4intermediate (iii, iv)  -0.4839     0.4946  -0.978  0.32890
> >> egp4skilled manual workers  -1.2889     0.5268  -2.447  0.01514 *
> >> egp4working class           -0.8570     0.4625  -1.853  0.06514 .
> >> programa                     0.9694     0.4951   1.958  0.05141 .
> >> delay                       -1.5526     0.4878  -3.183  0.00166 **
> >> mdeo                        -0.7939     0.4207  -1.887  0.06037 .
> >> strest. ii                  -1.1227     0.4809  -2.334  0.02042 *
> >> strestr. iii                -2.0567     0.5134  -4.006 8.28e-05 ***
> >> evprivate                   -1.9624     0.6490  -3.024  0.00277 **
> >> ---
> >> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >>
> >> (Dispersion parameter for quasibinomial family taken to be 23.14436)
> >>
> >>     Null deviance: 7318.5  on 246  degrees of freedom
> >> Residual deviance: 5692.8  on 235  degrees of freedom
> >>   (103 observations deleted due to missingness)
> >> AIC: NA
> >>
> >> Number of Fisher Scoring iterations: 6
> >>
> >> Warning message:
> >> In summary.glm(logit) :
> >>   observations with zero weight not used for calculating dispersion
> >>
> >> this has the same betas, but the hypothesis tests have different values...
> >>
> >>
> >> HELP!!!!
> >>
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >