[R] Precision in R
adelmaas@musc.edu
Thu Jul 22 16:02:28 CEST 2004
On 22 Jul, at 06:09, r-help-request at stat.math.ethz.ch wrote:
> Date: Wed, 21 Jul 2004 13:48:53 +0200
> From: bhx2 at mevik.net (Bjørn-Helge Mevik)
> Subject: Re: [R] Precision in R
> To: r-help at stat.math.ethz.ch
>
> Since you didn't say anything about _what_ you did, either in SAS or
> R, my first thought was: Have you checked that you use the same
> parametrization of the models in R and SAS?
Well, I'm running Poisson regressions for the incidence of childhood
acute lymphoblastic leukemia in a set of US counties (in this data
set, for some reason, all of Hawaii counts as a single county). Separate
models are fitted for males and females. The independent variables of
interest are race ("white", "black", "other") and, in the model for
males only, -log(proportion of people in the county who moved between
1985 and 1990), AKA "minus log proportion moved" or "MLPM".
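For concreteness, here is a minimal sketch of how the MLPM covariate and the powers used in the male model could be built in R. The function name `mlpm_terms` and its input vector of county proportions are hypothetical, not taken from the actual data set.

```r
# Build the MLPM covariate and its square and cube for the male model.
# `prop_moved` is a hypothetical vector of county proportions of people
# who moved between 1985 and 1990 (each value must lie in (0, 1]).
mlpm_terms <- function(prop_moved) {
  stopifnot(all(prop_moved > 0 & prop_moved <= 1))
  m <- -log(prop_moved)            # minus log proportion moved
  cbind(mlpm = m, mlpm2 = m^2, mlpm3 = m^3)
}

mlpm_terms(c(0.5, 0.25))
```

The three columns are the same quantities that `I(x^3)`, `I(x^2)` and `x` produce inside the `glm()` formula.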
SAS code:
> title "Males";
> proc genmod data=males order=formatted;
> class race sex;
> model observed = race mlpm*mlpm*mlpm mlpm*mlpm mlpm /
> dist=poisson link=log offset=lPYAR covb;
>
> run;
>
> title "Females";
> proc genmod data=females order=formatted;
> class race sex;
> model observed = race / dist=poisson link=log offset=lPYAR;
> run;
R code:
> Female.model <- glm(Observed ~ Black + Other, family =
> poisson(link=log), offset=log(PYAR), data=Females)
>
> Male.model <- glm(Observed ~ Black + Other +
> I(Minus.log.proportion.moved^3) + I(Minus.log.proportion.moved^2) +
> Minus.log.proportion.moved, family = poisson(link=log),
> offset=log(PYAR), data=Males)
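Written out, both programs should be fitting the standard log-linear Poisson rate model with a person-years offset. This assumes that SAS's `lPYAR` is log(PYAR). The symbols below are introduced only for illustration, with m_i denoting MLPM for county i. The female model simply drops the three MLPM terms.

```latex
\begin{aligned}
\mathit{Observed}_i &\sim \operatorname{Poisson}(\mu_i),\\
\log \mu_i &= \log(\mathit{PYAR}_i) + \beta_0 + \beta_1\,\mathit{Black}_i + \beta_2\,\mathit{Other}_i
            + \beta_3 m_i^3 + \beta_4 m_i^2 + \beta_5 m_i,\\
m_i &= -\log(\text{proportion moved}_i).
\end{aligned}
```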
Race is coded differently in the two programs because I wanted both to
use "whites" as the reference group (I have more data for whites than
for "blacks" or "others").
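As a sanity check on the coding, the sketch below shows that logical indicators (as in the R code above) and a race factor releveled so that "W" is the reference give the same Poisson fit. The data are synthetic and invented purely for illustration. The column names `Race`, `Black` and `Other` mirror the post, but the data frame `d` is hypothetical.

```r
set.seed(1)
# Synthetic illustration only: 60 hypothetical counties
n <- 60
race <- factor(sample(c("W", "B", "O"), n, replace = TRUE),
               levels = c("B", "O", "W"))
PYAR <- runif(n, 1e4, 1e5)             # made-up person-years at risk
Observed <- rpois(n, PYAR * 1e-4)      # made-up case counts
d <- data.frame(Observed = Observed, PYAR = PYAR, Race = race,
                Black = race == "B", Other = race == "O")

# Coding 1: logical indicators; whites are the rows where both are FALSE
fit_ind <- glm(Observed ~ Black + Other, family = poisson(link = "log"),
               offset = log(PYAR), data = d)

# Coding 2: a factor whose reference level is set to "W"
d$Race <- relevel(d$Race, ref = "W")
fit_fac <- glm(Observed ~ Race, family = poisson(link = "log"),
               offset = log(PYAR), data = d)

cbind(indicators = coef(fit_ind), factor = coef(fit_fac))
```

In the SAS output, the last formatted level (W) is the one whose estimate is fixed at zero, so it is the same reference group.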
SAS results:
> Males 12:08
> Wednesday, April 21, 2004 173
>
> The GENMOD Procedure
>
> Model Information
>
> Data Set WORK.MALES
> Distribution Poisson
> Link Function Log
> Dependent Variable Observed
> Offset Variable lPYAR
> Observations Used 526
>
>
> Class Level Information
>
> Class Levels Values
>
> Race 3 B O W
> Sex 1 M
>
>
> Parameter Information
>
> Parameter Effect Race
>
> Prm1 Intercept
> Prm2 Race B
> Prm3 Race O
> Prm4 Race W
> Prm5 mlPM*mlPM*mlPM
> Prm6 mlPM*mlPM
> Prm7 mlPM
>
>
> Criteria For Assessing Goodness Of Fit
>
> Criterion             DF        Value    Value/DF
>
> Deviance             520     239.5025      0.4606
> Scaled Deviance      520     239.5025      0.4606
> Pearson Chi-Square   520     360.5677      0.6934
> Scaled Pearson X2    520     360.5677      0.6934
> Log Likelihood               320.5910
>
>
> Algorithm converged.
>
>
> Estimated Covariance Matrix
>
>            Prm1       Prm2       Prm3       Prm5       Prm6       Prm7
>
> Prm1    9.25071   -0.01841    0.04877  -13.71192   37.88798  -33.20414
> Prm2   -0.01841    0.03392   0.002521    0.03045   -0.07720    0.06191
> Prm3    0.04877   0.002521    0.02027   -0.07622    0.21457   -0.18748
> Prm5  -13.71192    0.03045   -0.07622   22.11044  -59.26190   50.49281
> Prm6   37.88798   -0.07720    0.21457  -59.26190     160.70    -138.32
> Prm7  -33.20414    0.06191   -0.18748   50.49281    -138.32     120.18
>
>
> Analysis Of Parameter Estimates
>
>                                Standard  Wald 95% Confidence     Chi-
> Parameter       DF   Estimate     Error        Limits            Square  Pr > ChiSq
>
> Intercept        1   -15.8294    3.0415   -21.7907    -9.8682     27.09      <.0001
> Race      B      1    -0.6646    0.1842    -1.0256    -0.3036     13.02      0.0003
> Race      O      1    -0.1058    0.1424    -0.3848     0.1733      0.55      0.4575
> Race      W      0     0.0000    0.0000     0.0000     0.0000         .           .
> mlPM*mlPM*mlPM   1    15.4205    4.7022     6.2044    24.6366     10.75      0.0010
> mlPM*mlPM        1   -36.8423   12.6768   -61.6884   -11.9961      8.45      0.0037
> mlPM             1    27.2989   10.9627     5.8124    48.7855      6.20      0.0128
> Scale            0     1.0000    0.0000     1.0000     1.0000
>
> NOTE: The scale parameter was held fixed.
>
>
> Females 12:08
> Wednesday, April 21, 2004 175
>
> The GENMOD Procedure
>
> Model Information
>
> Data Set WORK.FEMALES
> Distribution Poisson
> Link Function Log
> Dependent Variable Observed
> Offset Variable lPYAR
> Observations Used 534
>
>
> Class Level Information
>
> Class Levels Values
>
> Race 3 B O W
> Sex 1 F
>
>
> Criteria For Assessing Goodness Of Fit
>
> Criterion             DF        Value    Value/DF
>
> Deviance             531     245.2305      0.4618
> Scaled Deviance      531     245.2305      0.4618
> Pearson Chi-Square   531     484.8219      0.9130
> Scaled Pearson X2    531     484.8219      0.9130
> Log Likelihood               183.8640
>
>
> Algorithm converged.
>
>
> Analysis Of Parameter Estimates
>
>                                Standard  Wald 95% Confidence     Chi-
> Parameter       DF   Estimate     Error        Limits            Square  Pr > ChiSq
>
> Intercept        1    -9.7630    0.0577    -9.8762    -9.6499   28595.0      <.0001
> Race      B      1    -1.0917    0.2493    -1.5803    -0.6030     19.17      <.0001
> Race      O      1     0.0014    0.1569    -0.3061     0.3088      0.00      0.9931
> Race      W      0     0.0000    0.0000     0.0000     0.0000         .           .
> Scale            0     1.0000    0.0000     1.0000     1.0000
>
> NOTE: The scale parameter was held fixed.
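One thing that the `covb` output in the male SAS run makes visible is how strongly the three MLPM estimates are correlated with each other. The sketch below converts that covariance matrix (values copied from the SAS output above; Prm4, the zeroed Race W parameter, is not in it) to a correlation matrix with `stats::cov2cor()`. Correlations this close to plus or minus 1 suggest that the individual polynomial coefficients are poorly determined. In that situation, small differences in the input data or in convergence settings could plausibly move them a lot.

```r
# Male-model covariance matrix of the estimates, transcribed from the SAS output
prm <- c("Intercept", "RaceB", "RaceO", "mlPM3", "mlPM2", "mlPM")
V <- matrix(c(
    9.25071, -0.01841,  0.04877,  -13.71192,  37.88798, -33.20414,
   -0.01841,  0.03392,  0.002521,   0.03045,  -0.07720,   0.06191,
    0.04877,  0.002521, 0.02027,   -0.07622,   0.21457,  -0.18748,
  -13.71192,  0.03045, -0.07622,   22.11044, -59.26190,  50.49281,
   37.88798, -0.07720,  0.21457,  -59.26190, 160.70,   -138.32,
  -33.20414,  0.06191, -0.18748,   50.49281, -138.32,    120.18),
  nrow = 6, byrow = TRUE, dimnames = list(prm, prm))

# Rescale to correlations; the diagonal becomes exactly 1
Rcor <- cov2cor(V)

# Correlations among the cubic, quadratic and linear MLPM estimates
round(Rcor[4:6, 4:6], 3)
```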
R results:
> > summary(Female.model)
>
> Call:
> glm(formula = Observed ~ Black + Other, family = poisson(link = log),
> data = Females, offset = log(PYAR))
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -2.4060 -0.5315 -0.1109 -0.0284 2.6520
>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -9.763025 0.057735 -169.101 < 2e-16 ***
> BlackTRUE -1.091679 0.249309 -4.379 1.19e-05 ***
> OtherTRUE 0.001363 0.156876 0.009 0.993
> ---
> Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> (Dispersion parameter for poisson family taken to be 1)
>
> Null deviance: 272.49 on 533 degrees of freedom
> Residual deviance: 245.23 on 531 degrees of freedom
> AIC: 520.71
>
> Number of Fisher Scoring iterations: 7
>
> > summary(Male.model)
>
> Call:
> glm(formula = Observed ~ Black + Other +
> I(Minus.log.proportion.moved^3) +
> I(Minus.log.proportion.moved^2) + Minus.log.proportion.moved,
> family = poisson(link = log), data = Males, offset = log(PYAR))
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -2.24568 -0.49137 -0.10197 -0.03262 3.88346
>
> Coefficients:
>                                  Estimate Std. Error z value Pr(>|z|)
> (Intercept)                     -16.39065    3.31644  -4.942 7.72e-07 ***
> BlackTRUE                        -0.66461    0.18418  -3.608 0.000308 ***
> OtherTRUE                        -0.09513    0.14278  -0.666 0.505245
> I(Minus.log.proportion.moved^3)  24.39920    7.51188   3.248 0.001162 **
> I(Minus.log.proportion.moved^2) -51.17011   17.75857  -2.881 0.003959 **
> Minus.log.proportion.moved       33.48773   13.52491   2.476 0.013286 *
> ---
> Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> (Dispersion parameter for poisson family taken to be 1)
>
> Null deviance: 278.68 on 525 degrees of freedom
> Residual deviance: 240.54 on 520 degrees of freedom
> AIC: 582.68
>
> Number of Fisher Scoring iterations: 6
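A possible check is whether the raw powers of MLPM are causing numerical trouble. To do that, refit with an orthogonal cubic basis from `stats::poly()`, which spans the same model space. The sketch below uses synthetic data (every value is invented for illustration, not taken from these counties). The residual deviance should agree between the two parametrizations, while the condition number of the design matrix, estimated with `kappa()`, is far smaller for the orthogonal basis.

```r
set.seed(42)
# Synthetic illustration only: 200 hypothetical counties
n <- 200
m <- runif(n, 0.5, 1.5)                 # stand-in for MLPM values
PYAR <- runif(n, 1e4, 1e5)              # made-up person-years at risk
Observed <- rpois(n, PYAR * 1e-4 * exp(0.3 * m))
d <- data.frame(Observed = Observed, PYAR = PYAR, m = m)

# Raw powers, written as in the original glm() call
fit_raw <- glm(Observed ~ I(m^3) + I(m^2) + m, family = poisson(link = "log"),
               offset = log(PYAR), data = d)

# Orthogonal cubic polynomial: same fitted values, better-conditioned columns
fit_orth <- glm(Observed ~ poly(m, 3), family = poisson(link = "log"),
                offset = log(PYAR), data = d)

c(raw = deviance(fit_raw), orthogonal = deviance(fit_orth))
c(raw = kappa(model.matrix(fit_raw)), orthogonal = kappa(model.matrix(fit_orth)))
```

The coefficients of the two fits are not comparable term by term, because the bases differ. Only the fitted values and deviance should match.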
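Now, you'll notice (after scrolling up and down a lot) that the two
programs give identical results for the female model but different
results for the male model. The "black" estimate agrees, but the
intercept, the "other" estimate, and all three MLPM coefficients differ,
and the residual deviance is 239.50 in SAS versus 240.54 in R. Does
anybody have any ideas why I'm getting a difference, and which program
(if either) is giving me the right answer? Thanks in advance again.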
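To make the discrepancy easier to see without scrolling, the sketch below puts the male-model estimates and residual deviances transcribed from the two outputs side by side. No new results are involved; the numbers are copied from the output above.

```r
# Male-model estimates transcribed from the SAS and R output above
est <- data.frame(
  term = c("Intercept", "Black", "Other", "MLPM^3", "MLPM^2", "MLPM"),
  SAS  = c(-15.8294,  -0.6646,  -0.1058,  15.4205, -36.8423, 27.2989),
  R    = c(-16.39065, -0.66461, -0.09513, 24.39920, -51.17011, 33.48773)
)
est$difference <- est$R - est$SAS
print(est)

# Residual deviance reported by each program (526 counties, 520 df)
dev <- c(SAS = 239.5025, R = 240.54)
diff(dev)
```

For the same data and the same model, a smaller deviance means a larger likelihood. So if both programs really received identical input, a gap of about 1.04 would suggest that one of the fits has not reached the maximum. Slightly different inputs (for example, in how MLPM was computed or stored) would be the other candidate.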
Aaron
-------------
Aaron Solomon (ben Saul Joseph) Adelman
E-mail: adelmaas at musc.edu
Web site: http://people.musc.edu/~adelmaas/
AOL Instant Messenger & Yahoo! Messenger: Hiergargo
AIM chat-room (preferred): Adelmania