[R] An R vs. SAS Discrepancy: How do I determine which is correct?
Kevin E. Thorpe
kevin.thorpe at utoronto.ca
Tue Dec 1 20:59:02 CET 2009
Thanks to an insightful comment from Jeremy Miles, who politely
pointed out my thick-headed moment, I know what happened.
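The sex variable was coded as 1/2 in the SAS data, but was a factor
in the R data and so became a properly coded 0/1 dummy variable. That
shifts only the intercept, by exactly the sex coefficient:
61.11434 - 2.91108 = 58.20326, which is the SAS intercept.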
Sorry for the obvious question and answer.
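For the archives, here is a minimal sketch of the effect using simulated
data (not the real data set, which was never posted). The variable names
mirror the output quoted below, but every number here is made up, and
sex12 is just a hypothetical name for the 1/2-coded copy of sex. It fits
the model once with sex as a factor and once with the numeric 1/2 code,
and also repeats the direct (X'X)^(-1) X' y check.

```r
# Simulated data only; none of these values come from the real data set.
set.seed(1)
n     <- 500
sex   <- factor(sample(c("Men", "Women"), n, replace = TRUE))
diabp <- rnorm(n, mean = 80, sd = 12)
age   <- rnorm(n, mean = 50, sd = 9)
y     <- 60 + 3 * (sex == "Women") + 0.2 * diabp - 0.08 * age +
         rnorm(n, sd = 8)

# R-style fit: sex is a factor, so lm() builds a 0/1 dummy
# (Men = 0, Women = 1 under the default treatment contrasts).
fit_factor <- lm(y ~ sex + diabp + age)

# SAS-style fit: sex enters as the numeric code 1/2 (Men = 1, Women = 2).
sex12  <- as.integer(sex)
fit_12 <- lm(y ~ sex12 + diabp + age)

b_factor <- coef(fit_factor)
b_12     <- coef(fit_12)

# All slopes agree between the two codings.
stopifnot(isTRUE(all.equal(unname(b_factor[-1]), unname(b_12[-1]))))

# The intercepts differ by exactly the sex coefficient.
stopifnot(isTRUE(all.equal(unname(b_factor[1] - b_12[1]),
                           unname(b_12["sex12"]))))

# Direct normal-equations solution, (X'X)^(-1) X' y, matches lm().
X        <- model.matrix(fit_factor)
b_direct <- solve(crossprod(X), crossprod(X, y))
stopifnot(isTRUE(all.equal(as.vector(b_direct), unname(b_factor))))
```

Since sex12 is the dummy plus one, the two fits are the same model with
the constant absorbed differently, so neither program's answer was wrong.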
Kevin E. Thorpe wrote:
> I was messing around with some data in R and SAS (the reason is
> unimportant) fitting a multiple linear regression and got a
> curious discrepancy. The data set is too big to post, but if
> someone wants it, I can send it.
>
> So, here are the (partial) results:
>
> From R:
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept) 61.11434    1.48065  41.275  < 2e-16 ***
> sexWomen     2.91108    0.35753   8.142    5e-16 ***
> diabp        0.20675    0.01504  13.746  < 2e-16 ***
> age         -0.08085    0.02088  -3.871 0.000110 ***
>
> From SAS (sorry about word-wrap if it happens):
>
> Parameter Estimates
>
>                                             Parameter  Standard
> Variable   Label                        DF   Estimate     Error  t Value
>
> Intercept  Intercept                     1   58.20326   1.57802    36.88
> SEX        SEX                           1    2.91108   0.35753     8.14
> DIABP      Diastolic BP mmHg             1    0.20675   0.01504    13.75
> AGE        Age (years) at examination    1   -0.08085   0.02088    -3.87
>
> Parameter Estimates
>
> Variable   Label                        DF  Pr > |t|
>
> Intercept  Intercept                     1    <.0001
> SEX        SEX                           1    <.0001
> DIABP      Diastolic BP mmHg             1    <.0001
> AGE        Age (years) at examination    1    0.0001
>
> The curious thing is that all parameter estimates agree except the
> intercept. In R I also computed the coefficients directly using
> (X'X)^(-1) X' y and got the same coefficients as lm() gave me.
> Also, ols() in Design agrees with lm().
>
> As far as I can tell, the data used in R and SAS are identical. So,
> whose answer is correct and how do I prove it? Here's my sessionInfo
> (yes, I know my version of R is oldish).
>
> > sessionInfo()
> R version 2.8.0 (2008-10-20)
> i686-pc-linux-gnu
>
> locale:
> LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C
>
>
> attached base packages:
> [1] splines stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] Design_2.2-0 survival_2.35-4 Hmisc_3.6-0 lattice_0.17-25
>
> loaded via a namespace (and not attached):
> [1] cluster_1.12.0 grid_2.8.0
>
--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016