[R] An R vs. SAS Discrepancy: How do I determine which is correct?

Kevin E. Thorpe kevin.thorpe at utoronto.ca
Tue Dec 1 20:59:02 CET 2009


Thanks to an insightful comment from Jeremy Miles, who politely
pointed out my thick-headed moment, I know what happened.

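The sex variable was coded as 1/2 in the SAS data, but was a factor
in the R data and so became a properly coded 0/1 dummy variable.
Shifting the coding by one moves only the intercept, by exactly the
sex coefficient: 61.11434 - 2.91108 = 58.20326.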
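
For anyone who wants to see the effect for themselves, here is a minimal
sketch on simulated data (not the original data set, which was never
posted). The variable names mirror the output quoted below; the simulated
values, the "Men" level label, and the assumption that 2 codes women are
made up for illustration.

```r
# Simulated stand-in data; only the variable names come from the post.
set.seed(1)
n       <- 200
sex_num <- sample(1:2, n, replace = TRUE)            # numeric 1/2 coding (as in SAS)
sex_fac <- factor(sex_num, levels = 1:2,
                  labels = c("Men", "Women"))        # factor coding (as in R)
diabp   <- rnorm(n, mean = 80, sd = 10)
age     <- rnorm(n, mean = 50, sd = 8)
y       <- 60 + 3 * (sex_num == 2) + 0.2 * diabp - 0.08 * age + rnorm(n, sd = 5)

# Factor: lm() builds a 0/1 dummy (default treatment contrasts).
fit_fac <- lm(y ~ sex_fac + diabp + age)
# Numeric: the 1/2 values enter the design matrix unchanged.
fit_num <- lm(y ~ sex_num + diabp + age)

b_fac <- coef(fit_fac)
b_num <- coef(fit_num)

# All slopes agree between the two fits.
slopes_match <- isTRUE(all.equal(unname(b_fac[-1]), unname(b_num[-1])))

# The intercepts differ by exactly the sex slope.
shift_match <- isTRUE(all.equal(unname(b_fac[1] - b_num[1]),
                                unname(b_num["sex_num"])))

print(c(slopes_match = slopes_match, shift_match = shift_match))
```

Both checks should come back TRUE. Neither fit is wrong; the two
intercepts just refer to different baselines (sex = 0 in the numeric
fit, the reference level in the factor fit).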

Sorry for the obvious question and answer.

Kevin E. Thorpe wrote:
> I was messing around with some data in R and SAS (the reason is
> unimportant) fitting a multiple linear regression and got a
> curious discrepancy.  The data set is too big to post, but if
> someone wants it, I can send it.
> 
> So, here are the (partial) results:
> 
>  From R:
> 
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept) 61.11434    1.48065  41.275  < 2e-16 ***
> sexWomen     2.91108    0.35753   8.142    5e-16 ***
> diabp        0.20675    0.01504  13.746  < 2e-16 ***
> age         -0.08085    0.02088  -3.871 0.000110 ***
> 
>  From SAS (sorry about word-wrap if it happens):
> 
>                               Parameter Estimates
> 
>                                                 Parameter     Standard
>  Variable   Label                         DF     Estimate        Error  t Value
> 
>  Intercept  Intercept                      1     58.20326      1.57802    36.88
>  SEX        SEX                            1      2.91108      0.35753     8.14
>  DIABP      Diastolic BP mmHg              1      0.20675      0.01504    13.75
>  AGE        Age (years) at examination     1     -0.08085      0.02088    -3.87
> 
>                               Parameter Estimates
> 
>              Variable   Label                         DF  Pr > |t|
> 
>              Intercept  Intercept                      1    <.0001
>              SEX        SEX                            1    <.0001
>              DIABP      Diastolic BP mmHg              1    <.0001
>              AGE        Age (years) at examination     1    0.0001
> 
> The curious thing is that all parameter estimates agree except the
> intercept.  In R I also computed the coefficients directly using
> (X'X)^(-1) X' y and got the same coefficients as lm() gave me.
> Also, ols() in Design agrees with lm().
> 
> As far as I can tell, the data used in R and SAS are identical.  So,
> whose answer is correct and how do I prove it?  Here's my sessionInfo
> (yes, I know my version of R is oldish).
> 
>  > sessionInfo()
> R version 2.8.0 (2008-10-20)
> i686-pc-linux-gnu
> 
> locale:
> LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C 
> 
> 
> attached base packages:
> [1] splines   stats     graphics  grDevices utils     datasets  methods
> [8] base
> 
> other attached packages:
> [1] Design_2.2-0    survival_2.35-4 Hmisc_3.6-0     lattice_0.17-25
> 
> loaded via a namespace (and not attached):
> [1] cluster_1.12.0 grid_2.8.0
> 


-- 
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016



