[R] Stata and R users GLM methods translation

Thomas Lumley tlumley at u.washington.edu
Sat Jan 23 00:53:52 CET 2010


On Fri, 22 Jan 2010, Jason Morgan wrote:

> Hello Jean-Baptiste,
>
> On 2010.01.22 16:32:53, Jean-Baptiste Combes wrote:
>> Hello,
>>
>> I am learning R and am fluent in Stata, and I am trying to translate part of my
>> Stata code to R to check the reliability of the data under R. I have a
>> proportion variable as a dependent variable pQSfteHT . Independent variables
>> are dummies for two categorical variables called dQSvacrateHTQuali3 and
>> cluster_3.  I am fitting a model with the Stata command below:
>>
>> glm pQSfteHT dQSvacrateHTQuali3_2 dQSvacrateHTQuali3_3 dQSvacrateHTQuali3_4
>> dQSvacrateHTQuali3_5 cluster_32 cluster_33 cluster_34, link(probit)
>> family(binomial) robust
>>
>> and the same (I expect) model with R with the command below:
>>
>> nurse.model<-glm(pQSfteHT~dQSvacrateHTQuali3_2 + dQSvacrateHTQuali3_3 +
>> dQSvacrateHTQuali3_4 + dQSvacrateHTQuali3_5 + cluster_32 + cluster_33 +
>> cluster_34 ,family=binomial(link = "logit"))
>>
>> I found some differences in the parameters, could it come from the "robust"
>> option in the Stata command? It sounds strange that a variance option would
>> lead to changes in parameter estimates but I am not an econometrician.
>
> I noticed this same thing about a year ago when comparing Stata and R
> results (though, I was comparing simple linear models). It seems that,
> for whatever reason, Stata was reporting slight differences in the
> coefficients when applying robust. In R, on the other hand, one
> typically gets robust standard errors by applying, e.g., a sandwich
> estimator on the variance-covariance matrix of a model previously
> estimated. I am not sure what Stata is doing, and I haven't cared enough
> to check, but my understanding was also that the estimated coefficients
> should not have been affected by robust (at least in the context of a
> strictly linear model).

I haven't seen differences in coefficient estimates with the robust option, and as you note, there shouldn't be any (this is true generally, not just for linear models): robust changes only the variance estimate, not the point estimates.  The difference here is more likely to be that the Stata code fits a probit model and the R code fits a logit model.  For probit in R use  family=binomial("probit").
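
To illustrate both points, here is a minimal sketch on simulated data (the variables x1, x2, y and the helper robust_vcov are made up for this example and are not the poster's data). It fits the same model with a logit and a probit link to show that the coefficients differ by link, then computes an HC0 sandwich variance by hand in base R to show that only the standard errors change, not the estimates. The sandwich package's vcovHC() is the usual way to get this in practice; Stata may also apply a small-sample scaling, so its robust standard errors need not match these to the last digit.

```r
# Simulated data: two dummy predictors and a binary outcome
# generated from a probit model (all values are illustrative).
set.seed(1)
n  <- 500
x1 <- rbinom(n, 1, 0.5)
x2 <- rbinom(n, 1, 0.3)
y  <- rbinom(n, 1, pnorm(-0.3 + 0.8 * x1 - 0.5 * x2))

# Same formula, different link: the coefficients are on different scales.
fit_logit  <- glm(y ~ x1 + x2, family = binomial(link = "logit"))
fit_probit <- glm(y ~ x1 + x2, family = binomial(link = "probit"))
print(cbind(logit = coef(fit_logit), probit = coef(fit_probit)))

# Hypothetical helper: HC0 sandwich variance for a fitted glm.
# bread = unscaled covariance, meat = cross-product of per-observation scores.
robust_vcov <- function(fit) {
  X      <- model.matrix(fit)
  scores <- X * (residuals(fit, type = "working") *
                 weights(fit, type = "working"))
  bread  <- summary(fit)$cov.unscaled
  bread %*% crossprod(scores) %*% bread
}

# The point estimates are untouched; only the standard errors change.
model_se  <- sqrt(diag(vcov(fit_probit)))
robust_se <- sqrt(diag(robust_vcov(fit_probit)))
print(cbind(estimate = coef(fit_probit), model_se, robust_se))
```

The robust variance is computed from an already fitted model, which is why it cannot alter the coefficients.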

    -thomas


Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle


