[R] help for stata user

John Hendrickx john_hendrickx at yahoo.com
Mon Sep 27 11:02:29 CEST 2004


--- iip <iip at chead.org> wrote:

> Hi,
> 
> I'm new to R and was a Stata user before. Could you tell me where I
> can find a document comparing Stata and R commands?
> 
R is more of a statistical programming language whereas Stata is a
statistical package. R is more powerful but also has a steeper
learning curve. I like R's built-in matrix facilities, which are much
better than Stata's (although there are user-written extensions to
the Stata matrix facilities). R has good debugging tools and
user-written help files are integrated in the help.search() system. 

An area where Stata has the advantage is converting between strings
and variable names. R needs constructs like
eval(parse(text=paste(object,"string",sep=""))), which are very
confusing. See 7.21 of the R FAQ for some examples.
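
To make the string/variable conversion a bit more concrete, here is a
minimal sketch in the spirit of that FAQ entry. The names "occ" and
"mean" and the value 42 are made up for illustration; assign() and
get() are the usual alternatives to the eval(parse()) construct.

```r
# Build a variable name from two strings (hypothetical names)
varname <- paste("occ", "mean", sep = "")   # "occmean"

# Create a variable with that name, then read it back by name
assign(varname, 42)                         # same effect as occmean <- 42
value <- get(varname)

# eval(parse(text = ...)) evaluates a whole expression held in a string
doubled <- eval(parse(text = paste(varname, "* 2")))
```

In Stata the same kind of thing is done with macros, which is why it
feels so much simpler there.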

Below are two sample jobs in R and in Stata. They read in a
space-delimited data file, define labels, summarize the data, run
tables, and estimate linear regression, multinomial logit, and
ordered logit models. The data are available from the "catspec"
package if you'd like to experiment with them yourself. Hope they get
you started in R. See "An introduction to R" to help you along a
little further (included in the R distribution). I also found John
Maindonald's "Using R for Data Analysis and Graphics" very helpful;
it's available at http://cran.r-project.org/other-docs.html

Good luck,
John Hendrickx

---- sampjob.do ----------------------------------------------------
/* Sample data used in Logan (1983: 332-333)
   Data are from the 1972-1978 merged General Social Surveys (GSS)
   of the National Opinion Research Center (NORC). The selection is
   restricted to males, aged 25 to 34 at the time of the interview, who
   were in the labour force at the time of interview, and who had
   non-missing values for respondent's education and occupation and for
   father's occupation and education. N=838.

   Logan, J. (1983). "A multivariate model for mobility tables."
   American Journal of Sociology 89: 324-349.
*/
#delimit ;
label define occs 1 "Farm"          /* occupations */
                  2 "Operatives"    /* service, and laborers */
                  3 "Craftsmen"     /* and kindred workers */
                  4 "Sales"         /* and clerical */
                  5 "Professional"; /* technical, and managerial */
#delimit cr
label define race 1 "black" 0 "non-black"

infile byte(occ focc educ black) using logan.dat
label variable occ "occupation"
label variable focc "father's occupation at age 16"
label variable educ "education in years"
label variable black "race"
label values occ occs
label values focc occs
label values black race

summarize

tab focc
tab occ
tab focc occ
tab black
tab occ black
tab occ black, col
tab focc occ, row col

regress occ educ black
mlogit occ educ black, base(1)
ologit occ educ black
--------------------------------------------------------------------

--- sampjob.R -------------------------------------------------------
# Column percentages of a two-way table, like "tab x y, col" in Stata
mytab <- function (x,y) {
	prop.table(table(x,y),2)*100
}

logan <- read.table("logan.dat")
names(logan) <- c("occ", "focc", "educ", "black")
attach(logan)
occ.codes <- c("farm", "operatives", "craftsmen", "sales",
               "professional")
# The factors below are created in the workspace and mask the attached
# numeric columns; the columns inside logan itself stay numeric.
occ <- factor(occ,labels=occ.codes)
focc <- factor(focc,labels=occ.codes)
black <- factor(black,labels=c("non-black", "black"))

summary(logan)

table(focc)
table(occ)
dit<-mytab(focc,occ)
dit
table(black)
mytab(occ,black)

# CrossTable() is in the gmodels package (formerly part of the
# gregmisc bundle)
library(gmodels)
CrossTable(occ,black,prop.t=FALSE,prop.r=FALSE,fisher=FALSE)
CrossTable(focc,occ,prop.t=FALSE,fisher=FALSE)
detach(package:gmodels)

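# With data=logan the numeric columns are used, so occ is treated as a
# numeric score here, as in Stata's regress
fm <- lm(occ ~ educ+black, data=logan)
summary(fm)
anova(fm)

# multinom() turns the numeric occ into a factor itself; the first
# level (1, farm) is the reference category, like base(1) in Stata
library(nnet)
mnl.logit <- multinom(occ ~ educ+black, data=logan)
summary(mnl.logit,correlation=FALSE)
detach(package:nnet)

# No data= here: polr() requires a factor response, so it has to find
# the workspace factor occ rather than the numeric column in logan
library(MASS)
or.logit <- polr(occ ~ educ+black)
summary(or.logit)
detach(package:MASS)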
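
A possible variant of the job above that avoids attach(): convert the
columns to factors inside the data frame, so that any call using
data=logan sees the labelled factors. This is only a sketch. The small
data frame is made up for illustration and stands in for logan.dat.

```r
# Made-up stand-in for logan.dat (hypothetical values, illustration only)
logan <- data.frame(occ   = c(1, 2, 3, 4, 5, 3),
                    focc  = c(1, 1, 2, 3, 5, 4),
                    educ  = c(8, 10, 12, 12, 16, 11),
                    black = c(0, 1, 0, 0, 1, 0))
occ.codes <- c("farm", "operatives", "craftsmen", "sales", "professional")

# Convert in place; levels= states the code-to-label mapping explicitly
logan$occ   <- factor(logan$occ,   levels = 1:5, labels = occ.codes)
logan$focc  <- factor(logan$focc,  levels = 1:5, labels = occ.codes)
logan$black <- factor(logan$black, levels = 0:1,
                      labels = c("non-black", "black"))

summary(logan)                  # factor columns are summarized as counts
with(logan, table(focc, occ))   # cross-tabulation without attach()
```

With this layout a linear regression on occ would need the numeric
codes again, for example as.numeric(logan$occ).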