[R] Statistical Software Comparison
tlumley at u.washington.edu
Wed Nov 22 18:00:58 CET 2006
On Tue, 21 Nov 2006, Kenneth Cabrera wrote:
> Hi R users:
> I want to know if any of you have used
> Stata or Statgraphics.
We use Stata for teaching courses aimed at graduate students in other
departments, and also (as a consequence) on a lot of medical/public health
research projects. It is easier to learn than R, and has good support for
all the methods we teach in the service courses [unlike, e.g., SPSS or
Minitab]. Part of the reason it is easier to learn is that there is a
very regular syntax. [There is also a GUI, now, but it isn't a very good
one and we were using Stata for teaching before it had a GUI].
> What are the advantages and disadvantages with
> respect to R on the following aspects?
> 1. Statistical functions or options for advanced
> experimental design (fractional, mixed models,
> greco-latin squares, split-plot, etc).
Stata is not very good at this sort of thing. Neither is R, yet, since
lme() is really for longitudinal data and lmer() is still developing.
> 2. Bayesian approach to experimental design.
There is not much here in Stata, either.
> 3. Experimental design planning options.
> 4. Manuals (theory included in the manuals).
Stata is excellent. They usually give formulas as well as references (and
sometimes algorithms and computational notes that are not in the
references). The only problem is they keep growing and dividing, so the
cost of a complete set goes up quite rapidly with each release (and the
volume that you want is always on the other side of the room or lent out).
The online help is also good. It suffers relative to R from the examples
not necessarily being directly executable.
> 5. Support (in this aspect there is no comparison with R,
> the R list is the best known support).
The Stata list is pretty good, too. You can see it at
> 6. Numerical stability.
For most purposes this is not really an issue and I haven't pushed Stata
to the edge. I haven't seen any problems.
Stata does have a smaller range of built-in optimizers, and they seem to
have stopped at the Marquardt algorithm. This has only once been a problem
for me (in fitting log-binomial generalized linear models), but could be a
problem in implementing new methods.
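The post does not say what went wrong with the log-binomial fit. One familiar difficulty with this model is that, under a log link, the fitted probability exp(a + b*x) can exceed 1, so an unconstrained optimizer step can leave the valid parameter space. The sketch below illustrates that with Fisher scoring plus step-halving, in plain Python. The function name `fit_log_binomial`, the toy data, and the tolerances are all assumptions made for illustration; this is not how Stata or R implements the fit.

```python
import math

def fit_log_binomial(x, y, max_iter=50, tol=1e-10):
    """Fit log(p_i) = a + b * x_i to 0/1 outcomes by Fisher scoring.

    Illustrative only: one covariate, no standard errors, and it assumes
    0 < mean(y) < 1 so the starting value is inside the parameter space.
    Returns (a, b, number_of_step_halvings).
    """
    def loglik(a, b):
        total = 0.0
        for xi, yi in zip(x, y):
            eta = a + b * xi
            if eta >= 0.0:  # fitted probability >= 1: not a valid model
                return None
            total += yi * eta + (1 - yi) * math.log1p(-math.exp(eta))
        return total

    a, b = math.log(sum(y) / len(y)), 0.0  # intercept-only starting value
    current = loglik(a, b)
    halvings = 0
    for _ in range(max_iter):
        # Score vector (u0, u1) and 2x2 Fisher information for (a, b).
        u0 = u1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            p = math.exp(a + b * xi)
            s = (yi - p) / (1 - p)  # score contribution per unit of eta
            w = p / (1 - p)         # Fisher weight under the log link
            u0 += s
            u1 += s * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        da = (i11 * u0 - i01 * u1) / det
        db = (i00 * u1 - i01 * u0) / det

        # Halve the step while it leaves the parameter space or
        # lowers the log-likelihood.
        step = 1.0
        cand = loglik(a + step * da, b + step * db)
        while cand is None or cand < current - 1e-12:
            step /= 2.0
            halvings += 1
            if step < 1e-8:
                raise RuntimeError("step-halving failed to find a valid update")
            cand = loglik(a + step * da, b + step * db)

        a, b = a + step * da, b + step * db
        converged = abs(cand - current) < tol
        current = cand
        if converged:
            break
    return a, b, halvings

# Toy data (made up): 2/10 events when x = 0, 9/10 events when x = 1.
x = [0] * 10 + [1] * 10
y = [1] * 2 + [0] * 8 + [1] * 9 + [0] * 1
a, b, halvings = fit_log_binomial(x, y)
baseline = math.exp(a)  # estimated risk when x = 0
rr = math.exp(b)        # estimated relative risk
print(baseline, rr, halvings)
```

With two groups and two parameters the model is saturated, so the maximum likelihood estimates are the observed proportions 0.2 and 0.9, giving a relative risk of 4.5. From the intercept-only start, the first full scoring step pushes the fitted probability in the x = 1 group above 1, which is why the step-halving guard is needed here.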
> 7. Implementation of modern statistical approaches.
It depends on the area. It's not bad at all in biostatistics and in some
areas of econometrics. As with R there is also a lot of user-written code,
some of it of excellent quality.
The Stata language is better than it looks, but some things can be easily
programmed in it and some can't. The last two versions of Stata have
introduced language changes in order to be able to implement better
graphics and linear mixed models, and you can also now call C code from
Stata, so things are improving.
Algorithms that are suited to a `one rectangular dataset' view of the
world are often very fast in Stata, but the penalty for not vectorizing is
even stiffer than in R.
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle