[R] things that are difficult/impossible to do in SAS or SPSS but simple in R

Peter Dalgaard P.Dalgaard at biostat.ku.dk
Thu Jan 17 19:05:58 CET 2008

Wittner, Ben, Ph.D. wrote:
> Several people have mentioned large, messy data sets.
> I am curious as to in what way messy data sets are messy.
> (I am also curious about what SAS does that helps one deal with them, but
> perhaps that's asking too much.)
One aspect is that in the "SAS culture" (e.g. pharma industry), data are
only allowed to get messy in ways that people know how to handle with
SAS. Other data are "not statistical data sets"...

Typically, people like the flexibility of the DATA step in SAS; this
allows things like having input data where the records have different
formats depending on a code in columns 1-3.
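
For what it is worth, the mixed-record case can be handled in R by
reading the raw lines and dispatching on the code. The following is only
a minimal sketch: the record codes (PAT, VIS), the field layouts, and the
column names are all made up for illustration.

```r
# Hypothetical input: a record-type code in columns 1-3 decides how
# the rest of each line is laid out.
lines <- c(
  "PAT 001 1962-03-14 F",
  "VIS 001 2008-01-10 72.5",
  "VIS 001 2008-01-17 73.1",
  "PAT 002 1975-11-02 M"
)

# Split each line into the code and the remainder of the record
code <- substr(lines, 1, 3)
rest <- substring(lines, 4)

# Parse each record type separately into its own rectangular data frame.
# Reading everything as character keeps ids like "001" and the value "F"
# from being converted to numbers or logicals.
pat <- read.table(text = rest[code == "PAT"],
                  col.names = c("id", "birth", "sex"),
                  colClasses = "character")
vis <- read.table(text = rest[code == "VIS"],
                  col.names = c("id", "date", "weight"),
                  colClasses = c("character", "character", "numeric"))
```

With a real file one would start from readLines() on the file instead of
the inline vector, and fixed-width layouts would call for read.fwf() or
substr() rather than read.table().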

Once data have been converted to rectangular data sets, there is very
little you can do more conveniently in SAS than in R. The main exception
could be things that truly require sequential processing beyond cumsum
and cumprod. You can of course do that sort of thing in R with an
explicit loop over data frame rows, but it does get slow.
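
To make the sequential case concrete, here is a small sketch of a running
total that is not allowed to drop below zero, which cumsum alone cannot
express. The data and the variable names are invented for the example;
the point is only that looping over plain vectors is usually much less
painful than indexing data frame rows inside the loop.

```r
# Hypothetical data: stock movements, where the running stock level
# may never go below zero.
d <- data.frame(x = c(5, -8, 3, -1, 4))

# Extract the column as a plain vector first; indexing d[i, ] inside
# the loop is the slow part of row-wise processing.
x <- d$x
level <- numeric(length(x))
acc <- 0
for (i in seq_along(x)) {
  acc <- max(0, acc + x[i])   # each value depends on the previous one
  level[i] <- acc
}
d$level <- level

# The same recursion written with Reduce(); the initial value 0 is
# included in the accumulated result, so it is dropped with [-1].
level2 <- Reduce(function(a, b) max(0, a + b), x,
                 accumulate = TRUE, 0)[-1]
```

Both versions still run at interpreted speed, so for very long series
this remains the kind of job where the DATA step is more comfortable.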

On the other hand, SAS is not well suited for massively irregular data,
e.g. with images inside. Not that this is an area where R shines
particularly brightly, but at least it is possible to get a handle on
things there.

   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
