[R] Can R replicate this data manipulation in SAS?
pdalgd at gmail.com
Fri Apr 22 00:34:55 CEST 2011
On Apr 21, 2011, at 16:00 , Bert Gunter wrote:
> It is perhaps worth noting that this is probably a Type III error: right
> answer to the wrong question. The right question would be: what data
> structures and analysis strategy are appropriate in R? As usual, different
> language architectures mean that different paradigms should be used to best
> fit a language's strengths and weaknesses. Direct translations do not
> necessarily do this.
Hum, there is a point, though: If you take the crude translation approach, you will soon realize that there is very little that SAS (or SPSS, or...) can do that you literally can't do in R.
It is often the case that there is a much neater and better-structured approach in R, but the flip side is that there are cases where the neat solution is hard to find, and maybe some cases where it doesn't really exist (e.g. not everything can be vectorized). This is the sort of thing that in some circles gives R a reputation for being poorly suited for data handling, compared to the DATA step in SAS. Do notice the circular logic that occurs when defining "typical statistical task" as "something you can do in SAS", though.
(One example is "last observation carried forward", a rather dubious technique for filling in missing observations in longitudinal studies, which probably stems directly from the RETAIN statement in SAS.
In R, you may find yourself doing something like
x[is.na(x)] <- x[!is.na(x)][cumsum(!is.na(x))[is.na(x)]]
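which isn't even completely failsafe: if x starts with NA, cumsum() gives 0 there, the zero index silently drops an element, and the two sides no longer have matching lengths. However, you'll get the result soon enough with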
t <- NA  # last non-missing value seen so far; leading NAs stay NA
for (i in seq_along(x)) if (is.na(x[i])) x[i] <- t else t <- x[i]
and this time, you can actually read the code.
Of course, approx() with method = "constant" will do the trick much more swiftly than either of the above.)
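To make the approx() remark concrete, here is a minimal sketch that puts it next to the loop. The helper names locf_approx and locf_loop and the sample vector are made up for illustration. With method = "constant" and f = 0, approx() holds the previous observed value. The setting rule = c(1, 2) is my own choice rather than something from the post: it leaves leading NAs as NA and carries the last value past the final observation.

```r
# Sample series with a leading NA, interior gaps and a trailing NA
x <- c(NA, 2, NA, NA, 5, NA, 7, NA)

# LOCF via approx(): a step function that holds the left-hand value (f = 0).
# rule = c(1, 2): NA before the first observation, last value carried
# forward after the final observation.
locf_approx <- function(x) {
  obs <- which(!is.na(x))
  if (length(obs) == 0) return(x)  # nothing to carry forward
  approx(obs, x[obs], xout = seq_along(x),
         method = "constant", f = 0, rule = c(1, 2))$y
}

# The explicit loop, for comparison; t starts as NA so leading NAs stay NA
locf_loop <- function(x) {
  t <- NA
  for (i in seq_along(x)) if (is.na(x[i])) x[i] <- t else t <- x[i]
  x
}

print(locf_approx(x))  # values: NA 2 2 2 5 5 7 7
print(locf_loop(x))    # values: NA 2 2 2 5 5 7 7
```

Both helpers agree on the sample vector. The approx() version does its work in compiled code, which is presumably where the speed advantage on long series comes from.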
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com