[R] anti-R vitriol
John Maindonald
john.maindonald at anu.edu.au
Wed Jun 30 13:46:27 CEST 2004
I am curious. What were the dimensions of this data set? Did this
person know to use read.table() or scan()? Did they know about the
possibility of reading the data one part at a time?
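As an illustration of reading the data one part at a time, here is a minimal sketch in base R. It assumes a plain delimited text file with an unquoted header line; the helper name process_in_chunks, its arguments, and the small demonstration file are all hypothetical and not from any package.

```r
# Hypothetical helper: read a delimited text file in blocks of `chunk_rows`
# lines and apply `fun` to each block, so that only one block is held in
# memory at a time. Assumes an unquoted header line and no embedded newlines.
process_in_chunks <- function(path, fun, chunk_rows = 10000, sep = ",") {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Read the header once so that every block gets the same column names.
  col_names <- strsplit(readLines(con, n = 1), sep, fixed = TRUE)[[1]]
  results <- list()
  repeat {
    # The connection stays open, so each call continues where the last ended.
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0) break
    chunk <- read.table(text = lines, sep = sep, col.names = col_names,
                        header = FALSE, stringsAsFactors = FALSE)
    results[[length(results) + 1]] <- fun(chunk)
  }
  results
}

# Small made-up file, used only to demonstrate the helper.
tmp <- tempfile(fileext = ".csv")
write.table(data.frame(x = 1:25, y = (1:25)^2), tmp, sep = ",",
            quote = FALSE, row.names = FALSE)

row_counts <- process_in_chunks(tmp, nrow, chunk_rows = 10)
chunk_sums <- process_in_chunks(tmp, function(d) sum(d$y), chunk_rows = 10)
total_y <- sum(unlist(chunk_sums))
```

Supplying colClasses to read.table(), or a what list to scan(), would likely reduce both time and memory further, since R then does not have to guess the column types.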
The way that SAS processes the data row by row limits what can be done.
It is often possible with scant loss of information, and more
satisfactory, to work with a subset of the large data set or with
multiple subsets. Neither SAS (in my somewhat dated experience of it)
nor R is entirely satisfactory for this purpose. But at least in R,
given a subset that fits so easily into memory that the graphs are not
masses of black, there are few logistical problems in doing, rapidly and
interactively, a variety of manipulations and plots, with each new task
taking advantage of the learning that has gone before. To do that well
in the SAS world, it is necessary to use something like JMP or its
equivalent in one of the newer modules, which process data in a way
that is not all that different from R.
I have wondered about possibilities for a suite of functions that would
make it easy to process through R data that is stored in one large data
set, with a mix of adding a new variable or variables, repeating a
calculation on successive subsets of the data, producing predictions or
suchlike for separate subsets, etc. Database connections may be the way
to go (cf. the Ripley and Fei Chen paper at ISI 2003), but it might
also be useful to have a simple set of functions that would handle some
standard requirements.
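A rough sketch of what one such function might look like, for the case of repeating a calculation on successive subsets and adding the predictions as a new variable. The name predict_by_subset, its arguments, and the simulated data are hypothetical, and this version assumes the whole data frame is already in memory; a real tool would presumably pull each subset from a file or database instead.

```r
# Hypothetical helper: fit the same linear model separately on each subset
# defined by the column `group`, and return the data with a new column
# `predicted` holding the per-subset fitted values.
predict_by_subset <- function(data, group, formula) {
  pieces <- split(data, data[[group]])
  fitted_pieces <- lapply(pieces, function(piece) {
    fit <- lm(formula, data = piece)
    piece$predicted <- predict(fit, newdata = piece)
    piece
  })
  out <- do.call(rbind, fitted_pieces)
  rownames(out) <- NULL
  out
}

# Simulated data, for illustration only: two groups with different slopes.
set.seed(1)
d <- data.frame(g = rep(c("a", "b"), each = 20), x = rep(1:20, 2))
d$y <- ifelse(d$g == "a", 2, -1) * d$x + rnorm(40)

res <- predict_by_subset(d, "g", y ~ x)
```

The split/lapply/rbind pattern keeps each subset's calculation independent, so the same function body could in principle be applied to subsets fetched one at a time.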
John Maindonald.
On 30 Jun 2004, at 8:02 PM, Barry Rowlingson
<B.Rowlingson at lancaster.ac.uk> wrote:
> A colleague is receiving some data from another person. That person
> reads the data in SAS and it takes 30s and uses 64k RAM. That person
> then tries to read the data in R and it takes 10 minutes and uses a
> gigabyte of RAM. Person then goes on to say:
>
> It's not that I think SAS is such great software,
> it's not. But I really hate badly designed
> software. R is designed by committee. Worse,
> it's designed by a committee of statisticians.
> They tend to confuse numerical analysis with
> computer science and don't have any idea about
> software development at all. The result is R.
>
> I do hope [your colleague] won't have to waste time doing
> [this analysis] in an outdated and poorly designed piece
> of software like R.
>
> Would any of the "committee" like to respond to this? Or shall we just
> slap our collective forehead and wonder how someone could get such a
> view?
>
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.