[R] Reasons to Use R

Alan Zaslavsky zaslavsk at hcp.med.harvard.edu
Wed Apr 11 17:06:50 CEST 2007


Right: SAS objects (at least in the base and statistics components of the 
system -- there are dozens of add-ons for particular markets) are simple 
databases.  The predominant model for data manipulation and statistical
calculation is a row by row operation that creates modified rows and/or 
accumulates totals.  This was pretty much the only way things could be 
done in the days when real (and typically virtual) memory was much smaller 
than it now is.  It can be a pretty efficient model for calculations that 
fit that pattern.  One downside, of course, is that a line of R code can 
easily turn into 30 lines of SAS with data steps, sort steps, steps to 
accumulate totals, etc.
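To make the row-by-row model concrete, here is a minimal sketch in Python, used purely for illustration. It is not SAS code, and the function name rowwise_group_totals is hypothetical. A separate "sort step" orders the rows by a grouping key. A single "data step" pass then reads one row at a time, keeping only the current group's running total, and emits one summary row at the end of each group. For brevity, sorted() works in memory here; in the disk-based model the sort step would write a sorted copy of the file instead. In R the same result is roughly one line, e.g. tapply(value, key, sum).

```python
from operator import itemgetter
from typing import Iterable, List, Tuple


def rowwise_group_totals(rows: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Hypothetical illustration of a sort step followed by a row-at-a-time data step."""
    # "Sort step": order the rows by the grouping key first.
    ordered = sorted(rows, key=itemgetter(0))

    totals: List[Tuple[str, float]] = []
    current_key = None
    running = 0.0

    # "Data step": one pass, one row at a time, holding only the current group's total.
    for key, value in ordered:
        if key != current_key:
            if current_key is not None:
                totals.append((current_key, running))  # end of a group: emit a summary row
            current_key, running = key, 0.0
        running += value

    if current_key is not None:
        totals.append((current_key, running))  # flush the last group
    return totals


if __name__ == "__main__":
    print(rowwise_group_totals([("b", 2.0), ("a", 1.0), ("b", 3.0)]))  # [('a', 1.0), ('b', 5.0)]
```

Even this toy version needs a sort, a loop, and explicit end-of-group bookkeeping. That is the kind of expansion from one line to many described above.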

As noted by a couple of previous writers, S-Plus might be regarded as 
somewhat intermediate in its model in that objects constitute files but 
rows do not correspond to chunks of adjacent bytes in memory or filespace.

I have thought for a long time that a facility for efficient rowwise 
calculations might be a valuable enhancement to S/R.  The storage of the 
object would be handled by a database, and there would have to be an 
efficient interface for pulling a row (or small chunk of rows) out of the 
database repeatedly; alternatively, the operations could be conducted inside
the database.  Basic operations of rowwise calculation and accumulation
(such as forming a column sum or a sum of outer products) would be
written in an R-like syntax and translated into an efficient set of
operations that work through the database.  (I would be happy to share
some jejune notes on this.)  However, the main answer to this problem
in the R world seems to have been Moore's Law.  Perhaps somebody could
tell us more about the S-Plus large objects library, or the work that
Doug Bates is doing on efficient calculations with large datasets.
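A minimal sketch of the chunked-accumulation idea, assuming Python's standard sqlite3 module as a stand-in database. This is an illustration only, not the S-Plus large objects library or any existing R facility; the function chunked_crossprod and the table obs are hypothetical. The table stays in the database, rows come out in small chunks via Cursor.fetchmany, and only the p column sums and the p-by-p sum of outer products are kept in memory. For a matrix X that fits in memory, R computes the same quantities directly with colSums(X) and crossprod(X).

```python
import sqlite3
from typing import List, Tuple


def chunked_crossprod(conn: sqlite3.Connection, query: str, chunk_size: int = 1000
                      ) -> Tuple[int, List[float], List[List[float]]]:
    """Hypothetical helper: accumulate n, column sums, and sum of x x^T chunk by chunk."""
    cur = conn.execute(query)
    p = len(cur.description)  # number of columns in the result set
    col_sums = [0.0] * p
    xtx = [[0.0] * p for _ in range(p)]  # running sum of outer products
    n = 0
    while True:
        chunk = cur.fetchmany(chunk_size)  # at most chunk_size rows held at once
        if not chunk:
            break
        for row in chunk:
            n += 1
            for i in range(p):
                col_sums[i] += row[i]
                for j in range(p):
                    xtx[i][j] += row[i] * row[j]
    return n, col_sums, xtx


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs (x REAL, y REAL)")
    conn.executemany("INSERT INTO obs VALUES (?, ?)", [(1, 2), (3, 4), (5, 6)])
    n, sums, xtx = chunked_crossprod(conn, "SELECT x, y FROM obs", chunk_size=2)
    print(n, sums, xtx)  # 3 [9.0, 12.0] [[35.0, 44.0], [44.0, 56.0]]
```

Because sums and cross products are additive, the result does not depend on row order or chunk size. That additivity is what makes such operations candidates for translation into a pass through the database.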

 	Alan Zaslavsky
 	zaslavsk at hcp.med.harvard.edu

> Date: Tue, 10 Apr 2007 16:27:50 -0600
> From: "Greg Snow" <Greg.Snow at intermountainmail.org>
> Subject: Re: [R] Reasons to Use R
> To: "Wensui Liu" <liuwensui at gmail.com>
>
> I think SAS has the database part built into it.  I have heard secondhand
> of new statisticians going to work for a company and asking if it has
> SAS; the reply was "Yes, we use SAS for our database. Does it do
> statistics also?"  I also heard that SAS is no longer
> considered an acronym: they like having it be just a name and don't want
> the fact that one of the S's used to stand for statistics to scare away
> companies that use it as a database.
>
> Maybe someone more up on SAS can confirm or deny this.


