[R] Help with R

Peter Dalgaard p.dalgaard at biostat.ku.dk
Thu May 5 12:34:43 CEST 2005

"Angus Repper" <arepper at hotmail.com> writes:

> Hello
> I am a long-time SAS user, but am new to R. I specifically am looking for
> information pertaining to generating graphics for web output. I would like
> to create dynamic graphics (in the form of generalized reports) for my web
> site that is written with PHP and MySQL. Is 'R' capable of doing
> this? 

Yes, people have done that. I'm not the one to ask for the details,
but it comes up on the mailing lists from time to time (hint: we have
archives...). I gather that the hardest part is to get the bitmapped
graphics to look right.
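
For what it's worth, the bitmap step itself can be sketched as below.
This is only a minimal sketch, assuming R is run in batch mode by the
web server and that the build has a working png() device; the file
name, image size, and the use of the built-in cars data are
placeholders of mine, not a recommended setup.

```r
# Hypothetical output location; a real site would write to a web-served path.
outfile <- file.path(tempdir(), "report.png")

# Open the base-R PNG bitmap device; width and height are in pixels.
png(outfile, width = 600, height = 400)

# Draw to the device. 'cars' is a data set shipped with R, used as a stand-in.
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Example report graphic")

# Close the device so the file is actually written to disk.
dev.off()
```

The PHP side would then only need to serve the resulting file.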

> I
> heard that 'R' does not do a very good job at handling large datasets, is
> this true? 

Yes, with qualifications: R stores entire data sets in memory, which
is a disadvantage for procedures that can be implemented using
sequential file access. However, these days PCs routinely ship with
more RAM than we had on our hard disks 5 years ago. The benefit of R is
that it allows nonsequential or multipass procedures to be specified
simply: R's x - mean(x) in SAS would be PROC MEANS followed by a DATA
step (there are various other options, I'm sure, but none involving a
single DATA step). 
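
To make the R side of that comparison concrete, here is a small sketch
with made-up numbers (the vector x is purely illustrative):

```r
# Made-up data for illustration.
x <- c(2, 4, 6, 8)

# mean(x) needs one full pass over the data; the subtraction is a second
# pass. Because x is held in memory, both fit in a single expression.
centered <- x - mean(x)

print(centered)
```

The centered values sum to zero, which is a quick sanity check.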

For some statistical procedures, SAS also needs to store data in
memory, which makes the comparison more of a toss-up. R generally has
a rather cavalier attitude towards conserving memory, so it often runs
into memory limitations more quickly, but carefully coded routines
like the lmer function can handle considerably larger data sets than
PROC MIXED via the use of sparse-matrix techniques.

Both systems are victims of the curse of the rectangular data set to
some extent. Prototypically, you record the sex of a rat along with
every single measurement on it, as if the rat could change sex at
millisecond resolution. This probably applies to all current
statistical systems, but there is some hope that R's more flexible
data structures can be leveraged to better handle multilevel data.
(Cue Probabilistic Relational Models in the style of Getoor et al.,
which Peter Green brought up at the recent gR meeting.)
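
The redundancy is easy to see in a toy example. The sketch below uses
made-up rats and weights and plain two-table storage; it is not a
Probabilistic Relational Model, just an illustration of keeping
rat-level and measurement-level data apart until a rectangular data
set is actually needed.

```r
# Hypothetical data: two rats, three weighings each.
rats <- data.frame(rat = c("A", "B"),
                   sex = c("F", "M"))

weights <- data.frame(rat    = rep(c("A", "B"), each = 3),
                      day    = rep(1:3, times = 2),
                      weight = c(101, 105, 110, 120, 126, 131))

# The conventional rectangular form: sex is copied onto every measurement row.
rectangular <- merge(weights, rats, by = "rat")

print(rectangular)
```

Here sex is stored twice in rats but six times in rectangular; with
many measurements per animal the difference grows accordingly.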

   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
