[R] naive question

james.holtman@convergys.com james.holtman at convergys.com
Wed Jun 30 22:38:24 CEST 2004

It is amazing the amount of time that has been spent on this issue.  In
most cases, if you do some timing studies using 'scan', you will find that
you can read some quite large data structures in a reasonable time.  If your
initial concern was having to wait 10 minutes to have your data read in,
you could have read in quite a few data sets by now.
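
A minimal sketch of such a timing study, assuming a hypothetical all-numeric
file (the size, column count, and temporary path below are made up for
illustration and are not from the original thread):

```r
# Hypothetical test data: 100,000 rows x 5 numeric columns in a temp file.
n_rows <- 100000
n_cols <- 5
tmp <- tempfile(fileext = ".txt")
write.table(matrix(runif(n_rows * n_cols), ncol = n_cols), tmp,
            row.names = FALSE, col.names = FALSE)

# Time a raw read with scan(); 'what = numeric()' declares the type up front.
timing <- system.time(
  x <- scan(tmp, what = numeric(), quiet = TRUE)
)
print(timing)

# scan() returns a flat vector in file (row-major) order; reshape it.
m <- matrix(x, ncol = n_cols, byrow = TRUE)
unlink(tmp)
```

The "elapsed" entry is the wall-clock figure; rerunning with a larger n_rows
gives a rough idea of how read time grows with file size on a given machine.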

When comparing speeds/feeds of processors, you also have to consider what
is being done on them.  Back in the "dark ages" we had a 1 MIPS computer
with 4 MB of memory handling input from 200 users on a transaction system.
Today I need a 1 GHz computer with 512 MB just to handle me.  Now true, I am
doing a lot of different processing on it.

With respect to I/O, you have to consider what is being read in and how it
is converted.  Each system/program has different requirements.  I have some
applications (running on a laptop) that can read in approximately 100K rows
of data per second (of course they are already binary).  On the other hand,
I can easily slow that down to 1K rows per second if I do not specify the
correct parameters to 'read.table'.
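
As a hedged illustration of which 'read.table' parameters are usually meant
here, the sketch below compares a default call with one that declares
colClasses, nrows, and comment.char.  The file layout and column names are
hypothetical, and the size of any speedup depends on the data and the R
version:

```r
# Hypothetical data: 50,000 rows with integer, numeric, and character columns.
n <- 50000
tmp <- tempfile(fileext = ".txt")
df <- data.frame(id = seq_len(n), value = runif(n),
                 label = sample(c("a", "b", "c"), n, replace = TRUE))
write.table(df, tmp, row.names = FALSE, quote = FALSE)

# Defaults: read.table must guess each column's type and look for comments.
t_default <- system.time(
  d1 <- read.table(tmp, header = TRUE)
)

# Hints: declare column types and the row count, and turn off comment scanning.
t_hinted <- system.time(
  d2 <- read.table(tmp, header = TRUE,
                   colClasses = c("integer", "numeric", "character"),
                   nrows = n, comment.char = "")
)

# Compare elapsed (wall-clock) seconds for the two calls.
print(c(default = t_default[["elapsed"]], hinted = t_hinted[["elapsed"]]))
unlink(tmp)
```

Declaring colClasses spares read.table the type-guessing pass, and nrows lets
it size its buffers in advance; both are documented arguments of read.table.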

So go back and take a look at what you are doing, and instrument your code
to see where time is being spent.  The nice thing about R is that there are
a number of ways of approaching a solution, and if you don't like the timing
of one way, try another.  That is half the fun of using R.
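
One way to instrument a step and compare two approaches, sketched with
system.time on a made-up workload (column means of a random matrix; the task
itself is only an assumption chosen for illustration):

```r
# Hypothetical workload: column means of a numeric matrix, computed two ways.
set.seed(1)
m <- matrix(runif(2e5), ncol = 20)

# Way 1: an explicit loop over columns.
t_loop <- system.time({
  means_loop <- numeric(ncol(m))
  for (j in seq_len(ncol(m))) means_loop[j] <- mean(m[, j])
})

# Way 2: the vectorized built-in.
t_vec <- system.time(
  means_vec <- colMeans(m)
)

# Same answer, possibly different cost; compare elapsed seconds.
print(c(loop = t_loop[["elapsed"]], colMeans = t_vec[["elapsed"]]))
```

For a breakdown by function rather than by hand-picked block, base R also
provides Rprof() and summaryRprof().
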
James Holtman        "What is the problem you are trying to solve?"
Executive Technical Consultant  --  Office of Technology, Convergys
james.holtman at convergys.com
+1 (513) 723-2929

<rivin at euclid.math.temple.edu>
Sent by: r-help-bounces at stat.math.ethz.ch
06/30/2004 16:25

To:       <p.dalgaard at biostat.ku.dk>
cc:       r-help at stat.math.ethz.ch, tplate at blackmesacapital.com, rivin at euclid.math.temple.edu
Subject:  Re: [R] naive question

> <rivin at euclid.math.temple.edu> writes:
>> I did not use R ten years ago, but "reasonable" RAM amounts have
>> multiplied by roughly a factor of 10 (from 128 MB to 1 GB), CPU speeds
>> have gone up by a factor of 30 (from 90 MHz to 3 GHz), and disk space
>> availability has gone up probably by a factor of 10. So, unless the I/O
>> performance scales nonlinearly with size (a bit strange but not
>> inconsistent with my R experiments), I would think that things should
>> have gotten faster (by the wall clock), not slower. Of course, it is
>> possible that the other components of the R system have been worked on
>> more -- I am not equipped to comment...
> I think your RAM calculation is a bit off. In late 1993, 4MB systems
> were the standard PC, with 16 or 32 MB on high-end workstations.

I beg to differ. In 1989, Mac II came standard with 8MB, NeXT came
standard with 16MB. By 1994, 16MB was pretty much standard on good quality
(= Pentium, of which the 90MHz was the first example) PCs, with 32MB
pretty common (though I suspect that most R/S-Plus users were on SUNs,
which were somewhat more plushly equipped).

> Comparable figures today are probably 256MB for the entry-level PC and
> a couple GB in the high end. So that's more like a factor of 64. On the
> other hand, CPUs have changed by more than the clock speed; in
> particular, the number of clock cycles per FP calculation has
> decreased considerably and is currently less than one in some
> circumstances.

I think that FP performance has increased more than integer performance,
which has pretty much kept pace with the clock speed. The compilers have
also improved a bit...

