[R] R runtime performance and memory usage

Martin Maechler maechler at stat.math.ethz.ch
Tue Nov 17 18:49:41 CET 2015


>>>>> William Dunlap <wdunlap at tibco.com>
>>>>>     on Mon, 16 Nov 2015 16:01:42 -0800 writes:

    > If a quick running time is important and your models involve only
    > numeric data with no missing values and you are willing to spend more
    > programming time setting things up, the lsfit() function may work
    > better for you.

    > Bill Dunlap
    > TIBCO Software
    > wdunlap tibco.com

Even faster is the bare-bones .lm.fit() function
(available in R >= 3.1.0).

I've written a small demo about it and published it here,
   http://rpubs.com/maechler/fast_lm
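
A minimal sketch of the three approaches mentioned in this thread, lm(),
lsfit() and .lm.fit(), checking that they agree on the coefficients. The
simulated data, the seed, and the variable names are assumptions made for
illustration; they are not taken from the demo linked above.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n <- 250000                       # record count mentioned in the thread
x <- rnorm(n)                     # simulated predictor (an assumption)
y <- 2 + 3 * x + rnorm(n)         # simulated response (an assumption)

X <- cbind(1, x)                  # design matrix with an explicit intercept column

fit_lm    <- lm(y ~ x)            # formula interface, builds a model frame first
fit_lsfit <- lsfit(x, y)          # matrix interface, adds the intercept by default
fit_fast  <- .lm.fit(X, y)        # bare QR fit on the matrix, no NA handling

# Collect intercept and slope from each fit, one row per function
coefs <- rbind(lm     = unname(coef(fit_lm)),
               lsfit  = unname(fit_lsfit$coefficients),
               lm.fit = unname(fit_fast$coefficients))
print(coefs)
```

Note that .lm.fit() expects a numeric matrix and a numeric response with no
missing values, so the intercept column has to be supplied by hand.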

Martin Maechler, ETH Zurich (and R Core)


    > On Mon, Nov 16, 2015 at 3:25 PM, Sasikumar Kandhasamy <ckmsasi at gmail.com> wrote:
    >> Thanks a lot Bill & Bert.
    >> 
    >> Hi Bill,
    >> 
    >> Sorry, I was wrong about the number of records; actually, I am using
    >> two-dimensional data of 250K records each. And regarding CPU usage, it
    >> was the elapsed time. In fact, I have pinned one core to run R.
    >> 
    >> Thanks & Regards
    >> Sasi
    >> 
    >> On Mon, Nov 16, 2015 at 2:04 PM, William Dunlap <wdunlap at tibco.com> wrote:
    >>> 
    >>> You cannot do a linear regression with one column of data - there must
    >>> be at least one response column and one predictor.  By default, lm
    >>> throws in a constant term which gives you a second predictor.  If your
    >>> predictor is categorical, you get a new column for all but the first
    >>> unique value in it.
    >>> 
    >>> lm() deals only with double precision data, at 8 bytes/number.  Thus
    >>> 250k numbers occupy 2 million bytes.  Your three columns (in the
    >>> non-categorical-predictor case) take up 6 million bytes.
    >>> 
    >>> lm()'s output contains several columns the size of the response
    >>> variable: residuals, effects, and fitted.values.  It also contains the
    >>> QR decomposition of the design matrix (the size of all the predictor
    >>> columns together).
    >>> 
    >>> There are also some temporary variables generated in the course of the
    >>> computation.
    >>> 
    >>> So your observed 40 MB memory usage seems reasonable.
    >>> 
    >>> Use the object.size() function to see how big objects are and str() to
    >>> look at their structure.
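
A minimal sketch of the memory arithmetic above, using object.size() and
str() as suggested. The simulated data and the variable names are assumptions
for illustration; exact sizes depend on the platform and R version, so none
are quoted here.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n <- 250000                       # record count mentioned in the thread
x <- rnorm(n)                     # simulated predictor (an assumption)
y <- 2 + 3 * x + rnorm(n)         # simulated response (an assumption)

n * 8                             # raw payload of one double column, in bytes
print(object.size(y), units = "MB")

fit <- lm(y ~ x)
print(object.size(fit), units = "MB")   # whole fitted object

# Size of each component of the fit, largest first
part_sizes <- vapply(fit, function(p) as.numeric(object.size(p)), numeric(1))
print(sort(part_sizes, decreasing = TRUE))

str(fit, max.level = 1)           # top-level structure only
```

The per-component listing shows where the space goes: the response-sized
vectors, the QR decomposition, and the stored model frame.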
    >>> 
    >>> My laptop with a 2.5 GHz Intel i7 processor takes a quarter second to
    >>> fit a simple linear model with one numeric predictor and a constant
    >>> term.  6 seconds sounds slow.  Is that CPU or elapsed time (use
    >>> system.time() to see)?
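
A minimal sketch of separating CPU time from elapsed time with system.time().
The simulated data and the variable names are assumptions for illustration,
and the timings will differ from machine to machine.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n <- 250000                       # record count mentioned in the thread
x <- rnorm(n)                     # simulated predictor (an assumption)
y <- 2 + 3 * x + rnorm(n)         # simulated response (an assumption)

timing <- system.time(fit <- lm(y ~ x))
print(timing)

cpu_seconds     <- timing[["user.self"]] + timing[["sys.self"]]  # CPU time used by R
elapsed_seconds <- timing[["elapsed"]]                           # wall-clock time
cat("CPU:", cpu_seconds, "s  elapsed:", elapsed_seconds, "s\n")
```

If the elapsed time is much larger than the CPU time, the process is mostly
waiting (for example on swapping or other jobs) rather than computing.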
    >>> 
    >>> 
    >>> 
    >>> Bill Dunlap
    >>> TIBCO Software
    >>> wdunlap tibco.com
    >>> 
    >>> 
    >>> On Mon, Nov 16, 2015 at 12:25 PM, Sasikumar Kandhasamy
    >>> <ckmsasi at gmail.com> wrote:
    >>> > Hi All,
    >>> >
    >>> > I have a couple of questions about R run-time performance. I have
    >>> > the R-3.2.2 package compiled for MIPS64 and am running it on my Linux
    >>> > machine with a MIPS64 processor (core speed 1.5 GHz), and I observe
    >>> > the following behavior:
    >>> >
    >>> > 1. Applying a linear regression model (lm) to 1 MB of data (1 column
    >>> > of 250K records) takes ~6 seconds to complete. Is this expected
    >>> > behavior? If not, can you please suggest any options to improve it?
    >>> >
    >>> > 2. Also, the virtual memory of the R process increases by 40 MB after
    >>> > applying the linear model to the 1 MB of data. Is this also expected
    >>> > behavior? If so, can you please share some insight into the memory
    >>> > usage?
    >>> >
    >>> > Thanks in advance.
    >>> >
    >>> > Regards
    >>> > Sasi
    >>> >
    >>> >
    >>> > ______________________________________________
    >>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
    >>> > https://stat.ethz.ch/mailman/listinfo/r-help
    >>> > PLEASE do read the posting guide
    >>> > http://www.R-project.org/posting-guide.html
    >>> > and provide commented, minimal, self-contained, reproducible code.
    >> 
    >> 



