[R] Question about multiple regression

Dimitri Liakhovitski ld7631 at gmail.com
Mon Sep 8 19:47:21 CEST 2008


Thank you everyone for your responses. I'll answer several questions.

1. > Disclaimer: I have **NO IDEA** of the details of what you want to do or why
> -- but I am willing to bet that there are better ways of doing it than 1.8
> million multiple regressions that take 270 secs each!! (which I find difficult
> to believe in itself -- are you sure you are doing things right? Something
> sounds very fishy here: R's regression code is typically very fast).
I probably should not bore everyone, but let me explain where the
large number comes from. I have an experimental design with 7
factors, each with between 3 and 5 levels. Once you cross them
all, you end up with 18,000 cells. For each cell, I want to generate a
sample of N=100. Each sample has to be analyzed with 3 different
statistical methods (the goal of the Monte Carlo study is to compare
those methods). One of the methods requires running up to ~32,000
simple multiple regressions; yes, just for one sample, and it's not a
mistake. I test-ran one such analysis for a sample with N=800 and 15
predictors, and it took 270 seconds. R was actually very fast: it ran
each individual regression in about 0.008 seconds. Still, I need
something faster.
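
The numbers above can be checked with a little arithmetic, and the per-fit cost can often be reduced by calling lm.fit() directly rather than lm(). The sketch below is only an illustration under one loud assumption: that the "~32,000 regressions" are the fits for every non-empty subset of 15 predictors (2^15 - 1 = 32,767), which the post does not actually state. The data, and the names X, y, masks and r2, are made up.

```r
# Sketch: fit every non-empty subset of p predictors with lm.fit(),
# which skips the formula/model-frame overhead of lm().
# ASSUMPTION: the "~32,000 regressions" are all subsets of 15 predictors;
# the original post does not say so.
set.seed(1)
n <- 800   # sample size of the test run described in the post
p <- 6     # kept small so the sketch runs quickly; the post uses 15
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% runif(p)) + rnorm(n)

# Each row of 'masks' codes one subset as TRUE/FALSE per predictor.
masks <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), p)))
masks <- masks[rowSums(masks) > 0, , drop = FALSE]  # drop the empty subset

tss <- sum((y - mean(y))^2)  # total sum of squares around the mean
r2 <- apply(masks, 1, function(keep) {
  fit <- lm.fit(cbind(1, X[, keep, drop = FALSE]), y)  # intercept + subset
  1 - sum(fit$residuals^2) / tss                       # R-squared
})

length(r2)      # number of models fitted: 2^p - 1
2^15 - 1        # 32767 subsets for 15 predictors
32000 * 0.008   # 256 seconds per sample at 0.008 s per fit
18000 * 100     # 1,800,000: 18,000 cells x 100, the count quoted in the thread
```

At 0.008 seconds per fit, 32,000 fits account for roughly 256 of the reported 270 seconds, so the cost is dominated by the number of fits rather than by any single slow call.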

2. Sorry, what was the formula sum(lm.fit(x,y)$residuals^2) for? For
example, using it on my data, I got a value of 36,644...
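
For reference, that expression is the residual sum of squares of the least-squares fit of y on the columns of x. A minimal sketch with made-up data (x, y and the rss_* names are illustrative only) shows that it matches what lm() reports:

```r
# Sketch with made-up data. lm.fit() expects x to be a numeric design
# matrix; include a column of 1s if an intercept is wanted.
set.seed(42)
x <- cbind(1, matrix(rnorm(100 * 3), 100, 3))
y <- rnorm(100)

rss_fit <- sum(lm.fit(x, y)$residuals^2)  # residual sum of squares
rss_lm  <- deviance(lm(y ~ x - 1))        # same quantity via lm()

all.equal(rss_fit, rss_lm)
```

A value such as 36,644 is therefore on the squared scale of the response and is only meaningful relative to the total sum of squares of y.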

3. I know that people have used Fortran compilers in similarly
challenging situations. So, has anyone heard of a free Fortran library
or an efficient piece of code?
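
Before moving the fit into compiled code, it may be worth checking where the time goes, along the lines of the profiling suggestion quoted further down. The following is a rough sketch using base R's system.time() and Rprof(); the data, the repetition count and all variable names are made up, and timings will differ by machine.

```r
# Sketch: compare lm() with lm.fit() and profile the lm() loop.
set.seed(1)
n <- 800; p <- 15
X <- cbind(1, matrix(rnorm(n * p), n, p))  # design matrix with intercept
y <- rnorm(n)
dat <- data.frame(y = y, X[, -1])          # same data in data-frame form

reps <- 1000  # arbitrary repetition count for the illustration
t_lm    <- system.time(for (i in seq_len(reps)) lm(y ~ ., data = dat))
t_lmfit <- system.time(for (i in seq_len(reps)) lm.fit(X, y))
print(c(lm = t_lm[["elapsed"]], lm.fit = t_lmfit[["elapsed"]]))

# Sampling profiler: shows which functions the time is spent in.
prof_file <- tempfile()
Rprof(prof_file)
for (i in seq_len(reps)) lm(y ~ ., data = dat)
Rprof(NULL)
head(summaryRprof(prof_file)$by.self)
```

If most of the time turns out to be spent in the formula and model-frame handling around the fit rather than in the underlying Fortran, a compiled replacement for the fit itself is unlikely to help much.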

Thank you!
Dimitri


>
> -- Bert Gunter
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of Dimitri Liakhovitski
> Sent: Monday, September 08, 2008 9:56 AM
> To: Prof Brian Ripley
> Cc: R-Help List
> Subject: Re: [R] Question about multiple regression
>
> Yes, see my previous e-mail on how long R takes (270 seconds for one
> of the 1,800,000 sets I need) - using system.time.
> Not sure how to test the same for Fortran...
>
> On Mon, Sep 8, 2008 at 12:51 PM, Prof Brian Ripley
> <ripley at stats.ox.ac.uk> wrote:
>> Are you sure R's ways are not fast enough (there are many layers underneath
>> lm)?  For an example of how you might do this at C/Fortran level, see the
>> function lqs() in MASS.
>>
>> On Mon, 8 Sep 2008, Dimitri Liakhovitski wrote:
>>
>>> Dear R-list,
>>> maybe some of you could point me in the right direction:
>>>
>>> Are you aware of any FREE Fortran or Java libraries/actual pieces of
>>> code that are VERY efficient (time-wise) in running the regular linear
>>> least-squares multiple regression?
>>
>> A lot of the effort is in getting the right answer fast, including for e.g.
>> collinear inputs.
>>
>>> More specifically, I have to run small regression models (between 1
>>> and 15 predictors) on samples of up to N=700 but thousands and
>>> thousands of them.
>>>
>>> I am designing a simulation in R and running those regressions and R
>>> itself is way too slow. So, I am thinking of compiling the regression
>>> run itself in Fortran or Java and then calling it from R.
>>
>> I think Java is unlikely to be fast compared to the Fortran R itself uses.
>>
>> Have you profiled to find where the time is really being spent (both R and
>> C/Fortran profiling if necessary)?
>>
>>>
>>> Thank you very much for any advice!
>>>
>>> Dimitri Liakhovitski
>>> MarketTools, Inc.
>>> Dimitri.Liakhovitski at markettools.com
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> --
>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford,             Tel:  +44 1865 272861 (self)
>> 1 South Parks Road,                     +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>>
>
>
>
> --
> Dimitri Liakhovitski
> MarketTools, Inc.
> Dimitri.Liakhovitski at markettools.com
>



-- 
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com


