[R] Question about multiple regression

Gabor Grothendieck ggrothendieck at gmail.com
Mon Sep 8 19:57:37 CEST 2008


On Mon, Sep 8, 2008 at 1:47 PM, Dimitri Liakhovitski <ld7631 at gmail.com> wrote:
> Thank you everyone for your responses. I'll answer several questions.
>
> 1.
>> Disclaimer: I have **NO IDEA** of the details of what you want
>> to do or why
>> -- but I am willing to bet that there are better ways of doing it than  1.8
>> mm multiple regressions that take 270 secs each!! (which I find difficult to
>> believe in itself -- are you sure you are doing things right? Something
>> sounds very fishy here: R's regression code is typically very fast).
> I probably should not bore everyone, but just to explain where the
> large number is coming from. I have an experimental design with 7
> factors. Each factor has between 3 and 5 levels. Once you cross them
> all, you end up with 18,000 cells. For each cell, I want to generate a
> sample of N=100. For each sample I have to analyze the data using 3
> different statistical methods of analysis (the goal of the
> Monte-Carlo is to compare those methods). One of the methods requires
> running up to ~32,000 simple multiple regressions - yes, just for
> one sample and it's not a mistake. I test-ran one such analysis for a
> sample with N=800 and 15 predictors and it took 270 seconds. R was
> actually very fast - it ran each of the individual regressions in
> about 0.008 seconds. Still I need something faster.
>
> 2. Sorry - what was the formula sum(lm.fit(x,y)$residuals^2) for? For
> example, using it on my data, I got a value of 36,644...

It's the sum of the squares of the residuals.
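
As a minimal sketch of that expression (the simulated x and y below are made up
for illustration and are not your data), lm.fit() takes a numeric design matrix
and a response vector directly, so it skips the formula and model-frame
handling that lm() performs on every call:

```r
# Illustrative data only: 800 rows, an intercept column plus 15 predictors.
set.seed(1)
n <- 800
p <- 15
x <- cbind(1, matrix(rnorm(n * p), n, p))  # design matrix, first column = intercept
y <- rnorm(n)

# Residual sum of squares from the low-level fitter.
rss_fit <- sum(lm.fit(x, y)$residuals^2)

# The same quantity via lm(); "- 1" because x already holds the intercept column.
rss_lm <- sum(residuals(lm(y ~ x - 1))^2)

# Both routes should agree up to numerical tolerance.
all.equal(rss_fit, rss_lm)
```

Wrapping each of the two calls in system.time() inside a loop is one way to see
how much of the per-regression cost comes from lm()'s overhead rather than from
the least-squares fit itself.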

>
> 3. I know that for similarly challenging situations people have used
> Fortran compilers. So, has anyone heard of a free Fortran library or an
> efficient piece of code?
>
> Thank you!
> Dimitri
>
>
>>
>> -- Bert Gunter
>>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
>> Behalf Of Dimitri Liakhovitski
>> Sent: Monday, September 08, 2008 9:56 AM
>> To: Prof Brian Ripley
>> Cc: R-Help List
>> Subject: Re: [R] Question about multiple regression
>>
>> Yes, see my previous e-mail on how long R takes (270 seconds for one
>> of the 1,800,000 sets I need) - using system.time.
>> Not sure how to test the same for Fortran...
>>
>> On Mon, Sep 8, 2008 at 12:51 PM, Prof Brian Ripley
>> <ripley at stats.ox.ac.uk> wrote:
>>> Are you sure R's ways are not fast enough (there are many layers underneath
>>> lm)?  For an example of how you might do this at C/Fortran level, see the
>>> function lqs() in MASS.
>>>
>>> On Mon, 8 Sep 2008, Dimitri Liakhovitski wrote:
>>>
>>>> Dear R-list,
>>>> maybe some of you could point me in the right direction:
>>>>
>>>> Are you aware of any FREE Fortran or Java libraries/actual pieces of
>>>> code that are VERY efficient (time-wise) in running the regular linear
>>>> least-squares multiple regression?
>>>
>>> A lot of the effort is in getting the right answer fast, including for e.g.
>>> collinear inputs.
>>>
>>>> More specifically, I have to run small regression models (between 1
>>>> and 15 predictors) on samples of up to N=700 but thousands and
>>>> thousands of them.
>>>>
>>>> I am designing a simulation in R and running those regressions and R
>>>> itself is way too slow. So, I am thinking of compiling the regression
>>>> run itself in Fortran or Java and then calling it from R.
>>>
>>> I think Java is unlikely to be fast compared to the Fortran R itself uses.
>>>
>>> Have you profiled to find where the time is really being spent (both R and
>>> C/Fortran profiling if necessary)?
>>>
>>>>
>>>> Thank you very much for any advice!
>>>>
>>>> Dimitri Liakhovitski
>>>> MarketTools, Inc.
>>>> Dimitri.Liakhovitski at markettools.com
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>> --
>>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>>> University of Oxford,             Tel:  +44 1865 272861 (self)
>>> 1 South Parks Road,                     +44 1865 272866 (PA)
>>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>>>
>>
>>
>>
>> --
>> Dimitri Liakhovitski
>> MarketTools, Inc.
>> Dimitri.Liakhovitski at markettools.com
>>
>
>
>
> --
> Dimitri Liakhovitski
> MarketTools, Inc.
> Dimitri.Liakhovitski at markettools.com
