[R] Question about multiple regression

Dimitri Liakhovitski ld7631 at gmail.com
Mon Sep 8 20:27:56 CEST 2008


I could get an r squared from lm.fit by correlating fitted.values and
my response variable.
But could I do it somehow using Sums of Squares? I am clear on SS for
residuals. But where is SS for the model or the total SS in lm.fit
output?
Thank you!
Dimitri
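
For reference, a minimal sketch of the sums-of-squares route. lm.fit does not return a total SS, so it is computed from the response itself. The simulated x and y, the sizes n and p, and the ss_* and r2_* names are illustrative assumptions, and the model is assumed to have full rank with the intercept as the first column of x.

```r
# Simulated data; everything here is an illustrative assumption.
set.seed(1)
n <- 100
p <- 3
x <- cbind(1, matrix(rnorm(n * p), n, p))   # lm.fit needs an explicit intercept column
y <- drop(x %*% c(2, 1, -1, 0.5)) + rnorm(n)

fit <- lm.fit(x, y)

ss_res   <- sum(fit$residuals^2)    # residual SS
ss_total <- sum((y - mean(y))^2)    # total SS about the mean, taken from y directly
ss_model <- ss_total - ss_res       # model SS (holds when the model has an intercept)

# The model SS can also be read from the 'effects' component: with the
# intercept first and full rank, entries 2..rank carry the predictors.
ss_model_eff <- sum(fit$effects[2:fit$rank]^2)

r2_ss  <- 1 - ss_res / ss_total
r2_cor <- cor(fit$fitted.values, y)^2   # the correlation route described above

all.equal(r2_ss, r2_cor)
all.equal(ss_model, ss_model_eff)
```

Without an intercept column the total SS about the mean is no longer the right denominator, so the two R-squared values need not agree.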

On Mon, Sep 8, 2008 at 1:57 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> On Mon, Sep 8, 2008 at 1:47 PM, Dimitri Liakhovitski <ld7631 at gmail.com> wrote:
>> Thank you everyone for your responses. I'll answer several questions.
>>
>> 1. >  Disclaimer: I have **NO IDEA** of the details of what you want
>> to do or why
>>> -- but I am willing to bet that there are better ways of doing it than 1.8
>>> mm multiple regressions that take 270 secs each!! (which I find difficult to
>>> believe in itself -- are you sure you are doing things right? Something
>>> sounds very fishy here: R's regression code is typically very fast).
>> I probably should not bore everyone, but just to explain where the
>> large number is coming from. I have an experimental design with 7
>> factors. Each factor has between 3 and 5 levels. Once you cross them
>> all, you end up with 18,000 cells. For each cell, I want to generate a
>> sample of N=100. For each sample I have to analyze the data using 3
>> different statistical methods of analysis (the goal of the
>> Monte-Carlo is to compare those methods). One of the methods requires
>> running of up to ~32,000 simple multiple regressions - yes just for
>> one sample and it's not a mistake. I test-ran one such analysis for a
>> sample with N=800 and 15 predictors and it took 270 seconds. R was
>> actually very fast - it ran each of the individual regressions in
>> about 0.008 seconds. Still I need something faster.
>>
>> 2. Sorry - what was the formula sum(lm.fit(x,y)$residuals^2) for? For
>> example, using it on my data, I got a value of 36,644...
>
> It's the sum of the squares of the residuals.
>
>>
>> 3. I know that for similarly challenging situations people have used
>> Fortran compilers. So, anyone heard of a free Fortran library or an
>> efficient piece of code?
>>
>> Thank you!
>> Dimitri
>>
>>
>>>
>>> -- Bert Gunter
>>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Dimitri Liakhovitski
>>> Sent: Monday, September 08, 2008 9:56 AM
>>> To: Prof Brian Ripley
>>> Cc: R-Help List
>>> Subject: Re: [R] Question about multiple regression
>>>
>>> Yes, see my previous e-mail on how long R takes (270 seconds for one
>>> of the 1,800,000 sets I need) - using system.time.
>>> Not sure how to test the same for Fortran...
>>>
>>> On Mon, Sep 8, 2008 at 12:51 PM, Prof Brian Ripley
>>> <ripley at stats.ox.ac.uk> wrote:
>>>> Are you sure R's ways are not fast enough (there are many layers underneath
>>>> lm)?  For an example of how you might do this at C/Fortran level, see the
>>>> function lqs() in MASS.
>>>>
>>>> On Mon, 8 Sep 2008, Dimitri Liakhovitski wrote:
>>>>
>>>>> Dear R-list,
>>>>> maybe some of you could point me in the right direction:
>>>>>
>>>>> Are you aware of any FREE Fortran or Java libraries/actual pieces of
>>>>> code that are VERY efficient (time-wise) in running the regular linear
>>>>> least-squares multiple regression?
>>>>
>>>> A lot of the effort is in getting the right answer fast, including for e.g.
>>>> collinear inputs.
>>>>
>>>>> More specifically, I have to run small regression models (between 1
>>>>> and 15 predictors) on samples of up to N=700 but thousands and
>>>>> thousands of them.
>>>>>
>>>>> I am designing a simulation in R and running those regressions and R
>>>>> itself is way too slow. So, I am thinking of compiling the regression
>>>>> run itself in Fortran or Java and then calling it from R.
>>>>
>>>> I think Java is unlikely to be fast compared to the Fortran R itself uses.
>>>>
>>>> Have you profiled to find where the time is really being spent (both R and
>>>> C/Fortran profiling if necessary)?
>>>>
>>>>>
>>>>> Thank you very much for any advice!
>>>>>
>>>>> Dimitri Liakhovitski
>>>>> MarketTools, Inc.
>>>>> Dimitri.Liakhovitski at markettools.com
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>> --
>>>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>>>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>>>> University of Oxford,             Tel:  +44 1865 272861 (self)
>>>> 1 South Parks Road,                     +44 1865 272866 (PA)
>>>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>>>>
>>>
>>>
>>>
>>> --
>>> Dimitri Liakhovitski
>>> MarketTools, Inc.
>>> Dimitri.Liakhovitski at markettools.com
>>>
>>>
>>>
>>
>>
>>
>> --
>> Dimitri Liakhovitski
>> MarketTools, Inc.
>> Dimitri.Liakhovitski at markettools.com
>>
>>
>



-- 
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com


