[R] Question about multiple regression

Gustaf Rydevik gustaf.rydevik at gmail.com
Tue Sep 9 12:51:52 CEST 2008


On Mon, Sep 8, 2008 at 7:47 PM, Dimitri Liakhovitski <ld7631 at gmail.com> wrote:
> Thank you everyone for your responses. I'll answer several questions.
>
> 1.
>> Disclaimer: I have **NO IDEA** of the details of what you want to do
>> or why -- but I am willing to bet that there are better ways of doing
>> it than 1.8 million multiple regressions that take 270 secs each!!
>> (which I find difficult to believe in itself -- are you sure you are
>> doing things right? Something sounds very fishy here: R's regression
>> code is typically very fast).
> I probably should not bore everyone, but just to explain where the
> large number is coming from: I have an experimental design with 7
> factors. Each factor has between 3 and 5 levels. Once you cross them
> all, you end up with 18,000 cells. For each cell, I want to generate a
> sample of N=100. For each sample I have to analyze the data using 3
> different statistical methods of analysis (the goal of the
> Monte Carlo study is to compare those methods). One of the methods
> requires running up to ~32,000 simple multiple regressions - yes, just
> for one sample, and it's not a mistake. I test-ran one such analysis
> for a sample with N=800 and 15 predictors and it took 270 seconds. R
> was actually very fast - it ran each of the individual regressions in
> about 0.008 seconds. Still, I need something faster.
>
> 2. Sorry - what was the formula sum(lm.fit(x,y)$residuals^2) for? For
> example, using it on my data, I got a value of 36,644...
>
> 3. I know that for similarly challenging situations people have used
> Fortran compilers. So, has anyone heard of a free Fortran library or an
> efficient piece of code?
>
> Thank you!
> Dimitri
>


Have you considered the fact that 32,000 regressions simply take a lot of time?
I don't really have anything to go by, but it sounds unlikely that you
will be able to cut computing time by more than, say, a factor of ten, to 27
seconds. That would still leave you with 4 months of running a
computer.
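
If it helps to put numbers on this, here is a rough sketch for measuring
the per-regression cost on your own machine. It uses simulated data of the
size you mention (N=800, 15 predictors); the seed, the coefficients and the
number of repetitions are arbitrary assumptions of mine, not anything from
your setup. It also computes sum(lm.fit(x, y)$residuals^2), which is the
residual sum of squares of the fit. lm.fit() skips the formula and
model-frame handling that lm() does, so it is usually the cheaper call, but
how much you gain will depend on your data and hardware.

```r
# Simulated data matching the test run described above: N = 800, 15 predictors.
# The seed, coefficients and noise level are arbitrary assumptions.
set.seed(1)
n <- 800
p <- 15
x <- cbind(1, matrix(rnorm(n * p), n, p))  # design matrix with intercept column
y <- drop(x %*% rnorm(p + 1)) + rnorm(n)

# Residual sum of squares from the low-level fitter
rss <- sum(lm.fit(x, y)$residuals^2)

# Average elapsed time per fit over `reps` repetitions (reps is arbitrary)
reps <- 200
time_per_fit <- function(fit_fun) {
  elapsed <- system.time(for (i in seq_len(reps)) fit_fun())[["elapsed"]]
  elapsed / reps
}
t_lm     <- time_per_fit(function() lm(y ~ x - 1))
t_lm_fit <- time_per_fit(function() lm.fit(x, y))

# Projected time in seconds for 32,000 such fits on one sample
projected <- 32000 * c(lm = t_lm, lm.fit = t_lm_fit)
print(rss)
print(projected)
```

Multiplying the projected figure by the number of samples you actually plan
to run gives the total on a single core.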

Perhaps an alternative approach would be to get access to more powerful
(super)computers, either at a university or by buying access. A quick
Google search turns up http://www.clusterondemand.com/ for example.
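
Since the design cells can be analysed independently of each other, the
same idea also works on a smaller scale by spreading cells over the cores
of one machine. Below is a minimal sketch using the parallel package that
ships with current versions of R. analyse_cell() is a hypothetical
stand-in for your real per-cell analysis, and the cell count and worker
count are made up for illustration. mclapply() relies on forking, so on
Windows it would need mc.cores = 1 (or a parLapply() cluster instead).

```r
library(parallel)

# Hypothetical stand-in for the analysis of one design cell: simulate a
# sample and return the residual sum of squares of a single regression.
analyse_cell <- function(cell_id, n = 100, p = 3) {
  set.seed(cell_id)                          # reproducible per cell
  x <- cbind(1, matrix(stats::rnorm(n * p), n, p))
  y <- stats::rnorm(n)
  sum(stats::lm.fit(x, y)$residuals^2)
}

cells <- 1:20                                # a handful, not all 18,000

# Run the cells on two forked workers (assumes a Unix-like system)
results <- mclapply(cells, analyse_cell, mc.cores = 2)

# Same computation run serially, for comparison
serial <- lapply(cells, analyse_cell)
print(all.equal(results, serial))
```

With k workers the best case is roughly a k-fold reduction in wall-clock
time, so this helps but does not by itself change the order of magnitude.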

Anyhow, good luck with your project! I'm sure the R list would be very
interested to hear how you solved your problem.

Regards,

Gustaf


-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik


