[R] Question about multiple regression
Dimitri Liakhovitski
ld7631 at gmail.com
Mon Sep 8 19:47:21 CEST 2008
Thank you everyone for your responses. I'll answer several questions.
1. > Disclaimer: I have **NO IDEA** of the details of what you want to do or why
> -- but I am willing to bet that there are better ways of doing it than 1.8
> mm multiple regressions that take 270 secs each!! (which I find difficult to
> believe in itself -- are you sure you are doing things right? Something
> sounds very fishy here: R's regression code is typically very fast).
I probably should not bore everyone, but let me explain where the
large number comes from. I have an experimental design with 7
factors. Each factor has between 3 and 5 levels. Once you cross them
all, you end up with 18,000 cells. For each cell, I want to generate a
sample of N=100. For each sample I have to analyze the data using 3
different statistical methods of analysis (the goal of the
Monte Carlo study is to compare those methods). One of the methods requires
running up to ~32,000 simple multiple regressions - yes, just for
one sample, and it's not a mistake. I test-ran one such analysis for a
sample with N=800 and 15 predictors and it took 270 seconds. R was
actually very fast - it ran each of the individual regressions in
about 0.008 seconds (~32,000 x 0.008 s is roughly 256 s). Still, I need
something faster.
2. Sorry - what was the formula sum(lm.fit(x,y)$residuals^2) for? For
example, using it on my data, I got a value of 36,644...
3. I know that for similarly challenging situations people have used
Fortran compilers. So, has anyone heard of a free Fortran library or an
efficient piece of code?
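To make point 1 concrete, here is a minimal sketch of the kind of loop involved. It rests on assumptions: that the ~32,000 regressions are all non-empty subsets of 15 predictors (2^15 - 1 = 32,767), that the data can be stood in for by simulated normal draws, and that R^2 is the quantity kept from each fit. The helper name r2_subset is made up for illustration, and p is reduced to 8 so the example runs in well under a second.

```r
# Illustrative sketch only: simulated data, and the "all subsets"
# structure is an assumption, not necessarily the actual analysis.
set.seed(1)
n <- 800   # sample size used in the test run
p <- 8     # kept small here; p = 15 would give 2^15 - 1 = 32767 subsets
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% rep(0.3, p)) + rnorm(n)

# Fit one regression on a subset of columns and return its R^2.
# lm.fit() works on a plain design matrix, so it avoids the formula
# and model-frame overhead of lm().
r2_subset <- function(cols) {
  fit <- lm.fit(cbind(1, x[, cols, drop = FALSE]), y)
  rss <- sum(fit$residuals^2)            # residual sum of squares
  1 - rss / sum((y - mean(y))^2)
}

# Enumerate every non-empty subset through the bits of 1 .. 2^p - 1.
n_models <- 2^p - 1
r2 <- numeric(n_models)
for (m in seq_len(n_models)) {
  cols <- which(as.integer(intToBits(m))[seq_len(p)] == 1L)
  r2[m] <- r2_subset(cols)
}

# Back-of-the-envelope cost at 0.008 s per fit, as in the test run.
est_seconds <- 32000 * 0.008
```

Wrapping the loop in system.time() gives the per-fit cost on a given machine. The expression sum(fit$residuals^2) inside the helper is the same quantity as in point 2, the residual sum of squares of a single fit.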
Thank you!
Dimitri
