Sun Sep 7 16:13:50 CEST 2008

I have a regression where the lm() goes through fine. This mailing
list has always encouraged me to worry about how a robust regression
might do things differently. I tried two approaches and neither works.

First I need to give you the dataset:

> load(url("http://www.mayin.org/ajayshah/tmp/long.rda"))

This gives you a data frame named "long". Here's the simple lm():

> summary(lm(da.g1 ~ -1 + f.year +
+            major.industry +
+            i.x +
+            lta.l1 + I(lta.l1^2), data=long))

Call:
lm(formula = da.g1 ~ -1 + f.year + major.industry + i.x + lta.l1 +
    I(lta.l1^2), data = long)

Residuals:
     Min       1Q   Median       3Q      Max 
-632.563  -15.405   -5.090    7.797  543.972 

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
f.year2002                15.94994    4.52330   3.526 0.000424 ***
f.year2003                15.89005    4.50107   3.530 0.000418 ***
f.year2004                19.38506    4.48749   4.320 1.58e-05 ***
f.year2005                23.65796    4.49146   5.267 1.43e-07 ***
f.year2006                32.07334    4.48707   7.148 9.72e-13 ***
f.year2007                35.88498    4.51369   7.950 2.16e-15 ***
major.industryDiversified  1.74538    3.19979   0.545 0.585452    
major.industryElectricity  3.61036    6.16091   0.586 0.557887    
major.industryFood         1.52626    1.70112   0.897 0.369637    
major.industryMachinery   -0.15078    1.40149  -0.108 0.914329    
major.industryMetals       5.94554    1.66175   3.578 0.000349 ***
major.industryMiscManuf    1.76956    2.17527   0.813 0.415965    
major.industryNonMetalMin  1.49889    1.92084   0.780 0.435224    
major.industryServ.IT      8.62764    1.86841   4.618 3.95e-06 ***
major.industryServ.Other   6.43315    1.70598   3.771 0.000164 ***
major.industryTextiles     0.07868    1.56312   0.050 0.959859    
major.industryTransportEq  4.81549    1.76354   2.731 0.006338 ** 
i.xTRUE                    4.15376    0.97944   4.241 2.26e-05 ***
lta.l1                    -3.91434    1.58494  -2.470 0.013546 *  
I(lta.l1^2)                0.23105    0.13922   1.660 0.097045 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 34.48 on 6829 degrees of freedom
  (1649 observations deleted due to missingness)
Multiple R-squared: 0.2266,  Adjusted R-squared: 0.2244 
F-statistic: 100.1 on 20 and 6829 DF,  p-value: < 2.2e-16 

In this, f.year and major.industry are factors, i.x is a boolean, and
lta.l1 is a real number. The left-hand-side variable (da.g1) is also a
real number. I put a -1 in the formula to drop the intercept, which
makes room for a dummy variable for every level of f.year.
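
To make the effect of the -1 concrete, here is a small sketch on
made-up data. The data frame "toy" and all its values are invented for
illustration; only the column names f.year and major.industry echo my
real data. It compares the design-matrix columns with and without the
intercept.

```r
# Toy data: invented values, not taken from long.rda.
toy <- data.frame(
  f.year         = factor(c("2002", "2002", "2003", "2003", "2004", "2004")),
  major.industry = factor(c("Food", "Metals", "Textiles",
                            "Food", "Metals", "Textiles")),
  x              = c(0.5, 1.2, 2.0, 0.7, 1.9, 3.1)
)

# Without an intercept, the first factor keeps a dummy for every level.
X_no_int <- model.matrix(~ -1 + f.year + major.industry + x, data = toy)

# With an intercept, the first level of each factor is absorbed into it.
X_int <- model.matrix(~ f.year + major.industry + x, data = toy)

colnames(X_no_int)
colnames(X_int)
```

In both cases major.industry loses its first level, which matches the
lm() output above, where one industry has no coefficient of its own.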

On to my woes with robust regressions. MASS::rlm() breaks:

> library(MASS)
> summary(rlm(da.g1 ~ -1 + f.year +
+            major.industry +
+            i.x +
+            lta.l1 + I(lta.l1^2), method="MM", data=long))
Error in rlm.default(x, y, weights, method = method, wt.method = wt.method,  : 
  'x' is singular: singular fits are not implemented in rlm
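
Since the complaint is about a singular 'x', one check I can think of
is to compare the number of columns of the design matrix with its
numerical rank. The sketch below does that on made-up data;
check_rank() is a hypothetical helper of my own, not part of MASS, and
the toy data frame is invented. It also shows one way a design can be
singular or not depending on whether unused factor levels are dropped.
I have not confirmed that this is what happens with my data.

```r
# Hypothetical helper: build the design matrix the way a formula
# interface would and report its size and numerical rank.
check_rank <- function(formula, data, drop = FALSE) {
  mf <- model.frame(formula, data = data, na.action = na.omit,
                    drop.unused.levels = drop)
  X  <- model.matrix(attr(mf, "terms"), mf)
  list(n = nrow(X), p = ncol(X), rank = qr(X)$rank)
}

# Toy data (invented): level "C" of g occurs only in the row whose
# response is missing, so its dummy is all zero once that row is dropped.
toy <- data.frame(
  y = c(1.0, 2.1, 2.9, 4.2, 5.1, NA),
  g = factor(c("A", "A", "B", "B", "A", "C")),
  x = c(0.1, 0.9, 2.2, 2.8, 4.1, 5.0)
)

kept    <- check_rank(y ~ -1 + g + x, toy, drop = FALSE)
dropped <- check_rank(y ~ -1 + g + x, toy, drop = TRUE)

kept$rank < kept$p        # TRUE means the design is rank-deficient
dropped$rank < dropped$p  # FALSE means full column rank
```

On my real data the same call would take the formula above and
data=long.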

robustbase::lmrob() breaks:

> library(robustbase)
> summary(lmrob(da.g1 ~ -1 + f.year +
+            major.industry +
+            i.x +
+            lta.l1 + I(lta.l1^2), data=long))

Too many singular resamples
Aborting fast_s_w_mem()

Error in lmrob.S(x = x, y = y, control = control) : 
  C function R_lmrob_S() exited prematurely
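
My understanding, which I have not verified against the robustbase
source, is that the S-estimator starts from random subsamples of p
observations and cannot use a subsample whose p x p design matrix is
singular. With many dummy columns that could happen often, because a
small random subsample easily misses some factor level. The sketch
below estimates how often on made-up data; the toy data frame and the
helper is_singular_draw() are invented for illustration.

```r
set.seed(1)

# Toy design (invented): two factors with many levels plus one numeric
# column, roughly the shape of my specification.
n <- 500
toy <- data.frame(
  year     = factor(sample(2002:2007, n, replace = TRUE)),
  industry = factor(sample(LETTERS[1:12], n, replace = TRUE)),
  x        = rnorm(n)
)
X <- model.matrix(~ -1 + year + industry + x, data = toy)
p <- ncol(X)

# Hypothetical helper: draw p rows at random and report whether the
# resulting p x p block is rank-deficient.
is_singular_draw <- function(X) {
  rows <- sample(nrow(X), ncol(X))
  qr(X[rows, , drop = FALSE])$rank < ncol(X)
}

# Fraction of random p-subsets that are singular.
frac_singular <- mean(replicate(2000, is_singular_draw(X)))
frac_singular
```

A fraction close to 1 here would be consistent with the "Too many
singular resamples" message, though it does not prove that this is the
cause in my case.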

If you could tell me what I'm doing wrong, that would be great. How
would I fit the above specification as a robust regression? I googled
around and found a few others asking the same questions in the past,
but there didn't seem to be a clear answer.

Ajay Shah                                      http://www.mayin.org/ajayshah  
ajayshah using mayin.org                             http://ajayshahblog.blogspot.com
<*(:-? - wizard who doesn't know the answer.

