[R] Multiple Multivariate regression in R with 50 independent variables
William Dunlap
wdunlap at tibco.com
Fri Apr 19 21:30:03 CEST 2013
To avoid the formula handling bug in lm/model.matrix/etc., you can try making the
formula shorter. E.g., if you know the names of your response columns,
responseCols <- c("X1", "X2", "X3", ..., "X2395")
try the formula
as.matrix(d[, responseCols]) ~ d[,"R_M_F"] + d[,"SMB"] + d[,"HML"] + d[,"WML"]
and do not use data=d in the call to lm().
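For example (an untested sketch; it assumes the response columns really are named "X1"
through "X2395" and that your data frame is called d, as in your data=d):

responseCols <- paste0("X", 1:2395)   # builds c("X1", "X2", ..., "X2395") without typing them out
fit <- lm(as.matrix(d[, responseCols]) ~ d[,"R_M_F"] + d[,"SMB"] + d[,"HML"] + d[,"WML"])
coef(fit)                             # one column of coefficients per response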
You may also prefer to use lm.fit(), which takes the response matrix and design matrix
directly, so you avoid formulae altogether.
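Roughly like this (again an untested sketch with the same assumed column names; note that
lm.fit() does not add an intercept for you, so include a column of 1's in the design matrix):

Y <- as.matrix(d[, responseCols])      # 500 x 2395 matrix of responses
X <- cbind("(Intercept)" = 1, as.matrix(d[, c("R_M_F", "SMB", "HML", "WML")]))
fit <- lm.fit(X, Y)
fit$coefficients                       # 5 x 2395 matrix, one column per stock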
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Nilesh Gupta
> Sent: Friday, April 19, 2013 10:16 AM
> To: David Winsemius
> Cc: r-help at r-project.org; peter dalgaard
> Subject: Re: [R] Multiple Multivariate regression in R with 50 independent variables
>
> cbind( X1,X2,X3,X4,X5,X6,X7, ... ,X2393,X2394,X2395 )~ R_M_F+SMB+HML+WML
>
> I used this formula because I wanted to regress 2395 stocks, with 500 months of data
> each, on the independent variables (R_M_F, SMB, HML, WML), and I got the error below.
> The idea was to run multivariate regressions on each of these stocks.
>
> Error in model.matrix.default(mt, mf, contrasts) :
> model frame and formula mismatch in model.matrix()
>
>
> Googling this error led me to that page, and I now know that I mistakenly
> assumed that lm() was limited to 50 variables.
>
> Is cbind()-ing the response variable names the right way to formulate multivariate
> regressions?
>
> Where am I going wrong?
>
>
> The woods are lovely, dark and deep
> But I have promises to keep
> And miles before I go to sleep
> And miles before I go to sleep
> -----
>
>
> On Fri, Apr 19, 2013 at 10:18 PM, David Winsemius <dwinsemius at comcast.net>wrote:
>
> >
> > On Apr 19, 2013, at 4:51 AM, Nilesh Gupta wrote:
> >
> > > I used this link
> > >
> > > http://r.789695.n4.nabble.com/model-frame-and-formula-mismatch-in-model-matrix-td4664093.html
> >
> > But you said 50 independent variables, and that was probably someone's
> > (failed) effort to submit 50 _dependent_ variables. What is the real
> > problem?
> >
> > --
> > David.
> >
> >
> > > Regards
> > >
> > > The woods are lovely, dark and deep
> > > But I have promises to keep
> > > And miles before I go to sleep
> > > And miles before I go to sleep
> > > -----
> > >
> > >
> > > On Fri, Apr 19, 2013 at 2:19 PM, David Winsemius <dwinsemius at comcast.net>
> > wrote:
> > >
> > > On Apr 19, 2013, at 12:40 AM, Nilesh Gupta wrote:
> > >
> > > > lm() does not accommodate more than 50 independent variables
> > >
> > > What is your source for this misinformation?
> > >
> > > > dat <- as.data.frame(matrix(rnorm(51000), ncol=51) )
> > > > names(dat)
> > > [1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9" "V10" "V11"
> > "V12" "V13" "V14" "V15" "V16" "V17" "V18"
> > > [19] "V19" "V20" "V21" "V22" "V23" "V24" "V25" "V26" "V27" "V28" "V29"
> > "V30" "V31" "V32" "V33" "V34" "V35" "V36"
> > > [37] "V37" "V38" "V39" "V40" "V41" "V42" "V43" "V44" "V45" "V46" "V47"
> > "V48" "V49" "V50" "V51"
> > > > lm(V1 ~ ., dat=dat)
> > >
> > > Call:
> > > lm(formula = V1 ~ ., data = dat)
> > >
> > > Coefficients:
> > > (Intercept)           V2           V3           V4           V5           V6           V7           V8
> > >  -0.0089517   -0.0427225   -0.0754946   -0.0002903   -0.0083482    0.0324383   -0.0194980   -0.0151008
> > >          V9          V10          V11          V12          V13          V14          V15          V16
> > >   0.0255324   -0.0167399    0.0476841   -0.0222229    0.0720990   -0.0174327   -0.0104261    0.0024625
> > >         V17          V18          V19          V20          V21          V22          V23          V24
> > >  -0.0086276   -0.0274867   -0.0345897    0.0209116    0.0368201   -0.0027364    0.0090916    0.0198854
> > >         V25          V26          V27          V28          V29          V30          V31          V32
> > >  -0.0083732   -0.0216937    0.0586361   -0.0530041    0.0402765    0.0073514    0.0295976   -0.0641553
> > >         V33          V34          V35          V36          V37          V38          V39          V40
> > >   0.0491071   -0.0261259    0.0364740    0.0070261   -0.0159851   -0.0373357    0.0506756   -0.0383495
> > >         V41          V42          V43          V44          V45          V46          V47          V48
> > >   0.0054945    0.0089468   -0.0050151   -0.0184369    0.0019926   -0.0177631    0.0282828    0.0353523
> > >         V49          V50          V51
> > >  -0.0382634    0.0545654    0.0101398
> > >
> > > > dat <- as.data.frame(matrix(rnorm(101000), ncol=101) )
> > > > lm(V1 ~ ., dat=dat)
> > >
> > > Call:
> > > lm(formula = V1 ~ ., data = dat)
> > >
> > > Coefficients:
> > > (Intercept)          V2          V3          V4          V5          V6          V7          V8
> > >    0.021065   -0.015988   -0.008273    0.049849    0.014874    0.012352   -0.054584    0.004542
> > >          V9         V10         V11         V12         V13         V14         V15         V16
> > >   -0.017186    0.018006   -0.009707   -0.007382    0.044886   -0.051122   -0.026910   -0.048929
> > >         V17         V18         V19         V20         V21         V22         V23         V24
> > >   -0.008129    0.022129   -0.063525    0.026683    0.013424   -0.010145   -0.046046    0.024025
> > >         V25         V26         V27         V28         V29         V30         V31         V32
> > >   -0.003529   -0.038270    0.043657    0.049855    0.010691    0.041217   -0.012596    0.018302
> > >         V33         V34         V35         V36         V37         V38         V39         V40
> > >    0.040225   -0.012751   -0.062677   -0.002810   -0.002574   -0.024137    0.021324   -0.041520
> > >         V41         V42         V43         V44         V45         V46         V47         V48
> > >   -0.076482    0.009063    0.067097   -0.042554   -0.013789    0.002865    0.017325   -0.076860
> > >         V49         V50         V51         V52         V53         V54         V55         V56
> > >   -0.007003   -0.007315    0.030270    0.022066   -0.002224   -0.056534    0.013705   -0.003609
> > >         V57         V58         V59         V60         V61         V62         V63         V64
> > >   -0.044580   -0.037543    0.015745    0.035250   -0.017117    0.072470    0.004398   -0.015923
> > >         V65         V66         V67         V68         V69         V70         V71         V72
> > >    0.012864   -0.062752   -0.038437   -0.019586    0.019871   -0.068398   -0.111778    0.021416
> > >         V73         V74         V75         V76         V77         V78         V79         V80
> > >    0.036849   -0.009103    0.037790    0.021883   -0.034990   -0.014917   -0.003854    0.001760
> > >         V81         V82         V83         V84         V85         V86         V87         V88
> > >   -0.001812    0.003942    0.021810   -0.013984   -0.030446    0.049187    0.008392    0.026965
> > >         V89         V90         V91         V92         V93         V94         V95         V96
> > >    0.057301    0.004190    0.055505   -0.046006   -0.019080   -0.098889   -0.010891   -0.002729
> > >         V97         V98         V99        V100        V101
> > >    0.024939   -0.029847    0.063578   -0.061667   -0.022163
> > >
> > > > system.time( lm(V1 ~ ., dat=dat) ) # with the 101 column dataframe
> > > user system elapsed
> > > 0.060 0.008 0.076
> > >
> > > Sorry to give you such a Frost-y reception, but you are being somewhat
> > > ... what's the right word... sleepy?
> > >
> > > --
> > > David.
> > >
> > >
> > > >
> > > > The woods are lovely, dark and deep
> > > > But I have promises to keep
> > > > And miles before I go to sleep
> > > > And miles before I go to sleep
> > > > -----
> > > >
> > > >
> > > > On Fri, Apr 19, 2013 at 12:26 PM, peter dalgaard <pdalgd at gmail.com>
> > wrote:
> > > >
> > > >>
> > > >> On Apr 18, 2013, at 21:24 , Nilesh Gupta wrote:
> > > >>
> > > >>> Hello all
> > > >>>
> > > >>> Is there a method/package in R in which I can do regressions for more
> > > >> than
> > > >>> 50 independent variables ?
> > > >>
> > > >> What's wrong with lm() et al.?
> > > >>
> > > >> --
> > >
> > >
> > > David Winsemius
> > > Alameda, CA, USA
> > >
> >
> > David Winsemius
> > Alameda, CA, USA
> >