[R] Multiple Multivariate regression in R with 50 independent variables
peter dalgaard
pdalgd at gmail.com
Fri Apr 19 19:51:04 CEST 2013
On Apr 19, 2013, at 19:15 , Nilesh Gupta wrote:
> cbind( X1,X2,X3,X4,X5,X6,X7, ... ,X2393,X2394,X2395 )~ R_M_F+SMB+HML+WML
>
> I used this formula because I wanted to regress 2395 stocks, with 500 months each, on the four independent variables.
> I got this error. The idea was to run multivariate regressions on each of these stocks.
>
> Error in model.matrix.default(mt, mf, contrasts) :
> model frame and formula mismatch in model.matrix()
>
>
> Googling this error led me to that page, and I now know that I mistakenly assumed that lm was limited to 50 variables.
>
> Is cbind(variable names) the way to formulate multivariate regressions?
>
> Where am I going wrong?
>
lm() is unhappy about long expressions (this is arguably a bug), so avoid them:
M <- cbind( X1,X2,X3,X4,X5,X6,X7, ... ,X2393,X2394,X2395 )
lm(M ~ R_M_F+SMB+HML+WML)
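If the series already sit in a data frame, the response matrix can also be built by selecting columns by name rather than typing a long cbind(). A minimal sketch on simulated data; the data frame dat, the six-column size, and the name pattern "^X[0-9]+$" are assumptions for illustration, not the original poster's data:

```r
# Simulated stand-in for the real data: 500 months, 4 factors, 6 "stocks".
# All names and sizes here are illustrative assumptions.
set.seed(1)
n <- 500
dat <- data.frame(R_M_F = rnorm(n), SMB = rnorm(n), HML = rnorm(n), WML = rnorm(n))
for (j in 1:6) dat[[paste0("X", j)]] <- rnorm(n)

# Select the response columns by name pattern instead of typing them out.
resp <- grep("^X[0-9]+$", names(dat), value = TRUE)
M <- as.matrix(dat[resp])

# A matrix response fits one regression per column, all sharing the same
# design matrix; the result has class "mlm".
fit <- lm(M ~ R_M_F + SMB + HML + WML, data = dat)

# Rows: intercept plus the 4 regressors. Columns: one per response.
dim(coef(fit))
```

Keeping the formula short this way sidesteps the long-expression problem, whatever the number of response columns.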
Notice, though, that multivariate tests will be unhappy if you have more variables than degrees of freedom (M wider than tall, essentially).
That's a theory issue, not an lm one.
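To see why, compare the number of response columns with the residual degrees of freedom. The sketch below uses made-up sizes (20 observations, 30 responses, 4 regressors) chosen only so that M is wider than tall. The per-column coefficients are still estimated, but the residual cross-product matrix that multivariate tests rely on cannot have full rank:

```r
# Illustrative sizes only: fewer observations (n) than responses (p).
set.seed(2)
n <- 20
p <- 30
X <- matrix(rnorm(n * 4), n, 4)   # 4 regressors
M <- matrix(rnorm(n * p), n, p)   # response matrix, wider than tall

fit <- lm(M ~ X)

# Residual degrees of freedom: n minus (intercept + 4 slopes).
rdf <- df.residual(fit)

# Residual sums-of-squares-and-cross-products matrix (p x p).
S <- crossprod(resid(fit))

# Its rank cannot exceed rdf, so S is singular whenever p > rdf.
rk <- qr(S)$rank
c(responses = p, residual_df = rdf, rank_S = rk)
```

With 2395 stocks and 500 months the same thing happens, so the fit is best read as 2395 separate regressions rather than as input to a joint multivariate test.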
>
> The woods are lovely, dark and deep
> But I have promises to keep
> And miles before I go to sleep
> And miles before I go to sleep
> -----
>
>
> On Fri, Apr 19, 2013 at 10:18 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Apr 19, 2013, at 4:51 AM, Nilesh Gupta wrote:
>
> > I used this link
> > http://r.789695.n4.nabble.com/model-frame-and-formula-mismatch-in-model-matrix-td4664093.html
>
> But you said 50 independent variables, and that was probably someone's (failed) effort to submit 50 _dependent_ variables. What is the real problem?
>
> --
> David.
>
>
> > Regards
> >
> >
> > On Fri, Apr 19, 2013 at 2:19 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> >
> > On Apr 19, 2013, at 12:40 AM, Nilesh Gupta wrote:
> >
> > > lm() does not accommodate more than 50 independent variables
> >
> > What is your source for this misinformation?
> >
> > > dat <- as.data.frame(matrix(rnorm(51000), ncol=51) )
> > > names(dat)
> > [1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9" "V10" "V11" "V12" "V13" "V14" "V15" "V16" "V17" "V18"
> > [19] "V19" "V20" "V21" "V22" "V23" "V24" "V25" "V26" "V27" "V28" "V29" "V30" "V31" "V32" "V33" "V34" "V35" "V36"
> > [37] "V37" "V38" "V39" "V40" "V41" "V42" "V43" "V44" "V45" "V46" "V47" "V48" "V49" "V50" "V51"
> > > lm(V1 ~ ., data=dat)
> >
> > Call:
> > lm(formula = V1 ~ ., data = dat)
> >
> > Coefficients:
> > (Intercept)           V2           V3           V4           V5           V6           V7           V8
> >  -0.0089517   -0.0427225   -0.0754946   -0.0002903   -0.0083482    0.0324383   -0.0194980   -0.0151008
> >          V9          V10          V11          V12          V13          V14          V15          V16
> >   0.0255324   -0.0167399    0.0476841   -0.0222229    0.0720990   -0.0174327   -0.0104261    0.0024625
> >         V17          V18          V19          V20          V21          V22          V23          V24
> >  -0.0086276   -0.0274867   -0.0345897    0.0209116    0.0368201   -0.0027364    0.0090916    0.0198854
> >         V25          V26          V27          V28          V29          V30          V31          V32
> >  -0.0083732   -0.0216937    0.0586361   -0.0530041    0.0402765    0.0073514    0.0295976   -0.0641553
> >         V33          V34          V35          V36          V37          V38          V39          V40
> >   0.0491071   -0.0261259    0.0364740    0.0070261   -0.0159851   -0.0373357    0.0506756   -0.0383495
> >         V41          V42          V43          V44          V45          V46          V47          V48
> >   0.0054945    0.0089468   -0.0050151   -0.0184369    0.0019926   -0.0177631    0.0282828    0.0353523
> >         V49          V50          V51
> >  -0.0382634    0.0545654    0.0101398
> >
> > > dat <- as.data.frame(matrix(rnorm(101000), ncol=101) )
> > > lm(V1 ~ ., data=dat)
> >
> > Call:
> > lm(formula = V1 ~ ., data = dat)
> >
> > Coefficients:
> > (Intercept)           V2           V3           V4           V5           V6           V7           V8
> >    0.021065    -0.015988    -0.008273     0.049849     0.014874     0.012352    -0.054584     0.004542
> >          V9          V10          V11          V12          V13          V14          V15          V16
> >   -0.017186     0.018006    -0.009707    -0.007382     0.044886    -0.051122    -0.026910    -0.048929
> >         V17          V18          V19          V20          V21          V22          V23          V24
> >   -0.008129     0.022129    -0.063525     0.026683     0.013424    -0.010145    -0.046046     0.024025
> >         V25          V26          V27          V28          V29          V30          V31          V32
> >   -0.003529    -0.038270     0.043657     0.049855     0.010691     0.041217    -0.012596     0.018302
> >         V33          V34          V35          V36          V37          V38          V39          V40
> >    0.040225    -0.012751    -0.062677    -0.002810    -0.002574    -0.024137     0.021324    -0.041520
> >         V41          V42          V43          V44          V45          V46          V47          V48
> >   -0.076482     0.009063     0.067097    -0.042554    -0.013789     0.002865     0.017325    -0.076860
> >         V49          V50          V51          V52          V53          V54          V55          V56
> >   -0.007003    -0.007315     0.030270     0.022066    -0.002224    -0.056534     0.013705    -0.003609
> >         V57          V58          V59          V60          V61          V62          V63          V64
> >   -0.044580    -0.037543     0.015745     0.035250    -0.017117     0.072470     0.004398    -0.015923
> >         V65          V66          V67          V68          V69          V70          V71          V72
> >    0.012864    -0.062752    -0.038437    -0.019586     0.019871    -0.068398    -0.111778     0.021416
> >         V73          V74          V75          V76          V77          V78          V79          V80
> >    0.036849    -0.009103     0.037790     0.021883    -0.034990    -0.014917    -0.003854     0.001760
> >         V81          V82          V83          V84          V85          V86          V87          V88
> >   -0.001812     0.003942     0.021810    -0.013984    -0.030446     0.049187     0.008392     0.026965
> >         V89          V90          V91          V92          V93          V94          V95          V96
> >    0.057301     0.004190     0.055505    -0.046006    -0.019080    -0.098889    -0.010891    -0.002729
> >         V97          V98          V99         V100         V101
> >    0.024939    -0.029847     0.063578    -0.061667    -0.022163
> >
> > > system.time( lm(V1 ~ ., data=dat) ) # with the 101-column data frame
> >    user  system elapsed
> >   0.060   0.008   0.076
> >
> > Sorry to give you such a Frost-y reception, but you are being somewhat ... what's the right word... sleepy?
> >
> > --
> > David.
> >
> >
> > >
> > >
> > > On Fri, Apr 19, 2013 at 12:26 PM, peter dalgaard <pdalgd at gmail.com> wrote:
> > >
> > >>
> > >> On Apr 18, 2013, at 21:24 , Nilesh Gupta wrote:
> > >>
> > >>> Hello all
> > >>>
> > > >>> Is there a method/package in R in which I can do regressions for more than
> > > >>> 50 independent variables?
> > >>
> > >> What's wrong with lm() et al.?
> > >>
> > >> --
> >
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com