[R] Size limitations for model.matrix?

gerald.jean at dgag.ca
Wed Apr 28 19:01:48 CEST 2010


Hello,

I am running:

R version 2.10.0 (2009-10-26)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

on a Red Hat Linux box with 48 GB of memory.

I am trying to create a model.matrix for a big model on a moderately large
data set.  It seems there is a size limitation on model.matrix.

> dim(coll.train)
[1] 677236    128
> coll.1st.model.mat <- model.matrix(coll.1st.formula, data = coll.train)
> dim(coll.1st.model.mat)
[1] 581618    169

Once I saw that the resulting model.matrix had fewer rows than the original
data.frame, I experimented with the number of input variables in the model:

> ttt <- model.matrix(~kmpleasure + vehage + age + gender + marital.status +
+     license.category + minor.conviction + driver.training.certificate +
+     admhybrid + anpol + anveh + cie + dblct + faq13c + faq20 + faq27 + faq43 +
+     faq5a + fra2 + frb2 + frb3 + kmaff + kmannuel + kmtravai + lima + maison +
+     nacp + nap + nbcond + nbcondpo + nbvt + rabmlt06 + rabmtve + rabperprg +
+     rabretrai + statnuit + tarcl06 + utilusa + sexeocc + ageocc + napocc,
+     data = coll.train)
> dim(ttt)
[1] 677236    109

## OK so far, but if I add one more variable, rows go missing.

> ttt <- model.matrix(~kmpleasure + vehage + age + gender + marital.status +
+     license.category + minor.conviction + driver.training.certificate +
+     admhybrid + anpol + anveh + cie + dblct + faq13c + faq20 + faq27 + faq43 +
+     faq5a + fra2 + frb2 + frb3 + kmaff + kmannuel + kmtravai + lima + maison +
+     nacp + nap + nbcond + nbcondpo + nbvt + rabmlt06 + rabmtve + rabperprg +
+     rabretrai + statnuit + tarcl06 + utilusa + sexeocc + ageocc + napocc +
+     prof.b2, data = coll.train)
> dim(ttt)
[1] 676379    110
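One possible explanation, offered as a hypothesis rather than a confirmed diagnosis: model.matrix() builds its rows from model.frame(), which applies the default na.action (normally na.omit) and silently drops any row with a missing value in a variable used by the formula. The toy sketch below uses a made-up data frame `toy` with hypothetical variables `x`, `g` and `z` (not the real coll.train data) to show the row count shrinking when a variable containing NAs is added.

```r
# Hypothetical stand-in for coll.train; not the real data set
toy <- data.frame(
  x = c(1.5, 2.0, 3.2, 4.1, 5.0),
  g = factor(c("a", "b", "a", "b", "a")),
  z = c(10, NA, 30, NA, 50)   # z has two missing values
)

# The global default na.action is normally "na.omit"
getOption("na.action")

m1 <- model.matrix(~ x + g, data = toy)      # no NAs involved
m2 <- model.matrix(~ x + g + z, data = toy)  # NAs in z

nrow(m1)  # 5: all rows kept
nrow(m2)  # 3: the two rows with NA in z are dropped

# Count rows that are incomplete on the variables the formula uses
vars <- all.vars(~ x + g + z)
sum(!complete.cases(toy[, vars]))  # 2
```

On the real data, `sum(is.na(coll.train$prof.b2))` would be a quick first check: if it is at least 857 (677236 - 676379), missing values in prof.b2 could account for the lost rows.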

Is there a limit on the size of a matrix or of a data.frame?  I know the
limit on the length of a vector is 2^31 - 1, but we are very far from that
here.  Am I missing something?
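A quick back-of-the-envelope check of the size question: the 110-column matrix has 677236 by 110 elements, compared below with the 2^31 - 1 maximum vector length that applied in R 2.x (a matrix is stored as a single vector with a dim attribute, so the same cap applies). The memory figure assumes the matrix is stored as 8-byte doubles.

```r
# Size of the 110-column model matrix from the second call above
n_rows <- 677236
n_cols <- 110
n_elem <- n_rows * n_cols         # 74,495,960 elements

# Maximum vector length in R 2.x: 2^31 - 1, the largest integer
max_len <- .Machine$integer.max   # 2147483647

n_elem / max_len                  # roughly 0.035, about 3.5% of the cap
n_elem * 8 / 2^20                 # about 568 MiB as a double matrix
```

At a few percent of the length limit and well under 1 GB of memory on a 48 GB machine, size alone seems unlikely to explain the missing rows.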

Thanks for any support,

Gérald Jean
Senior Statistical Advisor,
VP Actuarial and Insurance Solutions,
Desjardins Groupe d'Assurances Générales
Telephone: (418) 835-4900 ext. 7639
Fax:       (418) 835-6657
Email:     gerald.jean at dgag.ca

"We believe in God, others must bring Data."

W. Edwards Deming



