[R] Size limitations for model.matrix?

Matthew Keller mckellercran at gmail.com
Wed Apr 28 21:08:02 CEST 2010


Hi Gerald,

A matrix or an array *is* a vector (one carrying a dim attribute) that can
be indexed by two or more indices. Thus, matrices and arrays are also
limited to 2^31-1 elements.  You might check out the bigmemory package,
which can help with these issues...
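A minimal base-R sketch of the point above, using hypothetical toy objects `v` and `m`: a matrix's `length()` is the product of its dimensions, and giving a plain vector a `dim` attribute produces the same object as `matrix()`. So in the R version discussed here the 2^31-1 cap applies to the total number of cells, not to each dimension separately.

```r
# A matrix is an ordinary vector that carries a "dim" attribute.
v <- 1:6
m <- matrix(v, nrow = 2)          # 2 x 3 integer matrix

attributes(m)                     # the only attribute is $dim
length(m) == nrow(m) * ncol(m)    # length() counts every cell

# Giving the plain vector a dim attribute yields an identical object.
dim(v) <- c(2, 3)
identical(v, m)

# The per-vector element cap discussed in this thread: 2^31 - 1.
.Machine$integer.max == 2^31 - 1
```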

Matt



On Wed, Apr 28, 2010 at 11:01 AM,  <gerald.jean at dgag.ca> wrote:
>
> Hello,
>
> I am running:
>
> R version 2.10.0 (2009-10-26)
> Copyright (C) 2009 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
>
> on a RedHat Linux box with 48 GB of memory.
>
> I am trying to create a model.matrix for a big model on a moderately large
> data set.  There seems to be a size limitation on model.matrix.
>
>> dim(coll.train)
> [1] 677236    128
>> coll.1st.model.mat <- model.matrix(coll.1st.formula, data = coll.train)
>> dim(coll.1st.model.mat)
> [1] 581618    169
>
> Once I saw that the resulting model.matrix had fewer rows than the original
> data.frame, I experimented with the number of input variables in the model:
>
>> ttt <- model.matrix(~kmpleasure + vehage + age + gender + marital.status +
> +     license.category + minor.conviction + driver.training.certificate +
> +     admhybrid + anpol + anveh + cie + dblct + faq13c + faq20 + faq27 + faq43 +
> +     faq5a + fra2 + frb2 + frb3 + kmaff + kmannuel + kmtravai + lima + maison +
> +     nacp + nap + nbcond + nbcondpo + nbvt + rabmlt06 + rabmtve + rabperprg +
> +     rabretrai + statnuit + tarcl06 + utilusa + sexeocc + ageocc + napocc,
> +     data = coll.train)
>> dim(ttt)
> [1] 677236    109
>
> ## OK so far, but if I add one more variable, some rows go missing.
>
>> ttt <- model.matrix(~kmpleasure + vehage + age + gender + marital.status +
> +     license.category + minor.conviction + driver.training.certificate +
> +     admhybrid + anpol + anveh + cie + dblct + faq13c + faq20 + faq27 + faq43 +
> +     faq5a + fra2 + frb2 + frb3 + kmaff + kmannuel + kmtravai + lima + maison +
> +     nacp + nap + nbcond + nbcondpo + nbvt + rabmlt06 + rabmtve + rabperprg +
> +     rabretrai + statnuit + tarcl06 + utilusa + sexeocc + ageocc + napocc +
> +     prof.b2, data = coll.train)
>> dim(ttt)
> [1] 676379    110
>
> Is there a limit to the size of a matrix or of a data.frame?  I know the
> limit for the length of a vector is 2^31-1, but we are very far from that
> here.  Am I missing something?
>
> Thanks for any support,
>
> Gérald Jean
> Senior Statistical Advisor,
> VP Actuarial and Insurance Solutions,
> Desjardins Groupe d'Assurances Générales
> telephone            : (418) 835-4900 ext. (7639)
> fax                  : (418) 835-6657
> email                : gerald.jean at dgag.ca
>
> "We believe in God, others must bring Data."
>
> W. Edwards Deming
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
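A quick base-R check of the sizes reported above against the 2^31-1 cap, as a sketch: the row and column counts are copied from the `dim()` output in the question, while `rows`, `limit`, `cells_110` and `cells_169` are hypothetical names introduced here. Each product uses all 677236 rows, so it is an upper bound on the actual cell count.

```r
# Dimensions taken from the dim() output quoted in the question
rows  <- 677236
limit <- .Machine$integer.max    # 2^31 - 1

# Upper bounds: assume no rows were dropped
cells_110 <- rows * 110          # the 110-column design matrix
cells_169 <- rows * 169          # the widest (169-column) design matrix

cells_110
cells_169
cells_169 < limit                # well under the cap
cells_169 / limit                # roughly 5% of the cap
```

Both products stay far below 2^31-1, consistent with Gérald's remark that the data are nowhere near the vector-length limit.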



-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com


