[R] Size limitations for model.matrix?
Matthew Keller
mckellercran at gmail.com
Wed Apr 28 21:08:02 CEST 2010
Hi Gerald,
A matrix or an array *is* a vector with a dim attribute, which is what
lets it be indexed by two or more indices. Matrices and arrays are
therefore also limited to 2^31-1 elements. You might check out the
bigmemory package, which can help with these issues.
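
A minimal sketch to make the point above concrete, using base R only. It shows that a matrix is a vector with a dim attribute, then compares the element count of the 581618 x 169 result from the question against the 2^31-1 limit (using the full 677236 rows as the worst case). The toy data frame and the names `m`, `toy` and `mm` are hypothetical and not from the original post. The last part illustrates one well-known way rows can disappear that has nothing to do with size: by default model.matrix() builds a model frame with na.action = na.omit, so rows containing NA are dropped. Whether that is what happened with prof.b2 is an assumption that would need checking on the real data.

```r
# A matrix is an atomic vector with a "dim" attribute (column-major storage).
m <- matrix(1:6, nrow = 2)
attributes(m)           # only $dim: 2 3
length(m)               # 6 elements in the underlying vector
m[5] == m[1, 3]         # single-index and two-index access hit the same element

# Element count of the largest matrix in the question versus the limit.
limit <- .Machine$integer.max    # 2^31 - 1
n_elements <- 677236 * 169       # done in double arithmetic, no overflow
n_elements < limit               # TRUE: far below the limit

# Hypothetical toy data: one row has a missing value in x.
toy <- data.frame(x = c(1, 2, NA, 4),
                  g = factor(c("a", "b", "a", "b")))
mm <- model.matrix(~ x + g, data = toy)
nrow(toy)                        # 4
nrow(mm)                         # 3: the row with NA was dropped
sum(!complete.cases(toy))        # 1 incomplete row accounts for the difference
```

On the real data, something like sum(is.na(coll.train$prof.b2)) would show whether the 857 missing rows line up with missing values in the added variable.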
Matt
On Wed, Apr 28, 2010 at 11:01 AM, <gerald.jean at dgag.ca> wrote:
>
> Hello,
>
> I am running:
>
> R version 2.10.0 (2009-10-26)
> Copyright (C) 2009 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
>
> on a RedHat Linux box with 48Gb of memory.
>
> I am trying to create a model.matrix for a big model on a moderately large
> data set. It seems there is a size limitation to this model.matrix.
>
>> dim(coll.train)
> [1] 677236 128
>> coll.1st.model.mat <- model.matrix(coll.1st.formula, data = coll.train)
>> dim(coll.1st.model.mat)
> [1] 581618 169
>
> Once I saw that the resulting model.matrix had fewer rows than the original
> data.frame, I played with the number of input variables in the model:
>
>> ttt <- model.matrix(~kmpleasure + vehage + age + gender + marital.status +
> + license.category + minor.conviction + driver.training.certificate +
> + admhybrid + anpol + anveh + cie + dblct + faq13c + faq20 + faq27 + faq43 +
> + faq5a + fra2 + frb2 + frb3 + kmaff + kmannuel + kmtravai + lima + maison +
> + nacp + nap + nbcond + nbcondpo + nbvt + rabmlt06 + rabmtve + rabperprg +
> + rabretrai + statnuit + tarcl06 + utilusa + sexeocc + ageocc + napocc,
> + data = coll.train)
>> dim(ttt)
> [1] 677236 109
>
> ## OK so far, but if I add one more variable there will be missing rows.
>
>> ttt <- model.matrix(~kmpleasure + vehage + age + gender + marital.status +
> + license.category + minor.conviction + driver.training.certificate +
> + admhybrid + anpol + anveh + cie + dblct + faq13c + faq20 + faq27 + faq43 +
> + faq5a + fra2 + frb2 + frb3 + kmaff + kmannuel + kmtravai + lima + maison +
> + nacp + nap + nbcond + nbcondpo + nbvt + rabmlt06 + rabmtve + rabperprg +
> + rabretrai + statnuit + tarcl06 + utilusa + sexeocc + ageocc + napocc +
> + prof.b2, data = coll.train)
>> dim(ttt)
> [1] 676379 110
>
> Is there a limit to the size of a matrix or of a data.frame? I know the
> limit for the length of a vector to be 2^31, but we are very far from that
> here. Am I missing something?
>
> Thanks for any support,
>
> Gérald Jean
> Senior Statistical Advisor,
> VP Actuarial and Insurance Solutions,
> Desjardins Groupe d'Assurances Générales
> telephone: (418) 835-4900 ext. 7639
> fax: (418) 835-6657
> email: gerald.jean at dgag.ca
>
> "We believe in God, others must bring Data."
>
> W. Edwards Deming
>
--
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com