[Rd] memory issues with new release (PR#9344)

Pfaff, Bernhard Dr. Bernhard_Pfaff at fra.invesco.com
Tue Nov 7 09:56:31 CET 2006


>> spend more time on this. I really don't mind using the previous
>> version.


Hello Derek,

or upgrade to R 2.5.0dev; the execution of your code snippet is not
hampered by memory issues:

> sessionInfo()
R version 2.5.0 Under development (unstable) (2006-10-10 r39600) 
i386-pc-mingw32 

locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;
LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "datasets"  "utils"
[7] "base"

other attached packages:
fortunes 
 "1.3-2" 
> 

My output with respect to memory.limit(NA) is the same as yours.
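
For reference, the calls used to check this (a minimal sketch; these
functions exist only on Windows builds of R, and in these versions the
values are reported in bytes):

  memory.limit(NA)        # report the current limit without changing it
  memory.size(max=TRUE)   # maximum memory obtained from the OS so far
  memory.size()           # memory currently in use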

Best,
Bernhard


>> Like you mentioned, probably just a function of the new version
>> requiring more memory.
>
>
>Hmm, you might want to take a final look at the Windows FAQ 2.9. I am
>still not quite convinced you're really getting more than the default
>1.5 GB.
>
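>(A quick check of the numbers: 4293918720 / 2^20 = 4095, so the
>memory.limit(4095) call was accepted. But that only raises R's own cap;
>a 32-bit Windows process still cannot address anything like 4 GB, which
>is what FAQ 2.9 is about.)
>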
>Also, how much can you increase the problem size on 2.3.0 before it
>breaks? If you can only go to, say, 39 or 40 variables, then there's
>probably not much we can do. If it is orders of magnitude, then we may
>have a real bug (or not: sometimes we fix bugs resulting from things
>not being duplicated when they should have been, and the fixed code
>then uses more memory than the unfixed code).
>
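>For instance, a rough sketch of such a probe (the loop and step size
>are only illustrative; simulated data as in your transcript):
>
>  set.seed(314159)
>  for (p in seq(38, 198, by = 20)) {
>    d <- data.frame(y = round(runif(650000)),
>                    matrix(runif(650000 * p), 650000, p))
>    fit <- try(glm(y ~ ., family = binomial, data = d))
>    cat(p, "covariates:",
>        if (inherits(fit, "try-error")) "failed" else "ok", "\n")
>    rm(d, fit); gc()   # free memory before the next, larger attempt
>  }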
>
>> Thanks,
>> Derek
>>
>> On 06 Nov 2006 21:42:04 +0100, Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>> wrote:
>> >
>> > "Derek Stephen Elmerick" <delmeric at gmail.com> writes:
>> >
>> > > Thanks for the replies. Point taken regarding submission protocol.
>> > > I have included a text file attachment that shows the R output with
>> > > versions 2.3.0 and 2.4.0. A label distinguishing the version is
>> > > included in the comments.
>> > >
>> > > A quick background on the attached example. The dataset has 650,000
>> > > records and 32 variables. The response is dichotomous (0/1) and I
>> > > ran a logistic model (I previously mentioned multinomial, but
>> > > decided to start simple for the example). Covariates in the model
>> > > may be continuous or categorical, but all are numeric. You'll
>> > > notice that the code is the same for both versions; however, there
>> > > is a memory error with the 2.4.0 version. I ran this several times
>> > > and in different orders to make sure it was not some sort of
>> > > hardware issue.
>> > >
>> > > If there is some sort of additional output that would be helpful,
>> > > I can provide it as well. Or, if there is nothing I can do, that is
>> > > fine also.
>> >
>> > I don't think it was ever possible to request 4GB on XP. The version
>> > difference might be caused by a different response to invalid input
>> > in memory.limit(). What does memory.limit(NA) tell you after the call
>> > to memory.limit(4095) in the two versions?
>> >
>> > If that is not the reason: what is the *real* restriction of memory
>> > on your system? Do you actually have 4GB in your system (RAM+swap)?
>> >
>> > Your design matrix is on the order of 160 MB, so it shouldn't be a
>> > problem with a GB-sized workspace. However, three copies of it will
>> > brush against 512 MB, and it's not unlikely to have that many copies
>> > around.
>> >
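>> > A quick check of that figure, sketched here: with an intercept, the
>> > 30 main effects, and the two interaction columns from x27*x29 and
>> > x28*x30, the model matrix has 33 numeric columns, so
>> >
>> >   650000 * 33 * 8 / 1024   # rows x columns x 8 bytes per double, in Kb
>> >
>> > gives 167578.1 Kb, matching the "cannot allocate vector of size
>> > 167578 Kb" in your transcript.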
>> >
>> >
>> > > -Derek
>> > >
>> > >
>> > > On 11/6/06, Kasper Daniel Hansen <khansen at stat.berkeley.edu> wrote:
>> > > >
>> > > > It would be helpful to produce a script that reproduces the error
>> > > > on your system, and to include details on the size of your data
>> > > > set and what you are doing with it. It is unclear which function
>> > > > is actually causing the error. Really, in order to do something
>> > > > about it, you need to show how to actually obtain the error.
>> > > >
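>> > > > A self-contained sketch along those lines (with simulated data
>> > > > standing in for yours, sized as you describe) would be enough:
>> > > >
>> > > >   set.seed(1)
>> > > >   d <- data.frame(y = rbinom(650000, 1, 0.5),
>> > > >                   matrix(runif(650000 * 31), 650000, 31))
>> > > >   fit <- glm(y ~ ., family = binomial, data = d)
>> > > >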
>> > > > To my knowledge nothing _major_ has happened with the memory
>> > > > consumption, but of course R could use slightly more memory for
>> > > > specific purposes.
>> > > >
>> > > > But chances are that this is not really memory related but rather
>> > > > related to the functions you are using - perhaps a bug or perhaps
>> > > > a user error.
>> > > >
>> > > > Kasper
>> > > >
>> > > > On Nov 6, 2006, at 10:20 AM, Derek Stephen Elmerick wrote:
>> > > >
>> > > > > Thanks for the friendly reply. I think my description was
>> > > > > fairly clear: I import a large dataset and run a model. Using
>> > > > > the same dataset, the process worked previously and it doesn't
>> > > > > work now. If the new version of R requires more memory and this
>> > > > > compromises some basic data analyses, I would label this as a
>> > > > > bug. If this memory issue was mentioned in the documentation,
>> > > > > then I apologize. This email was clearly not well received, so
>> > > > > if there is a more appropriate place to post this sort of
>> > > > > question, that would be helpful.
>> > > > >
>> > > > > -Derek
>> > > > >
>> > > > > On 06 Nov 2006 18:20:33 +0100, Peter Dalgaard
>> > > > > <p.dalgaard at biostat.ku.dk> wrote:
>> > > > >>
>> > > > >> delmeric at gmail.com writes:
>> > > > >>
>> > > > >>> Full_Name: Derek Elmerick
>> > > > >>> Version: 2.4.0
>> > > > >>> OS: Windows XP
>> > > > >>> Submission from: (NULL) (38.117.162.243)
>> > > > >>>
>> > > > >>> Hello -
>> > > > >>>
>> > > > >>> I have some code that I run regularly using R version 2.3.x.
>> > > > >>> The final step of the code is to build a multinomial logit
>> > > > >>> model. The dataset is large; however, I have not had issues in
>> > > > >>> the past. I just installed the 2.4.0 version of R and now have
>> > > > >>> memory allocation issues. To verify, I ran the code again
>> > > > >>> against the 2.3 version with no problems. Since I have set the
>> > > > >>> memory limit to the max size, I have no alternative but to
>> > > > >>> downgrade to the 2.3 version. Thoughts?
>> > > > >>
>> > > > >> And what do you expect the maintainers to do about it? (I.e.,
>> > > > >> why are you filing a bug report?)
>> > > > >>
>> > > > >> You give absolutely no handle on what the cause of the problem
>> > > > >> might be, or even a way to reproduce it. It may be a bug, or
>> > > > >> maybe just R requiring more memory to run than previously.
>> > > > >>
>> > > > >> --
>> > > > >>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>> > > > >>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>> > > > >>  (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
>> > > > >> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
>> > > > >>
>> > > > >
>> > > >
>> > > >
>> > >
>> > >
>> > >
>> > > > ######
>> > > > ### R 2.4.0
>> > > > ######
>> > > >
>> > > > rm(list=ls(all=TRUE))
>> > > > memory.limit(size=4095)
>> > > NULL
>> > > >
>> > > > clnt=read.table(file="K:\\all_data_reduced_vars.dat",header=T,sep="\t")
>> > > >
>> > > > chk.rsp=glm(formula = resp_chkonly ~ x1 + x2 + x3 + x4 +
>> > > +     x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 +
>> > > +     x14 + x15 + x16 + x17 + x18 + x19 + x20 +
>> > > +     x21 + x22 + x23 + x24 + x25 + x26 + x27 +
>> > > +     x28 + x29 + x30 + x27*x29 + x28*x30, family = binomial,
>> > > +     data = clnt)
>> > > Error: cannot allocate vector of size 167578 Kb
>> > > >
>> > > > dim(clnt)
>> > > [1] 650000     32
>> > > > sum(clnt)
>> > > [1] 112671553493
>> > > >
>> > > ##################################################
>> > > ##################################################
>> > >
>> > > > ######
>> > > > ### R 2.3.0
>> > > > ######
>> > > >
>> > > > rm(list=ls(all=TRUE))
>> > > > memory.limit(size=4095)
>> > > NULL
>> > > >
>> > > > clnt=read.table(file="K:\\all_data_reduced_vars.dat",header=T,sep="\t")
>> > > >
>> > > > chk.rsp=glm(formula = resp_chkonly ~ x1 + x2 + x3 + x4 +
>> > > +     x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 +
>> > > +     x14 + x15 + x16 + x17 + x18 + x19 + x20 +
>> > > +     x21 + x22 + x23 + x24 + x25 + x26 + x27 +
>> > > +     x28 + x29 + x30 + x27*x29 + x28*x30, family = binomial,
>> > > +     data = clnt)
>> > > >
>> > > > dim(clnt)
>> > > [1] 650000     32
>> > > > sum(clnt)
>> > > [1] 112671553493
>> > > >
>> >
>> > --
>> >    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>> >   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>> >  (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
>> > ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
>> >
>>
>> > ######
>> > ### R 2.4.0
>> > ######
>> >
>> > rm(list=ls(all=TRUE))
>> > memory.limit(size=4095)
>> NULL
>> > memory.limit(NA)
>> [1] 4293918720
>> >
>> > set.seed(314159)
>> > clnt=matrix(runif(650000*38),650000,38)
>> > y=round(runif(650000,0,1))
>> > clnt=data.frame(y,clnt)
>> > attributes(clnt)$names=c("y","x1","x2","x3","x4","x5","x6","x7","x8",
>> +                          "x9","x10","x11","x12","x13","x14","x15","x16",
>> +                          "x17","x18","x19","x20","x21","x22","x23","x24",
>> +                          "x25","x26","x27","x28","x29","x30","x31","x32",
>> +                          "x33","x34","x35","x36","x37")
>> > dim(clnt)
>> [1] 650000     39
>> > sum(clnt)
>> [1] 12674827
>> >
>> > chk.rsp=glm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 +
>> +     x9 + x10 + x11 + x12 + x13 + x14 + x15 + x16 + x17 + x18 +
>> +     x19 + x20 + x21 + x22 + x23 + x24 + x25 + x26 + x27 + x28 +
>> +     x29 + x30 + x31 + x32 + x33 + x34 + x35 + x36 + x37,
>> +     family = binomial, data = clnt)
>> Error: cannot allocate vector of size 192968 Kb
>> >
>>
>> ##############################################################
>> ##############################################################
>> ##############################################################
>>
>> >
>> > ######
>> > ### R 2.3.0
>> > ######
>> >
>> > rm(list=ls(all=TRUE))
>> > memory.limit(size=4095)
>> NULL
>> > memory.limit(NA)
>> [1] 4293918720
>> >
>> > set.seed(314159)
>> > clnt=matrix(runif(650000*38),650000,38)
>> > y=round(runif(650000,0,1))
>> > clnt=data.frame(y,clnt)
>> > attributes(clnt)$names=c("y","x1","x2","x3","x4","x5","x6","x7","x8",
>> +                          "x9","x10","x11","x12","x13","x14","x15","x16",
>> +                          "x17","x18","x19","x20","x21","x22","x23","x24",
>> +                          "x25","x26","x27","x28","x29","x30","x31","x32",
>> +                          "x33","x34","x35","x36","x37")
>> > dim(clnt)
>> [1] 650000     39
>> > sum(clnt)
>> [1] 12674827
>> >
>> > chk.rsp=glm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 +
>> +     x9 + x10 + x11 + x12 + x13 + x14 + x15 + x16 + x17 + x18 +
>> +     x19 + x20 + x21 + x22 + x23 + x24 + x25 + x26 + x27 + x28 +
>> +     x29 + x30 + x31 + x32 + x33 + x34 + x35 + x36 + x37,
>> +     family = binomial, data = clnt)
>> >
>
>--
>   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
>~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


