[R] " cannot allocate vector of length 1072693248"

Sat May 15 20:52:02 CEST 2004

Andy;

Well, that about does it....

I'm copying this one back to the list for the benefit of those who may hit
this thread while searching the archives.  Your changes to the code run just
fine on my Windows machine, but give the vector length error on the G4
whether I'm using the OS X build of R (as in Raqua) or the X11 build (for
Darwin). It is worth noting that I have nearly twice as much RAM and HD on
the OS X G4 as I have on the Pentium.

So it's definitely a platform-specific problem. What on earth does one do
about that?

It's not urgent, but whoever works on the R source would probably like to
know.
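
One detail in the error message may be worth passing along to whoever digs
into this. The length 1072693248 is 0x3FF00000, which is the upper 32 bits of
the IEEE 754 double 1.0. Reading the first four bytes of a double 1.0 as an
integer gives exactly that value on a big-endian machine such as the G4, and 0
on a little-endian Pentium. That would be consistent with a double being handed
to code that expects an integer somewhere, though nothing in this thread
confirms where (or whether) that happens, so treat it as a hypothesis only.
A minimal base R sketch of the arithmetic (the variable names are made up for
illustration):

```r
# Serialize the double 1.0 as 8 bytes in each byte order.
one.big    <- writeBin(1, raw(), size = 8, endian = "big")
one.little <- writeBin(1, raw(), size = 8, endian = "little")

# Reinterpret the FIRST four bytes as a 32-bit integer, using the same byte
# order the machine itself would use.
first.word.big    <- readBin(one.big[1:4], what = "integer",
                             size = 4, endian = "big")
first.word.little <- readBin(one.little[1:4], what = "integer",
                             size = 4, endian = "little")

first.word.big      # 1072693248, the length in the error message
first.word.little   # 0
```

A length of 0 on the Pentium would go unnoticed or fail quietly, which might
explain why the same code looks fine on Windows.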

On 5/12/04 20:29, "Liaw, Andy" <andy_liaw at merck.com> wrote:

> That's because I _attach()_ the .rda file you sent me, so that copy of
> `USdata' is in search position 2.  The subset statement makes a copy
> containing the subset in the workspace (aka global environment).  At the end
> of the loop, that copy is rm()'ed, but the copy in search position 2 is
> still accessible.
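
[Note for archive readers: the masking Andy describes can be reproduced with a
small sketch like the one below. The toy data frame, the temporary file, and
the names rda.file, rows.seen and still.there are stand-ins invented for
illustration; only `USdata' and the `symbol' column come from the thread.]

```r
# Hypothetical stand-in for the real data set.
USdata <- data.frame(symbol = c("AA", "AA", "AIG"), x = 1:3,
                     stringsAsFactors = FALSE)
rda.file <- tempfile(fileext = ".rda")
save(USdata, file = rda.file)
rm(USdata)                      # nothing called USdata left in the workspace

# Attach the saved file; its objects now sit at search position 2.
attach(rda.file, name = "USdata.copy")

rows.seen <- integer(0)
for (ticker in c("AA", "AIG")) {
  # The right-hand side finds the attached copy; the assignment creates a
  # second, smaller copy in the workspace that masks it.
  USdata <- USdata[USdata$symbol == ticker, ]
  rows.seen[ticker] <- nrow(USdata)
  rm(USdata)                    # removes only the workspace copy
}

still.there <- exists("USdata") # the attached copy is untouched
detach("USdata.copy", character.only = TRUE)
```

Without the attach() step, the second pass through the loop fails with
"object not found", which matches the error reported further down.
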
> 
> One thing I could have done is:
> 
> data.list <- split(USdata, USdata$symbol)
> 
> then inside the loop, just use data.list[[i]].
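
[Note for archive readers: a rough sketch of that split() approach, assuming a
toy data frame in place of the real one. The `price' column and the names
ticker.data and n.rows are hypothetical; `USdata', `symbol', `data.list' and
`tickernames' follow the thread.]

```r
# Hypothetical stand-in for the real data set.
USdata <- data.frame(symbol = c("AA", "AA", "AIG", "AIG", "AIG"),
                     price  = c(10, 11, 50, 51, 52),
                     stringsAsFactors = FALSE)

# One data frame per ticker symbol, built once, outside the loop.
data.list   <- split(USdata, USdata$symbol)
tickernames <- names(data.list)

n.rows <- integer(length(data.list))
for (i in seq_along(data.list)) {
  ticker.data <- data.list[[i]]     # this iteration's subset; USdata is never overwritten
  n.rows[i]   <- nrow(ticker.data)  # placeholder for the real per-ticker analysis
  cat("finished", i, "of", length(data.list), ":", tickernames[i], "\n")
}
```

Because the full data frame is never reassigned or removed, the loop does not
depend on an attached copy being present.
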
> 
> Andy
> 
>> -----Original Message-----
>> From: David L. Van Brunt, Ph.D. [mailto:dvanbrunt at well-wired.com]
>> Sent: Wednesday, May 12, 2004 9:14 PM
>> To: Liaw, Andy
>> Subject: Re: [R] " cannot allocate vector of length 1072693248"
>> 
>> 
>> I took it for a spin.
>> 
>> Odd, but looking at your code, it doesn't look like it should run, and
>> indeed it doesn't, past the second loop. Early in the loop you overwrite
>> "USdata" as follows:
>> 
>>        USdata <- USdata[USdata$symbol == tickernames[i], -54]
>> 
>> Then at the end of the loop you remove it with:
>> 
>>           rm(USdata) ## ,risk.rf,risk.pred,risk.rsq,
>> 
>> So on my second pass, I get the following error:
>> 
>>> finished 1 of 30
>>> Just loaded:  2 of 30 .  AIG  Assigning vectors and outcomes....
>>> Error: Object "USdata" not found
>> 
>> 
>> I see you "attached" the dataset prior to the loop, but this seems to be
>> circumvented in that you call "USdata$<somevar>" in each case within the
>> loop.
>> 
>> I've had it happen that after noodling around, I'm working only by virtue of
>> having leftovers from prior work, but with a fresh launch I discover that it
>> won't work after all.
>> 
>> Did I miss something?
>> 
>> Anyway, I'll try to rework my code to more closely approximate yours (or
>> rather, vice versa) and let you know how that goes.
>> 
>> Sorry for so many delays on my end. Hell at work, just wiped out by the time
>> I get home. Saw your post on the balancing of data, too, by the way... Very
>> interesting, and very helpful.
>> 
>> On 5/10/04 21:05, "Liaw, Andy" <andy_liaw at merck.com> wrote:
>> 
>>> David,
>>> 
>>> I changed the code a bit so that it runs one ticker symbol's worth of data
>>> in each iteration.  Is that what you were doing?  You can see from the
>>> attached file that I still don't get any error.  Memory usage was less, and
>>> the code ran a lot faster (of course).
>>> 
>>> Andy
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: David L. Van Brunt, Ph.D. [mailto:dvanbrunt at well-wired.com]
>>>> Sent: Monday, May 10, 2004 12:05 AM
>>>> To: Liaw, Andy
>>>> Subject: Re: [R] " cannot allocate vector of length 1072693248"
>>>> 
>>>> Thanks. I did just the opposite today, ran through 500 loops using only the
>>>> regression forests, to be sure there was no issue there.
>>>> 
>>>> Worked fine. But the classifications crash even when run alone.
>>>> 
>>>> I'm not sure what you mean in your first sentence.  The data set I posted is
>>>> the full data, which I would normally query out of within the loop to pull
>>>> each ticker symbol out one at a time.
>>>> 
>>>> Maybe that's the secret sauce.... I'm doing a MySQL query inside that loop,
>>>> whereas you loaded the data from a file. Each time I do the run, I'm working
>>>> with a different subset of the data, i.e., a different result set of the
>>>> MySQL query.  I think, unless I missed something, that you are repeating the
>>>> same analysis on the same data 20 times.
>>>> 
>>>> Running the code as you sent it back to me, I had similar results to what
>>>> you did. But it wasn't the same analysis. Each result set (or group of
>>>> results for each iteration of the loop) should be on a new subset of data.
>>>> Said differently, the first time through the loop, all the cases should have
>>>> a value of "AA" for "symbol", next time through they should all be "AIG",
>>>> etc. The file had 30 loops' worth (the Dow), but it usually dies around 6 or
>>>> 7. Don't know why just repeating them seems to work, though...
>>>> 
>>>> I thought at one time to get all the data from outside the loop, then just
>>>> subset differently (with the "testset" and "predset" definition) each time
>>>> through... That's the way it was originally, when the problem first
>>>> showed up. I only moved the query inside the loop because I thought it would
>>>> spare me the overhead of partially duplicating the data in memory.
>>>> 
>>>> Man, this is a head-scratcher.
>>>> 
>>>> 
>>>> On 5/9/04 21:05, "Liaw, Andy" <andy_liaw at merck.com> wrote:
>>>> 
>>>> 
>>>>> David,
>>>>> 
>>>>> I assume the data you posted is one iteration's worth in your for loop?  I
>>>>> looped over it 20 times and didn't get any errors (did have to change the
>>>>> code a bit to make it run).  Please look over the attached file to see if
>>>>> what I tested is close to what you would expect.  I ran it on an Opteron
>>>>> 248 with 8GB of RAM.  From `top', the maximum memory usage for the R
>>>>> process is 366MB.  It took just over an hour to run the 20 reps, so it was
>>>>> not using anywhere close to 1GB of RAM as your error message would
>>>>> indicate.
>>>>> 
>>>>> I would really appreciate it if you can strip the code down as much as
>>>>> possible to only the part that produces the error.  E.g., if none of the
>>>>> regression runs were causing problems, comment them out and see if you
>>>>> still get the error.  Saves running time and eye-balling time.
>>>>> 
>>>>> Best,
>>>>> Andy
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: David L. Van Brunt, Ph.D. [mailto:dvanbrunt at well-wired.com]
>>>>>> Sent: Friday, May 07, 2004 11:21 PM
>>>>>> To: Liaw, Andy
>>>>>> Subject: Re: [R] " cannot allocate vector of length 1072693248"
>>>>>> 
>>>>>> Good news/Bad news....
>>>>>> 
>>>>>> 4.2-1 installed without a hitch from source on OS X. But the same
>>>>>> behavior occurred, and in the same place. In the syntax code I sent, I
>>>>>> had commented out the prediction call after "checkpoint 1", but that only
>>>>>> stays the execution... It dies at 5 of 30 if I leave those lines in, but
>>>>>> dies anyway at 12 of 30 on "gain2" if I take 'em out.
>>>>>> 
>>>>>> HTH. 
>>>>>> 
>>>>>> On 5/7/04 21:09, "Liaw, Andy" <andy_liaw at merck.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> David,  Please see reply inline below.    Andy
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: David L. Van Brunt, Ph.D. [mailto:dvanbrunt at well-wired.com]
>>>>>>>> Sent: Friday, May 07, 2004 8:01 PM
>>>>>>>> To: Liaw, Andy
>>>>>>>> Subject: Re: [R] " cannot allocate vector of length 1072693248"
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks very much for taking a crack at this. It's been a ton of fun,
>>>>>>>> but this bump in the road sure has me stumped.
>>>>>>>> 
>>>>>>>> I realized after emailing you earlier that I left out important
>>>>>>>> information. Here goes...
>>>>>>>> 
>>>>>>>> -Using R version 1.9.0 beta for OS X, but also did it on the latest
>>>>>>>> Windows version (same behavior), and on version 1.8.1 for Windows
>>>>>>>> (same behavior).
>>>>>>>> -randomForest 4.0-7
>>>>>>>> -No problems at all on regression.... Only happens with
>>>>>>>> classification!
>>>>>>>> [AL] Could you try version 4.2-1 at
>>>>>>>> http://home.comcast.net/~andyliaw/randomForest_4.2-1.tar.gz (source)
>>>>>>>> or http://home.comcast.net/~andyliaw/randomForest_4.2-1.zip (Windows
>>>>>>>> binary) and see if that makes any difference?
>>>>>>>> 
>>>>>>>> Here's the code... If I remove one code block, it will give the same
>>>>>>>> error on another code block, always failing with the memory overflow
>>>>>>>> right after "checkpoint 1"
>>>>>>>> 
>>>>>>>> Be gentle,
>>>>>>>> [AL] Don't worry, I won't bite...
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I'm new at this! This is all a learning experience for me, and I
>>>>>>>> thought some readily available data would make for a good exercise.
>>>>>>>> This specific challenge is to loop through the Dow data, make
>>>>>>>> predictions for each member, save out the results to a table. Yes, it
>>>>>>>> is silly, but it's a great way to learn your way around a new program!
>>>>>>>> 
>>>>>>>> The data file is quite large, which is why I use MySQL and only pull
>>>>>>>> in a little bit at a time. That's what I initially thought was wrong
>>>>>>>> (too much data in memory, as I read in the whole thing) and why I put
>>>>>>>> the select query inside the loop to only pull out one member at a
>>>>>>>> time.  I'm attaching the data, so you can get the structure. I'm sure
>>>>>>>> there are a lot of ways I could write this better, but it does work
>>>>>>>> for the first few times through. Here's the code...
>>>>>>>> [AL] If you sent the data as a zip file, it would have been stripped off
>>>>>>>> silently by our email servers.  Could you post it on a web site
>>>>>>>> somewhere that I can download, or use the bzip2 format instead?  It's
>>>>>>>> hard for me to diagnose without data.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Notice:  This e-mail message, together with any attachments, contains
>>>>>>> information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
>>>>>>> New Jersey, USA 08889), and/or its affiliates (which may be known outside
>>>>>>> the United States as Merck Frosst, Merck Sharp & Dohme or MSD and in
>>>>>>> Japan, as Banyu) that may be confidential, proprietary copyrighted and/or
>>>>>>> legally privileged.  It is intended solely for the use of the individual
>>>>>>> or entity named on this message.  If you are not the intended recipient,
>>>>>>> and have received this message in error, please notify us immediately by
>>>>>>> reply e-mail and then delete it from your system.
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> 
>>>>>> 
>>>> 
>> 
>> 
>> -- 
>> David L. Van Brunt, Ph.D.
>> Outlier Consulting & Development
>> mailto: <ocd at well-wired.com>
>> 
>> 
>> 
>> 
> 
> 

-- 
David L. Van Brunt, Ph.D.
Outlier Consulting & Development
mailto: <ocd at well-wired.com>