[R] Error: cannot allocate vector of size...
maiya
maja.zaloznik at gmail.com
Tue Nov 10 22:22:08 CET 2009
Cool! Thanks for the sampling and ff tips! I think I've figured it out now
using sampling...
I'm getting a quad-core, 4GB RAM computer next week, will try it again using
a 64 bit version :)
Thanks for your time!!!
Maja
tlumley wrote:
>
> On Tue, 10 Nov 2009, maiya wrote:
>
>>
>> OK, it's the simple math that's confusing me :)
>>
>> So you're saying 2.4GB, while Windows sees the data as 700MB. Why is that
>> different?
>
> Your data are stored on disk as a text file (in CSV format, in fact), not
> as numbers. This can take up less space.
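
The point about text versus numeric storage can be checked with a small sketch. It assumes single-digit codes, which is a guess about the data and not something stated in the thread; the names `codes`, `on_disk` and `in_memory` are made up for illustration.

```r
# Write 100,000 single-digit codes to a text file, one per line.
tmp <- tempfile()
codes <- sample(0:9, 1e5, replace = TRUE)
writeLines(as.character(codes), tmp)

# Size of the text file: one digit plus a line ending per value.
on_disk <- file.info(tmp)$size

# Size of the same values held as doubles: 8 bytes each plus a small header.
in_memory <- as.numeric(object.size(as.numeric(codes)))

c(on_disk = on_disk, in_memory = in_memory)
```

Short values take only a few bytes each as text but always 8 bytes as doubles, which is one way a 700MB file could expand past 2GB in memory.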
>
>> And let's say I could potentially live with e.g. 1/3 of the cases - that
>> would make it .8GB, which should be fine? But then my question is if there
>> is any way to sample the rows in read.table? Or what would be the best way
>> of importing a random third of my cases?
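
One possible way to import a random third of the cases, sketched in base R: read the file in chunks through a connection and keep each line with probability 1/3. This is not necessarily what anyone in the thread used; the demo file, the chunk size of 500 and the names `kept` and `sampled` are illustrative assumptions.

```r
set.seed(1)

# Small stand-in for the real file (hypothetical data).
tmp <- tempfile(fileext = ".dat")
write.table(data.frame(id = 1:3000, x = rnorm(3000)),
            tmp, row.names = FALSE, quote = FALSE)

con <- file(tmp, open = "r")
header <- readLines(con, n = 1)          # keep the header line
kept <- character(0)
repeat {
  chunk <- readLines(con, n = 500)       # read 500 lines at a time
  if (length(chunk) == 0) break          # end of file
  # keep each line independently with probability 1/3
  kept <- c(kept, chunk[runif(length(chunk)) < 1/3])
}
close(con)

# Parse only the retained lines.
sampled <- read.table(text = c(header, kept), header = TRUE)
nrow(sampled)
```

The result holds roughly, not exactly, one third of the rows, and only one chunk plus the retained lines is in memory at any time.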
>
> A better solution is probably to read a subset of the columns at a time.
> The easiest way to do this is probably to read the data into a SQLite
> database with the 'sqldf' package, but another solution is to use the
> colClasses= argument to read.table() and specify "NULL" for the classes of
> the columns you don't want to read. There are other ways as well.
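
A minimal sketch of the colClasses= approach on a made-up three-column file; the column names and types are assumptions for illustration, not those of the file discussed here.

```r
# Small demo file with three columns (hypothetical data).
tmp <- tempfile(fileext = ".dat")
write.table(data.frame(a = 1:5, b = letters[1:5], c = (1:5) / 2),
            tmp, row.names = FALSE, quote = FALSE)

# "NULL" (as a string) tells read.table() to skip that column entirely.
keep <- c("integer", "NULL", "numeric")
sub <- read.table(tmp, header = TRUE, colClasses = keep)

names(sub)   # only columns a and c are read
```

Giving explicit classes for the retained columns also spares read.table() from guessing types, which tends to reduce memory use during the read.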
>
> It might even be faster to do the cross-tabulations in a database and read
> the resulting summaries into R to compute any statistics you need.
>
>> Thanks!
>>
>> M.
>>
>>
>>
>> jholtman wrote:
>>>
>>> A little simple math. You have 3M rows with 100 items on each row.
>>> If read in this would be 300M items. If numeric, 8 bytes/item, this
>>> is 2.4GB. Given that you are probably using a 32 bit version of R,
>>> you are probably out of luck. A rule of thumb is that your largest
>>> object should consume at most 25% of your memory since you will
>>> probably be making copies as part of your processing.
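
The arithmetic above, written out in R. The 25% budget is applied here to the 1535Mb allocation limit from the warning messages; that choice is an assumption, as the thread does not say which memory figure the rule of thumb was applied to.

```r
rows <- 3e6
cols <- 100
bytes_per_value <- 8                      # one double

total_bytes <- rows * cols * bytes_per_value
total_gb <- total_bytes / 1e9             # 2.4 (decimal GB)

# Rule of thumb: largest object at most 25% of available memory.
budget_bytes <- 0.25 * 1535 * 2^20        # 25% of the 1535Mb limit (assumed basis)
max_rows <- floor(budget_bytes / (cols * bytes_per_value))

c(total_gb = total_gb, max_rows = max_rows)
```

With these assumptions the row budget for 100 numeric columns comes out on the order of half a million rows.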
>>>
>>> Given that, if you want to read in 100 variables at a time, I would
>>> say your limit would be about 500K rows to be reasonable. So you have
>>> a choice: read in fewer rows, read in all 3M rows but at 20 columns
>>> per read, or put the data in a database and extract what you need.
>>> Unless you go to a 64-bit version of R you will probably not be able
>>> to have the whole file in memory at one time.
>>>
>>> On Tue, Nov 10, 2009 at 7:10 AM, maiya <maja.zaloznik at gmail.com> wrote:
>>>>
>>>> I'm trying to import a table into R; the file is about 700MB. Here's my
>>>> first try:
>>>>
>>>>> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
>>>>
>>>> Error: cannot allocate vector of size 15.6 Mb
>>>> In addition: Warning messages:
>>>> 1: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
>>>> :
>>>> Reached total allocation of 1535Mb: see help(memory.size)
>>>> 2: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
>>>> :
>>>> Reached total allocation of 1535Mb: see help(memory.size)
>>>> 3: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
>>>> :
>>>> Reached total allocation of 1535Mb: see help(memory.size)
>>>> 4: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
>>>> :
>>>> Reached total allocation of 1535Mb: see help(memory.size)
>>>>
>>>> Then I tried
>>>>
>>>>> memory.limit(size=4095)
>>>> and got
>>>>
>>>>> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
>>>> Error: cannot allocate vector of size 11.3 Mb
>>>>
>>>> but no additional errors. Then, optimistically, to clear up the workspace:
>>>>
>>>>> rm()
>>>>> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
>>>> Error: cannot allocate vector of size 15.6 Mb
>>>>
>>>> Can anyone help? I'm even confused by the values: 15.6Mb, 1535Mb, 11.3Mb?
>>>> I'm working on WinXP with 2 GB of RAM. Help says the maximum obtainable
>>>> memory is usually 2Gb. Surely they mean GB?
>>>>
>>>> The file I'm importing has about 3 million cases with 100 variables that
>>>> I want to crosstabulate each with each. Is this completely unrealistic?
>>>>
>>>> Thanks!
>>>>
>>>> Maja
>>>> --
>>>> View this message in context:
>>>> http://old.nabble.com/Error%3A-cannot-allocate-vector-of-size...-tp26282348p26282348.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>> --
>>> Jim Holtman
>>> Cincinnati, OH
>>> +1 513 646 9390
>>>
>>> What is the problem that you are trying to solve?
>>>
>>
>
> Thomas Lumley Assoc. Professor, Biostatistics
> tlumley at u.washington.edu University of Washington, Seattle
>
--
View this message in context: http://old.nabble.com/Error%3A-cannot-allocate-vector-of-size...-tp26282348p26291403.html
Sent from the R help mailing list archive at Nabble.com.