[Rd] Severe memory problem using split()
cstrato
cstrato at aon.at
Tue Jul 13 21:05:27 CEST 2010
Dear Martin,
Thank you for this explanation.
Best regards
Christian
On 7/13/10 12:31 AM, Martin Morgan wrote:
> On 07/12/2010 03:00 PM, cstrato wrote:
>> Dear Martin,
>>
>> Thank you, you are right, now I get:
>>
>>> ann<- read.delim("Hu6800_ann.txt", stringsAsFactors=FALSE)
>>> object.size(ann)
>> 2035952 bytes
>>> u2p<- split(ann[,"ProbesetID"],ann[,"UNIT_ID"])
>>> object.size(u2p)
>> 1207368 bytes
>>> object.size(unlist(u2p))
>> 865176 bytes
>>
>> Nevertheless, a size of 1.2MB for a list representing 2 of 11 columns of
>
> but it's a list of length(unique(ann[["UNIT_ID"]])) elements, each of
> which has a pointer to the element, a pointer to the name of the
> element, and the element data itself. I'd guess it adds up in a
> non-mysterious way. For a sense of it (and maybe only understandable if
> you have a working understanding of how R represents data) see, e.g.,
>
>> .Internal(inspect(list(x=1,y=2)))
> @1a4c538 19 VECSXP g0c2 [ATT] (len=2, tl=0)
>   @191cad8 14 REALSXP g0c1 [] (len=1, tl=0) 1
>   @191caa8 14 REALSXP g0c1 [] (len=1, tl=0) 2
> ATTRIB:
>   @16fc8d8 02 LISTSXP g0c0 []
>     TAG: @60cf18 01 SYMSXP g0c0 [MARK,NAM(2),gp=0x4000] "names"
>     @1a4c500 16 STRSXP g0c2 [] (len=2, tl=0)
>       @674e88 09 CHARSXP g0c1 [MARK,gp=0x21] "x"
>       @728c38 09 CHARSXP g0c1 [MARK,gp=0x21] "y"
>
> Martin
>
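To make the per-element overhead described above concrete, here is a minimal sketch. It does not use the real "Hu6800_ann.txt"; the identifiers in `ids` and the one-probeset-per-unit grouping in `unit` are made-up stand-ins, so the byte counts will differ from the ones quoted in this thread. It only compares a flat character vector with the list that split() builds from it.

```r
# Hypothetical stand-in data: 7129 short strings, one per group.
n    <- 7129
ids  <- sprintf("probe_%04d_at", seq_len(n))   # assumed ID format
unit <- seq_len(n)                             # assumed grouping

flat <- ids
lst  <- split(ids, unit)   # list of n length-1 character vectors

# object.size() estimates; as.numeric() drops the "object_size" class
size_flat <- as.numeric(object.size(flat))
size_list <- as.numeric(object.size(lst))
size_back <- as.numeric(object.size(unlist(lst)))

# Every list element carries its own vector header, and the list
# also stores a names attribute, so the list is larger than the
# flat vector holding the same strings.
cat("flat vector :", size_flat, "bytes\n")
cat("split list  :", size_list, "bytes\n")
cat("unlist(list):", size_back, "bytes (names are kept)\n")
```

The exact numbers depend on the R version and platform, but the ordering (list larger than flat vector) should hold whenever most groups are small.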
>> a table of size 754KB still seems pretty large?
>>
>> Best regards
>> Christian
>>
>>
>> On 7/12/10 11:44 PM, Martin Morgan wrote:
>>> On 07/12/2010 01:45 PM, cstrato wrote:
>>>> Dear all,
>>>>
>>>> With great interest I followed the discussion:
>>>> https://stat.ethz.ch/pipermail/r-devel/2010-July/057901.html
>>>> since I currently have a similar problem:
>>>>
>>>> In a new R session (using xterm) I am importing a simple table
>>>> "Hu6800_ann.txt" which has a size of 754KB only:
>>>>
>>>>> ann<- read.delim("Hu6800_ann.txt")
>>>>> dim(ann)
>>>> [1] 7129 11
>>>>
>>>>
>>>> When I call "object.size(ann)" the estimated memory used to store "ann"
>>>> is already 2MB:
>>>>
>>>>> object.size(ann)
>>>> 2034784 bytes
>>>>
>>>>
>>>> Now I call "split()" and check the estimated memory used which turns out
>>>> to be 3.3GB:
>>>>
>>>>> u2p<- split(ann[,"ProbesetID"],ann[,"UNIT_ID"])
>>>>> object.size(u2p)
>>>> 3323768120 bytes
>>>
>>> I guess things improve with stringsAsFactors=FALSE in read.delim?
>>>
>>> Martin
>>>
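A likely reason why stringsAsFactors=FALSE helps, sketched under assumptions: in R 2.11.1 read.delim() converts character columns to factors by default, and when split() is given a factor, each piece keeps the complete set of levels. object.size() then counts that levels attribute once per piece. The data frame below is a small made-up stand-in (the ID format and n = 200 are assumptions, not the real annotation file), so only the relative sizes are meaningful.

```r
# Hypothetical stand-in for the annotation table.
n   <- 200
ann <- data.frame(
  ProbesetID = sprintf("probe_%04d_at", seq_len(n)),  # assumed ID format
  UNIT_ID    = seq_len(n),
  stringsAsFactors = FALSE
)

# Splitting a factor column: each piece is itself a factor.
as_factor <- split(factor(ann$ProbesetID), ann$UNIT_ID)
# Splitting the same column as plain character strings.
as_char   <- split(ann$ProbesetID, ann$UNIT_ID)

# Each piece of the factor split still carries all n levels,
# although it holds only a single value.
levels_per_piece <- nlevels(as_factor[[1]])

size_factor <- as.numeric(object.size(as_factor))
size_char   <- as.numeric(object.size(as_char))

cat("levels in one piece:", levels_per_piece, "\n")
cat("factor split       :", size_factor, "bytes\n")
cat("character split    :", size_char, "bytes\n")
```

Because the levels are repeated in every piece, the reported size of the factor split grows roughly with the square of the number of groups, which would be consistent with a multi-gigabyte estimate for 7129 rows.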
>>>>
>>>> During the R session I am running "top" in another xterm and can see
>>>> that the memory usage of R increases to about 550MB RSIZE.
>>>>
>>>>
>>>> Now I do:
>>>>
>>>>> object.size(unlist(u2p))
>>>> 894056 bytes
>>>>
>>>> It takes about 3 minutes to complete this call and the memory usage of R
>>>> increases to about 1.3GB RSIZE. Furthermore, during evaluation of this
>>>> function the free RAM of my Mac decreases to less than 8MB free PhysMem,
>>>> until it needs to swap memory. When finished, free PhysMem is 734MB but
>>>> the size of R increased to 577MB RSIZE.
>>>>
>>>> Doing "split(ann[,"ProbesetID"],ann[,"UNIT_ID"],drop=TRUE)" did not
>>>> change the object.size, only processing was faster and it did use less
>>>> memory on my Mac.
>>>>
>>>> Do you have any idea what the reason for this behavior is?
>>>> Why is the size of list "u2p" so large?
>>>> Am I making a mistake?
>>>>
>>>>
>>>> Here is my sessionInfo on a MacBook Pro with 2GB RAM:
>>>>
>>>>> sessionInfo()
>>>> R version 2.11.1 (2010-05-31)
>>>> x86_64-apple-darwin9.8.0
>>>>
>>>> locale:
>>>> [1] C
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> Best regards
>>>> Christian
>>>> _._._._._._._._._._._._._._._._._._
>>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
>>>> V.i.e.n.n.a A.u.s.t.r.i.a
>>>> e.m.a.i.l: cstrato at aon.at
>>>> _._._._._._._._._._._._._._._._._._
>>>>
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>>
>
>