[R] Why does vector assignment in R recreate the entire vector?
Matt Shotwell
shotwelm at musc.edu
Wed Sep 1 18:19:34 CEST 2010
Tal,
For your first example, x is not duplicated in memory. If you compile R
with --enable-memory-profiling, you have access to the tracemem()
function, which will report whether x is duplicate()d:
> x <- rep(1,100)
> tracemem(x)
[1] "<0x8f71c38>"
> x[10] <- NA
This does not result in duplication of x, nor does assignment of x to y:
> y <- x
At this point, y internally references the same data as x. It is not
until we modify y that the data are duplicated and y gets its own copy:
> y[10] <- NA
tracemem[0x8f71c38 -> 0x91fff70]:
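The steps above can be collected into one script. This is a minimal
sketch using only base R. It guards the tracemem() calls with
capabilities("profmem") so that it still runs on builds without memory
profiling. Whether and when a copy is reported can depend on the R
version and front end, so no tracemem output is shown here.

```r
# Copy-on-modify sketch: y shares x's data until y is modified.
x <- rep(1, 100)

# tracemem() only works when R was built with memory profiling.
can_trace <- isTRUE(unname(capabilities("profmem")))
if (can_trace) tracemem(x)

y <- x          # no copy yet: y refers to the same data as x
y[10] <- NA     # modifying y forces a copy; tracemem reports it if enabled

if (can_trace) untracemem(x)

# x is untouched, y carries the change.
print(x[10])
print(y[10])
```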
Likewise, no duplication occurs using `[<-`:
> x <- rep(1,100)
> tracemem(x)
[1] "<0x8e44900>"
> x <- `[<-`(x, list=10, values=NA)
However, R is not yet smart enough to avoid a duplication when the same
assignment goes through replace():
> x <- rep(1,100)
> tracemem(x)
[1] "<0x915d580>"
> x <- replace(x, list=10, values=NA)
tracemem[0x915d580 -> 0x915e090]: replace
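One way to see why replace() copies: in base R it is an ordinary closure
whose body assigns into its argument and returns it. The sketch below
uses a hypothetical stand-in, my_replace(), written with the same body.
Because the argument is still bound to x in the caller, the assignment
inside the function has to work on a copy.

```r
# my_replace() is a hypothetical stand-in mirroring the body of base::replace().
my_replace <- function(x, list, values) {
  x[list] <- values   # x is shared with the caller, so R copies before modifying
  x
}

x <- rep(1, 100)
z <- my_replace(x, list = 10, values = NA)

# The caller's x is unchanged, and the result matches base::replace().
stopifnot(!any(is.na(x)))
stopifnot(identical(z, replace(x, list = 10, values = NA)))
```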
Beyond these simple tests, it's difficult to know when R copies memory.
I mentioned in another post recently that subsetting a vector will copy
memory, but this is not reported by tracemem(). For example:
> tracemem(x)
[1] "<0x915ed50>"
> y <- x[1:100]
> tracemem(y)
[1] "<0x915f3f0>"
> identical(x,y)
[1] TRUE
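The same check can be scripted by comparing the address strings that
tracemem() returns. This is a sketch under the assumption that memory
profiling is available (it is skipped otherwise). The name alias is
introduced here only for contrast with plain assignment.

```r
x <- rep(1, 100)
y <- x[1:100]        # subsetting allocates a new vector
alias <- x           # plain assignment just adds another reference

stopifnot(identical(x, y))   # same values as the original

if (isTRUE(unname(capabilities("profmem")))) {
  addr_x     <- tracemem(x)
  addr_y     <- tracemem(y)
  addr_alias <- tracemem(alias)
  print(addr_x == addr_alias)  # alias and x refer to the same object
  print(addr_x == addr_y)      # the subset is a separate object
  untracemem(x)
  untracemem(y)
}
```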
Fortunately, memory is fairly cheap, and memory operations are pretty
fast on modern operating systems such as GNU/Linux. I mostly find that
the rate-limiting steps in my code are computational routines, like exp().
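A rough way to compare the two costs on your own machine is
system.time(). The sketch below is only illustrative: the vector length
n is an arbitrary choice, and the timings will vary by system, so no
numbers are shown.

```r
n <- 1e6
x <- runif(n)

# Time a single-element assignment that forces a full copy of x.
t_copy <- system.time({
  y <- x
  y[1] <- 0
})

# Time an elementwise computation over the same vector.
t_exp <- system.time(z <- exp(x))

print(t_copy[["elapsed"]])
print(t_exp[["elapsed"]])
```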
-Matt
On Wed, 2010-09-01 at 11:09 -0400, Tal Galili wrote:
> Hello all,
>
> A friend recently brought to my attention that vector assignment actually
> recreates the entire vector on which the assignment is performed.
>
> So for example, the code:
> x[10]<- NA # The original call (short version)
>
> Is really doing this:
> x<- replace(x, list=10, values=NA) # The original call (long version)
> # assigning a whole new vector to x
>
> Which is actually doing this:
> x<- `[<-`(x, list=10, values=NA) # The actual call
>
>
> Assuming this can be explained reasonably to the layman, my question is:
> why is it done this way?
> Why won't it just change the relevant pointer in memory?
>
> On small vectors it makes no difference.
> But on big vectors this might be (so I suspect) costly (in terms of time).
>
>
> I'm curious for your responses on the subject.
>
> Best,
> Tal
--
Matthew S. Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina