[Rd] Memory allocation in read.table
Simon Urbanek
simon.urbanek at r-project.org
Wed Aug 28 19:44:57 CEST 2013
On Aug 28, 2013, at 12:17 PM, Hadley Wickham wrote:
> Hi all,
>
> I've been trying to learn more about memory profiling in R and I've
> been trying memory profiling out on read.table. I'm getting a bit of a
> strange result, and I hope that someone might be able to explain why.
>
> After running
>
> Rprof("read-table.prof", memory.profiling = TRUE, line.profiling = TRUE,
> gc.profiling = TRUE, interval = interval)
> diamonds <- read.table("diamonds.csv", sep = ",", header = TRUE)
> Rprof(NULL)
>
> and doing a lot of data manipulation, I end up with a table that
> displays the total memory (in megabytes) allocated and released (by
> gc) from each line of (a local copy of) read.table:
>
>           file line  alloc release
> 1 read-table.r  122 1.9797  1.1435
> 2 read-table.r  165 1.1148  0.6511
> 3 read-table.r  221 0.0763  0.0321
> 4 read-table.r  222 0.4922  1.5057
>
> Lines 122 and 165 are where I expect to see big allocations and
> releases - they're calling scan and convert.type respectively. Lines
> 221 and 222 are more of a mystery:
>
> class(data) <- "data.frame"
> attr(data, "row.names") <- row.names
>
> Why do those lines need any allocations? I thought class<- and attr<-
> were primitives, and hence would modify in place.
>
... but only if there is no other reference to the data (i.e. NAMED < 2). If there are two references, they have to copy, because modifying in place would also change the object seen through the other reference.
Here, however, data already has NAMED=2 because of
data <- data[keep]
If you remove that line and reverse the order of class<- and attr<-, then you get 0 copies.
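A minimal sketch of the shared-reference case, for illustration only: the list contents and the names `other`, `can_trace`, `before`, `after` and `copied` are made up here and are not taken from read.table. It assumes tracemem() is usable, which requires R to be built with memory profiling; the capabilities("profmem") check guards against builds without it.

```r
# Illustrative data, not the real diamonds file.
data <- list(carat = c(0.23, 0.21), cut = c("Ideal", "Premium"))
other <- data                          # second reference to the same list

# tracemem() only works if R was built with memory profiling support.
can_trace <- capabilities("profmem")
if (can_trace) before <- tracemem(data)   # string identifying the object

# Two references exist, so class<- cannot modify in place: it has to
# duplicate the list first, otherwise `other` would change as well.
class(data) <- "data.frame"

if (can_trace) {
  after <- tracemem(data)              # identifier of the object now bound to `data`
  copied <- !identical(before, after)  # TRUE when `data` points at a new object
  untracemem(data)
}

# `other` is untouched: it still has no class attribute.
is.null(attr(other, "class"))
```

When tracing is available, tracemem also prints a line of the form tracemem[<old> -> <new>] at the moment the duplication happens, which is a quick way to see which assignment triggered the copy.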
Cheers,
Simon
PS: if you are loading any sizable data, the one thing you don't want to do is to use read.table() ;)
> Re-running with gctorture(TRUE) yields roughly similar numbers,
> although there is no memory release because gc is called earlier, and
> the assignment of allocations to line is probably more accurate given
> that gctorture runs the code about 20x slower:
>
>            file line    alloc  release
> 25 read-table.r  221 0.387299 0.00e+00
> 26 read-table.r  222 0.362964 0.00e+00
>
> The whole object, when loaded, is ~4 meg, so those allocations
> represent fairly sizeable chunks of the total.
>
> Any suggestions would be greatly appreciated. Thanks!
>
> Hadley
>
> --
> Chief Scientist, RStudio
> http://had.co.nz/
>