[R] R - Problem retrieving memory used after gc() using arrow library

Ivan Krylov
Wed Aug 16 12:57:11 CEST 2023


On Wed, 16 Aug 2023 11:22:00 +0200
Kévin Pemonon <kevinpemonon using gmail.com> wrote:

> I'd like to understand why there's a difference in memory used
> between the Windows task manager and R's memory.size(max=F) function.

When R was initially ported to Windows, the then-popular version of the
system memory allocator was not a good fit for R. R allocates and frees
lots of objects, sometimes small ones. The Windows 95-era allocator had
backwards-compatibility obligations to lots of incorrectly-written
programs that sometimes used memory after freeing it and poked at
undocumented implementation details a lot [*]. I don't know the exact
reasons (my family didn't even have a computer back then), but it seems
that R couldn't make full use of the computer memory without bringing
in its own memory allocator (a copy of Doug Lea's malloc).

It is this particular allocator that memory.size() has access to.
Nowadays, there are many ways for a Windows process to have memory
allocated for it, and not all of them are under control of R. Apache
Arrow, being "a platform for in-memory data", probably allocates its
own memory without asking R to do it. Meanwhile, the Windows
implementation of malloc() has improved, so R 4.2.0 got rid of its own
copy (which also means memory.size() is now a stub that reports nothing
useful).
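
To see the two kinds of accounting side by side, here is a minimal
sketch. It assumes the arrow package is installed and that its default
memory pool is the one holding your data; the helper names r_heap_mb(),
arrow_pool_mb() and report_memory() are made up for this illustration.
Neither number has to match the task manager, which counts everything
mapped into the process.

```r
# Memory that R's own allocator knows about, in Mb.
# gc() returns a matrix; column 2 is "(Mb)" used, for Ncells and Vcells.
r_heap_mb <- function() {
  g <- gc()
  sum(g[, 2])
}

# Memory held by Arrow's default memory pool, in Mb.
# Returns NA if the arrow package is not installed.
arrow_pool_mb <- function() {
  if (!requireNamespace("arrow", quietly = TRUE)) return(NA_real_)
  arrow::default_memory_pool()$bytes_allocated / 1024^2
}

# Both figures together, as a named numeric vector.
report_memory <- function() {
  c(r_heap_mb = r_heap_mb(), arrow_pool_mb = arrow_pool_mb())
}

print(report_memory())
```

Calling report_memory() before and after reading a large Arrow table
should show the second number moving while the first one barely changes.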

You are welcome to trust the task manager.

-- 
Best regards,
Ivan

[*] See, e.g., the free bonus chapters to "The Old New Thing" by Raymond
Chen:
https://www.informit.com/content/images/9780321440303/samplechapter/Chen_bonus_ch01.pdf
https://www.informit.com/content/images/9780321440303/samplechapter/Chen_bonus_ch02.pdf


