[Rd] A memory management question
Luke Tierney
luke at stat.uiowa.edu
Mon Sep 5 21:02:14 CEST 2005
On Mon, 5 Sep 2005, dhinds at sonic.net wrote:
> Luke Tierney <luke at stat.uiowa.edu> wrote:
>
>> It might or might not work now but is not guaranteed to do so reliably
>> in the future. Seeing the risks of leaving SETLENGTH exposed, it is
>> very likely that SETLENGTH will be removed from the sources after the
>> 2.2.0 release.
>
>> If you provide your own methods to read and write the external pointer
>> then you don't need this; this is safer than relying on undocumented
>> behavior of [ and [<- in any case. You also then don't need to use
>> R_PreserveObject unless you really need to use it from the C level
>> outside of a context where an R reference exists.
>
> I'm not sure I follow this. Maybe I should explain the context for
> the problem.
>
> textConnection("xyz", "w") creates a connection, the output of which
> is deposited in a char vector named "xyz", which is updated line by
> line as output is sent to the connection. The current code maintains
> a pointer to "xyz" in the form of an unprotected SEXP. Hence if the
> user does rm(xyz), bad things happen. A small bug, I admit.
>
> I think the best fix is to use a protected reference to the result
> vector. I think this is safe and doesn't rely on any abuse of the
> interfaces.
>
> There's also a performance issue, that the result is updated after
> every line of output, resulting in a vast amount of copying if a large
> result is accumulated. This is the part that could be fixed by using
> SETLENGTH to manage the length of the protected result vector.
>
> I'm not sure what you mean by undocumented behavior of [ and [<-. I
> think all I'm relying on is that as long as an outstanding reference
> to the result vector exists, that R has to make sure the reference
> remains valid, and hence can't change the memory allocation of the
> result vector in any way. I don't care what else happens to the
> contents of the vector, as long as I get to control when it is
> released. It is ok with me if the user modifies the result vector
> in-place, since my reference stays valid. So I don't actually care
> how [ and [<- work.

It would have helped to explain what you are up to. I had to guess
and guessed wrong, so forget the [ and [<- issue for now.

> I think the only undocumented thing I'm relying on, is that the memory
> manager doesn't pay attention to the LENGTH of objects that it isn't
> actively doing anything to. Currently, it actually only uses LENGTH
> in one spot: for updating R_LargeVallocSize when a large vector is
> released. The true allocation sizes for individual objects are always
> kept in another place (either by malloc, or in the node class of the
> object).
>
> It seems like in this limited usage, SETLENGTH does represent a useful
> feature, by permitting safe over-allocation of a protected object, and
> might be worth preserving (and documenting) for that purpose.

I am not comfortable making this available at this point. It might be
useful to have but would need careful thought. Without some way to
find out the true length there are potential problems. Without some
way of making sure the fields in VECSXP and STRSXP that are added are
valid there are potential problems (not the first time but if the size
is shrunk and then increased). Not that this can't be resolved but it
would take time that I don't have now, and this isn't high priority
enough to schedule in the near future. So for now you should not use
SETLENGTH if you want your code to work beyond 2.2.0.
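
The two concerns above (knowing the true length, and keeping newly exposed slots valid after a shrink followed by a grow) can be illustrated with a minimal sketch in plain C. This is not R's internal code: StrVec, strvec_new, strvec_set_length and strvec_free are hypothetical names invented for the illustration, and the slots hold plain C strings rather than CHARSXP values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an over-allocated string vector; not an R type. */
typedef struct {
    size_t true_length;  /* number of slots actually allocated */
    size_t length;       /* number of slots currently visible */
    const char **slot;   /* every visible slot must point at a valid string */
} StrVec;

static const char EMPTY[] = "";

/* Allocate room for true_length slots; the visible length starts at 0. */
StrVec *strvec_new(size_t true_length)
{
    StrVec *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    /* Ask for at least one slot so a zero capacity cannot yield NULL. */
    v->slot = malloc((true_length ? true_length : 1) * sizeof *v->slot);
    if (v->slot == NULL) {
        free(v);
        return NULL;
    }
    v->true_length = true_length;
    v->length = 0;
    return v;
}

/* Change the visible length. Returns -1 if it would exceed the allocation. */
int strvec_set_length(StrVec *v, size_t new_length)
{
    assert(v != NULL);
    if (new_length > v->true_length)
        return -1;  /* only possible to check because true_length is recorded */
    /* Slots that become visible may hold stale pointers from before an
       earlier shrink, so reset each of them to a valid empty string. */
    for (size_t i = v->length; i < new_length; i++)
        v->slot[i] = EMPTY;
    v->length = new_length;
    return 0;
}

void strvec_free(StrVec *v)
{
    if (v != NULL) {
        free(v->slot);
        free(v);
    }
}
```

In this sketch the bounds check depends on true_length being stored next to the visible length, and the reset loop is what keeps a shrink-then-grow sequence from exposing stale entries. A bare SETLENGTH provides neither.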
> Of course, the real problem here is the semantics of textConnection(),
> which make life much more difficult and can't be changed because they
> are specified outside of R.

It may be possible to expand the semantics by adding a logical
argument that controls whether the vector is to be over-allocated and
filled with zero length strings and truncated to the true length on
close. Another variant would be to have a logical argument that says
to keep the output internally and provide a function, say
textConnectionOutput, to retrieve the internal output. I would then
use a linked list internally. The semantics of close complicate this
a bit; this function would probably need to optionally close the
connection to get a final complete line.
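
As a rough illustration of the linked-list variant, here is a minimal sketch in plain C. It is not an implementation of R connections: TextOut, textout_write, textout_output and textout_free are hypothetical names, and textout_output only loosely mirrors the suggested textConnectionOutput, including the optional close that flushes an incomplete final line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct Line { char *text; struct Line *next; } Line;

/* Hypothetical connection state; initialise with TextOut t = {0}; */
typedef struct {
    Line *head, *tail;   /* completed lines, in output order */
    size_t nlines;
    char *pending;       /* incomplete last line, or NULL if none */
    size_t pending_len;
} TextOut;

/* Append a completed line to the list in O(1); takes ownership of text. */
static int push_line(TextOut *t, char *text)
{
    Line *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->text = text;
    n->next = NULL;
    if (t->tail != NULL) t->tail->next = n; else t->head = n;
    t->tail = n;
    t->nlines++;
    return 0;
}

/* Write text; each '\n' completes a line. Earlier lines are never copied.
   Growing pending one byte at a time is a simplification for brevity. */
int textout_write(TextOut *t, const char *s)
{
    for (; *s != '\0'; s++) {
        if (*s == '\n') {
            char *done = t->pending ? t->pending : calloc(1, 1);
            t->pending = NULL;
            t->pending_len = 0;
            if (done == NULL)
                return -1;
            if (push_line(t, done) != 0) {
                free(done);
                return -1;
            }
        } else {
            char *p = realloc(t->pending, t->pending_len + 2);
            if (p == NULL)
                return -1;
            p[t->pending_len++] = *s;
            p[t->pending_len] = '\0';
            t->pending = p;
        }
    }
    return 0;
}

/* Return a NULL-terminated array of the lines so far. The caller frees the
   array only; the strings stay owned by the list. If close_conn is nonzero,
   an incomplete final line is flushed into the list first. */
const char **textout_output(TextOut *t, int close_conn, size_t *n)
{
    if (close_conn && t->pending != NULL) {
        if (push_line(t, t->pending) != 0)
            return NULL;
        t->pending = NULL;
        t->pending_len = 0;
    }
    const char **out = malloc((t->nlines + 1) * sizeof *out);
    if (out == NULL)
        return NULL;
    size_t i = 0;
    for (Line *l = t->head; l != NULL; l = l->next)
        out[i++] = l->text;
    out[i] = NULL;
    *n = t->nlines;
    return out;
}

void textout_free(TextOut *t)
{
    Line *l = t->head;
    while (l != NULL) {
        Line *next = l->next;
        free(l->text);
        free(l);
        l = next;
    }
    free(t->pending);
    *t = (TextOut){0};
}
```

The result vector is only materialised when the output is requested, so the per-line cost does not grow with the amount of output already accumulated.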
luke
--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke at stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu