[Rd] Writing character vectors with embedded nulls to a connection

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Mar 31 18:48:35 CEST 2006


The following approach

sobject <- charToRaw(serialize(object,NULL))
len <- length(sobject)
writeBin(sobject, outcon)

would appear to work.  As from 2.3.0 you will then be able to do

unserialize(readBin(incon, "raw", n=len))
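
For readers on a current R version, the same idea can be sketched as below. This is only an illustration under stated assumptions, not the code from this thread: in current R, serialize(object, NULL) returns a raw vector directly, so charToRaw() is not needed. The in-memory rawConnection stands in for a real connection, and the example object and the 4-byte big-endian length header are hypothetical choices made for the sketch.

```r
# Hypothetical example object; any R object could be used here.
object <- list(a = 1:3, b = "text")

sobject <- serialize(object, NULL)   # raw vector in current R
len <- length(sobject)               # byte count, known before anything is sent

# Sender side. The in-memory connection is a stand-in (assumption) for
# a socket or file connection.
outcon <- rawConnection(raw(0), "w")
writeBin(len, outcon, size = 4L, endian = "big")  # hypothetical length header
writeBin(sobject, outcon)                         # the serialized payload
bytes <- rawConnectionValue(outcon)
close(outcon)

# Receiver side: read the header first, then exactly that many bytes.
incon <- rawConnection(bytes, "r")
n <- readBin(incon, "integer", n = 1L, size = 4L, endian = "big")
restored <- unserialize(readBin(incon, "raw", n = n))
close(incon)
```

Because the payload stays a raw vector throughout, embedded nuls and multibyte locales do not come into play.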


On Fri, 31 Mar 2006, Prof Brian Ripley wrote:

> I think you should be using a raw type to hold such data in R.  It is not 
> intentional that readChar handles embedded nuls (and in fact it might not in 
> an MBCS).
>
> As ?serialize says
>
>     For 'serialize', 'NULL' unless 'connection=NULL', when the result
>     is stored in the first element of a character vector (but is not a
>     normal character string unless 'ascii = TRUE' and should not be
>     processed except by 'unserialize').
>
> so you have been told this is not intended to work as you tried.
>
> serialize predates the raw type, or it would have made use of it.  In these 
> days of MBCS character strings it is increasingly unsafe to use them to hold 
> anything other than valid character data.
>
>
> On Thu, 30 Mar 2006, Jeffrey Horner wrote:
>
>> Is this possible? I've tried both writeChar() and writeBin() to no avail.
>> 
>> My goal is to serialize(ascii=FALSE) an object to a connection but
>> determine the size of the serialized object beforehand:
>> 
>> sobject <- serialize(object,NULL,ascii=FALSE)
>> len <- nchar(sobject)
>> #
>> # run some code here to notify listener on other end of connection
>> # how many bytes I'm getting ready to send
>> #
>> writeChar(sobject,con)
>> 
>> The other option is to serialize twice:
>> 
>> len <- nchar(serialize(object,NULL,ascii=FALSE))
>> #
>> # run some code here to notify listener on other end of connection
>> # how many bytes I'm getting ready to send
>> #
>> serialize(object,con,ascii=FALSE)
>> 
>> Object stores, like memcache (http://danga.com/memcached/), need to know
>> object sizes before storing. RDBMSs which support large objects (CLOBs
>> or BLOBs) don't necessarily need to know object sizes beforehand, but
>> they do have max column size limits which must be honored.
>> 
>> BTW, readChar() can read strings with embedded nulls; I figured
>> writeChar() should be able to write them.
>> 
>> 
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


