[R] dealing with NA in readBin() and writeBin()
Mike Miller
mbmiller+l at gmail.com
Sun Jan 4 23:40:13 CET 2015
On Sun, 4 Jan 2015, Duncan Murdoch wrote:
> On 04/01/2015 5:13 PM, Mike Miller wrote:
>> The help page for readBin/writeBin tells me this:
>>
>> Handling R's missing and special (Inf, -Inf and NaN) values is discussed
>> in the ‘R Data Import/Export’ manual.
>>
>> So I go here:
>>
>> http://cran.r-project.org/doc/manuals/r-release/R-data.html#Special-values
>>
>> Unfortunately, I don't really understand that. Suppose I am using
>> single-byte integers and I want 255 (binary 11111111) to be translated to
>> NA. Is it possible to do that? Of course I could always do something
>> like this:
>>
>> X[ X==255 ] <- NA
>>
>> The problem with that is that I want to process the data on the fly,
>> dividing the integer to produce a double in the range from 0 to 2:
>>
>> X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)/127
>
> Why? Why not do it in three steps, i.e.
>
> X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)
> X[ X==255 ] <- NA
> X <- X/127
>
> If you are worried about the extra typing, then write a function to
> handle all three steps.
The thing I was concerned about is the memory usage, not the typing,
because everything will be scripted. But maybe memory isn't an issue and
I never have to hold two copies in memory simultaneously. There will be
about 50 million elements, typically.
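
For reference, here is a minimal sketch of the three-step function suggested above. The name read_scaled and its arguments are made up for illustration; only readBin() itself comes from the thread. R stores integers in 4 bytes and doubles in 8, so 50 million elements is roughly 200 MB as integer and 400 MB as double, and both vectors exist briefly while the division runs.

```r
# Hypothetical helper (name and arguments are illustrative, not from the thread):
# read unsigned single-byte integers, map the sentinel 255 to NA, then scale.
read_scaled <- function(con, n, sentinel = 255L, divisor = 127) {
  x <- readBin(con, what = "integer", n = n, size = 1, signed = FALSE)
  x[x == sentinel] <- NA   # comparison is done on integers, so it is exact
  x / divisor              # NA propagates through the division
}

# Small demonstration on a temporary file containing the bytes 0, 127, 254, 255
tmp <- tempfile()
writeBin(as.raw(c(0, 127, 254, 255)), tmp)
X <- read_scaled(tmp, n = 4)
unlink(tmp)
print(X)
```

Because the 255 test happens before the division, the result never depends on comparing doubles.
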
I think in terms of processing numbers as they stream into memory, but
that might not be what R is doing. For example, with scan() and
na.strings="NA", I picture it changing strings to NA as they are read, but
it might instead load the whole file as character and then do all the work
with things like what=numeric() and na.strings="NA" after the fact. Maybe
that doesn't impose an extra memory burden.
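
If memory did turn out to matter, the streaming could be done by hand: read from an open connection in chunks and fill a preallocated double vector. This is only a sketch under that assumption; the function name, the chunk size, and the arguments are hypothetical, and it may well be unnecessary for 50 million elements.

```r
# Hypothetical helper: convert the file chunk by chunk so that only one chunk
# of integers exists at a time alongside the preallocated double result.
read_scaled_chunked <- function(path, n, chunk = 1e6,
                                sentinel = 255L, divisor = 127) {
  out <- numeric(n)                 # final result, allocated once
  con <- file(path, open = "rb")    # binary-mode connection keeps its position
  on.exit(close(con))
  pos <- 0
  while (pos < n) {
    x <- readBin(con, what = "integer", n = min(chunk, n - pos),
                 size = 1, signed = FALSE)
    if (length(x) == 0) break       # file was shorter than expected
    x[x == sentinel] <- NA          # exact integer comparison
    out[pos + seq_along(x)] <- x / divisor
    pos <- pos + length(x)
  }
  if (pos < n) out <- out[seq_len(pos)]  # trim only if the file ran short
  out
}

# Demonstration with a deliberately tiny chunk size
tmp <- tempfile()
writeBin(as.raw(c(0, 127, 255, 254, 1)), tmp)
Y <- read_scaled_chunked(tmp, n = 5, chunk = 2)
unlink(tmp)
print(Y)
```
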
>> It looks like this still works:
>>
>> X[ X==255/127 ] <- NA
>
> I suspect that would work on all current platforms, but I wouldn't trust
> it. Don't use == on floating point values unless you know they are
> fractions with 2^n in the denominator.
Good point about platforms. I was concerned about the use of ==, and
you've convinced me it is not trustworthy.
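
A short illustration of the point about ==, using only base R. The variable names are arbitrary. The last part shows one way to avoid comparing doubles at all, by testing the integer values and applying the result to the scaled vector.

```r
# Exact comparison is safe for fractions with a power of two in the denominator
pow2_ok <- (0.5 + 0.25 == 0.75)
print(pow2_ok)    # TRUE

# ... but not for decimal fractions in general
decimal_ok <- (0.1 + 0.2 == 0.3)
print(decimal_ok) # FALSE

# all.equal() compares with a tolerance instead
approx_ok <- isTRUE(all.equal(0.1 + 0.2, 0.3))
print(approx_ok)  # TRUE

# Avoid the question entirely: test the integers, then mark the doubles
bytes <- c(0L, 127L, 255L)
Z <- bytes / 127
Z[bytes == 255L] <- NA
print(Z)
```
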
Thanks very much.
Mike