[R] rounding down with as.integer
Mike Miller
mbmiller+l at gmail.com
Thu Jan 1 20:58:02 CET 2015
I'd have to say thanks, but no thanks, to that one! ;-) The problem is
that it will take a long time and it will give the same answer.
The first time I did this kind of thing, a year or two ago, I manipulated
the text data to produce integers before putting the data into R. The
data were a little different -- already zero padded with three digits to
the right of the decimal and one to the left, so all I had to do was drop
the decimal point. The as.integer(1000*x+.5) method is very fast and it
works great.
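
Roughly, that earlier workflow looked like this (just a sketch from memory --
the real preprocessing happened outside R, so the names here are illustrative):

x.txt <- c("0.000", "1.000", "2.000", "0.325", "1.120", "1.900")
## fixed-width d.ddd text: dropping the dot gives the scaled integer directly
ints  <- as.integer(sub(".", "", x.txt, fixed=TRUE))
## or, after converting to numeric, nudge up before truncating
ints2 <- as.integer(1000 * as.numeric(x.txt) + .5)
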
I could have done that this time, but I was also saving to other formats,
so I had the data already in the format I described.
Mike
On Thu, 1 Jan 2015, Richard M. Heiberger wrote:
> Interesting. Following the point someone made on this list today, the goal
> is to input the data correctly.
> My inclination would be to read the file as text, pad each number to
> the right, drop the decimal point,
> and then read it as an integer.
> 0 1 2 0.325 1.12 1.9
> 0.000 1.000 2.000 0.325 1.120 1.900
> 0000 1000 2000 0325 1120 1900
>
> The pad step is the interesting step.
>
> ## 0 1 2 0.325 1.12 1.9
> ## 0.000 1.000 2.000 0.325 1.120 1.900
> ## 0000 1000 2000 0325 1120 1900
>
> x.in <- scan(text="
> 0 1 2 0.325 1.12 1.9 1.
> ", what="")
>
> padding <- c(".000", "000", "00", "0", "")
>
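> ## tokens are 1 to 5 characters long, so indexing padding by nchar(x.in)
> ## appends exactly what each token needs to reach the form d.ddd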
> x.pad <- paste(x.in, padding[nchar(x.in)], sep="")
>
> x.nodot <- sub(".", "", x.pad, fixed=TRUE)
>
> x <- as.integer(x.nodot)
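>
> A more general variant of the same idea (just a sketch) splits each token on
> the dot and right-pads the fractional part to three digits:
>
> parts <- strsplit(x.in, ".", fixed=TRUE)
> x2 <- vapply(parts, function(p) {
>   frac <- if (length(p) > 1) p[2] else ""
>   frac <- substr(paste0(frac, "000"), 1, 3)  # right-pad fraction to 3 digits
>   as.integer(paste0(p[1], frac))
> }, integer(1))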
>
>
> Rich
>
>
> On Thu, Jan 1, 2015 at 1:21 PM, Mike Miller <mbmiller+l at gmail.com> wrote:
>> On Thu, 1 Jan 2015, Duncan Murdoch wrote:
>>
>>> On 31/12/2014 8:44 PM, David Winsemius wrote:
>>>>
>>>>
>>>> On Dec 31, 2014, at 3:24 PM, Mike Miller wrote:
>>>>
>>>>> This is probably a FAQ, and I don't really have a question about it, but
>>>>> I just ran across this in something I was working on:
>>>>>
>>>>>> as.integer(1000*1.003)
>>>>>
>>>>> [1] 1002
>>>>>
>>>>> I didn't expect it, but maybe I should have. I guess it's the combination of
>>>>> machine precision and the fact that as.integer always rounds down (truncates):
>>>>>
>>>>>
>>>>>> as.integer(1000*1.003 + 255 * .Machine$double.eps)
>>>>>
>>>>> [1] 1002
>>>>>
>>>>>> as.integer(1000*1.003 + 256 * .Machine$double.eps)
>>>>>
>>>>> [1] 1003
>>>>>
>>>>>
>>>>> This does it right...
>>>>>
>>>>>> as.integer( round( 1000*1.003 ) )
>>>>>
>>>>> [1] 1003
>>>>>
>>>>> ...but this seems to always give the same answer and it is a little
>>>>> faster in my application:
>>>>>
>>>>>> as.integer( 1000*1.003 + .1 )
>>>>>
>>>>> [1] 1003
>>>>>
>>>>>
>>>>> FYI - I'm reading in a long vector of numbers from a text file with no
>>>>> more than three digits to the right of the decimal. I'm converting them to
>>>>> integers and saving them in binary format.
>>>>>
>>>>
>>>> So just add 0.0001 or even .0000001 to all of them and coerce to integer.
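>>>>
>>>> A quick sketch of that suggestion:
>>>>
>>>>> as.integer(1000*1.003 + .0001)
>>>> [1] 1003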
>>>
>>>
>>> I don't think the original problem was stated clearly, so I'm not sure
>>> whether this is a solution, but it looks wrong to me. If you want to round
>>> to the nearest integer, why not use round() (without the as.integer
>>> afterwards)? Or if you really do want an integer, why add 0.1 or 0.0001,
>>> why not add 0.5 before calling as.integer()? This is the classical way to
>>> implement round().
>>>
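>>> For instance, illustrating that classical idiom:
>>>
>>>> as.integer(1000*1.003 + 0.5)
>>> [1] 1003
>>>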
>>> To state the problem clearly, I'd like to know what result is expected for
>>> any real number x. Since R's numeric type only approximates the real
>>> numbers we might not be able to get a perfect match, but at least we could
>>> quantify how close we get. Or is the input really character data? The
>>> original post mentioned reading numbers from a text file.
>>
>>
>>
>> Maybe you'd like to know what I'm really doing. I have 1600 text files each
>> with up to 16,000 lines with 3100 numbers per line, delimited by a single
>> space. The numbers are between 0 and 2, inclusive, and they have up to
>> three digits to the right of the decimal. Every possible value in that
>> range will occur in the data. Some example numbers: 0 1 2 0.325 1.12 1.9.
>> I want to multiply by 1000 and store them as 16-bit integers (uint16).
>>
>> I've been reading in the data like so:
>>
>>> data <- scan( file=FILE, what=double(), nmax=3100*16000)
>>
>>
>> At first I tried making the integers like so:
>>
>>> ptm <- proc.time() ; ints <- as.integer( 1000 * data ) ; proc.time()-ptm
>>
>> user system elapsed
>> 0.187 0.387 0.574
>>
>> I decided I should compare with the result I got using round():
>>
>>> ptm <- proc.time() ; ints2 <- as.integer( round( 1000 * data ) ) ;
>>> proc.time()-ptm
>>
>> user system elapsed
>> 1.595 0.757 2.352
>>
>> It is a curious fact that only a few of the values from 0 to 2000 disagree
>> between the two methods:
>>
>>> table( ints2[ ints2 != ints ] )
>>
>>
>> 1001 1003 1005 1007 1009 1011 1013 1015 1017 1019 1021 1023
>> 35651 27020 15993 11505 8967 7549 6885 6064 5512 4828 4533 4112
>>
>> I understand that it's all about the problem of representing decimal numbers
>> in binary, but I still find some of the results a little surprising, like
>> that list of numbers from the table() output. For another example:
>>
>>> 1000+3 - 1000*(1+3/1000)
>>
>> [1] 1.136868e-13
>>
>>> 3 - 1000*(0+3/1000)
>>
>> [1] 0
>>
>>> 2000+3 - 1000*(2+3/1000)
>>
>> [1] 0
>>
>> See what I mean? So there is something special about the numbers around
>> 1000.
>>
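>> The pattern can be checked directly (a sketch; exact results may depend on
>> the platform's floating-point arithmetic):
>>
>> n <- 0:2000
>> bad <- n[ as.integer(1000 * (n/1000)) != n ]
>> bad  # should reproduce the same short list of odd values just above 1000
>>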
>> Back to the question at hand: I can avoid using round() and speed things up
>> a little bit by just adding a small number after multiplying by 1000:
>>
>>> ptm <- proc.time() ; R3 <- as.integer( 1000 * data + .1 ) ;
>>> proc.time()-ptm
>>
>> user system elapsed
>> 0.224 0.594 0.818
>>
>> You point out that adding .5 makes sense. That is probably a better idea
>> and I should take that approach under most conditions, but in this case we
>> can add anything between 2e-13 and about 0.99999999999 and always get the
>> same answer. We also have to remember that if a number might be negative
>> (not a problem for me in this application), we need to subtract 0.5 instead
>> of adding it.
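>>
>> For data that could go negative, a sign-aware variant (just a sketch) would
>> be something like:
>>
>> ints.s <- as.integer( 1000 * data + sign(data) * 0.5 )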
>>
>> Anyway, right now this is what I'm actually doing:
>>
>>> con <- file( paste0(FILE, ".uint16"), "wb" )
>>> ptm <- proc.time() ; writeBin( as.integer( 1000 * scan( file=FILE,
>>> what=double(), nmax=3100*16000 ) + .1 ), con, size=2 ) ; proc.time()-ptm
>>
>> Read 48013406 items
>> user system elapsed
>> 10.263 0.733 10.991
>>>
>>> close(con)
>>
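>> To get the values back later, the file can be read with readBin() -- roughly
>> like this (a sketch; n and endianness may need adjusting):
>>
>> con <- file( paste0(FILE, ".uint16"), "rb" )
>> ints.back <- readBin( con, what="integer", n=3100*16000, size=2, signed=FALSE )
>> close(con)
>> x.back <- ints.back / 1000  # back to the original 0..2 scale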
>>
>> By the way, writeBin() is something that I learned about here, from you,
>> Duncan. Thanks for that, too.
>>
>> Mike
>>
>> --
>> Michael B. Miller, Ph.D.
>> University of Minnesota
>> http://scholar.google.com/citations?user=EV_phq4AAAAJ
>>
>>
>