[R] rounding down with as.integer

Fri Jan 2 00:37:03 CET 2015

Yes, Ted, that also works, but it's very slow:

# read in values:
> data <- scan( file=RECIP_IN, what=double(), nmax=recip_N*16000)
Read 48013406 items

# convert to integer by adding .5 and rounding down:
> ptm <- proc.time() ; ints <- as.integer( 1000 * data + .5 ) ; proc.time()-ptm
    user  system elapsed
   0.221   1.008   1.227

# convert to character, then to integer:
> ptm <- proc.time() ; ints2 <- as.integer( as.character( 1000 * data ) ) ; proc.time()-ptm
    user  system elapsed
  32.110   0.485  32.578

# the results are the same:
> identical(ints,ints2)
[1] TRUE

So they give the same answer, but converting to character takes about 27
times longer in elapsed time (32.6 s versus 1.2 s).
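
For anyone who wants to try the comparison without my files, here is a minimal sketch on simulated data. The seed, the vector length n, and the variable names are made up for illustration, and timings will of course differ by machine. The values are built as k/1000 so the true integers k are known in advance:

```r
# Simulated stand-in for the real data: values in [0, 2] with up to
# three digits after the decimal point (assumption: same shape as my data).
set.seed(1)
n <- 1e5
k <- sample(0:2000, n, replace = TRUE)   # the integers we hope to recover
x <- k / 1000                            # what scan() would read as doubles

# Method 1: add .5, then let as.integer() truncate.
t_add  <- system.time(i_add  <- as.integer(1000 * x + 0.5))

# Method 2: go through character, as Ted suggested.
t_char <- system.time(i_char <- as.integer(as.character(1000 * x)))

# Method 3: round() first, then coerce.
t_rnd  <- system.time(i_rnd  <- as.integer(round(1000 * x)))

# Compare elapsed times and check each result against the known integers.
print(rbind(add = t_add, char = t_char, round = t_rnd)[, 1:3])
print(c(add   = identical(i_add,  k),
        char  = identical(i_char, k),
        round = identical(i_rnd,  k)))
```

All three should recover k for non-negative inputs like these; only the cost differs.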

Mike

On Thu, 1 Jan 2015, Ted.Harding at wlandres.net wrote:

> I've been following this little tour round the murkier bistros
> in the back-streets of R with interest! Then it occurred to me:
> What is wrong with [using example data]:
>
>  x0 <- c(0,1,2,0.325,1.12,1.9,1.003)
>  x1 <- as.integer(as.character(1000*x0))
>  n1 <- c(0,1000,2000,325,1120,1900,1003)
>
>  x1 - n1
>  ## [1] 0 0 0 0 0 0 0
>
>  ## But, of course:
>  1000*x0 - n1
>  ## [1]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
>  ## [5]  0.000000e+00  0.000000e+00 -1.136868e-13
>
> Or am I missing something else in what Mike Miller is seeking to do?
> Ted.
>
> On 01-Jan-2015 19:58:02 Mike Miller wrote:
>> I'd have to say thanks, but no thanks, to that one!  ;-)  The problem is
>> that it will take a long time and it will give the same answer.
>>
>> The first time I did this kind of thing, a year or two ago, I manipulated
>> the text data to produce integers before putting the data into R.  The
>> data were a little different -- already zero padded with three digits to
>> the right of the decimal and one to the left, so all I had to do was drop
>> the decimal point.  The as.integer(1000*x+.5) method is very fast and it
>> works great.
>>
>> I could have done that this time, but I was also saving to other formats,
>> so I had the data already in the format I described.
>>
>> Mike
>>
>>
>> On Thu, 1 Jan 2015, Richard M. Heiberger wrote:
>>
>>> Interesting.  As someone on this list said today, the goal is to input
>>> the data correctly.
>>> My inclination would be to read the file as text, pad each number to
>>> the right, drop the decimal point,
>>> and then read it as an integer.
>>> 0 1 2 0.325 1.12 1.9
>>> 0.000 1.000 2.000 0.325 1.120 1.900
>>> 0000 1000 2000 0325 1120 1900
>>>
>>> The pad step is the interesting step.
>>>
>>> ## 0 1 2 0.325 1.12 1.9
>>> ## 0.000 1.000 2.000 0.325 1.120 1.900
>>> ## 0000 1000 2000 0325 1120 1900
>>>
>>> x.in <- scan(text="
>>> 0 1 2 0.325 1.12 1.9 1.
>>> ", what="")
>>>
>>> padding <- c(".000", "000", "00", "0", "")
>>>
>>> x.pad <- paste(x.in, padding[nchar(x.in)], sep="")
>>>
>>> x.nodot <- sub(".", "", x.pad, fixed=TRUE)
>>>
>>> x <- as.integer(x.nodot)
>>>
>>>
>>> Rich
>>>
>>>
>>> On Thu, Jan 1, 2015 at 1:21 PM, Mike Miller <mbmiller+l at gmail.com> wrote:
>>>> On Thu, 1 Jan 2015, Duncan Murdoch wrote:
>>>>
>>>>> On 31/12/2014 8:44 PM, David Winsemius wrote:
>>>>>>
>>>>>>
>>>>>> On Dec 31, 2014, at 3:24 PM, Mike Miller wrote:
>>>>>>
>>>>>>> This is probably a FAQ, and I don't really have a question about it, but
>>>>>>> I just ran across this in something I was working on:
>>>>>>>
>>>>>>>> as.integer(1000*1.003)
>>>>>>>
>>>>>>> [1] 1002
>>>>>>>
>>>>>>> I didn't expect it, but maybe I should have.  I guess it's about the
>>>>>>> machine precision added to the fact that as.integer always rounds down:
>>>>>>>
>>>>>>>
>>>>>>>> as.integer(1000*1.003 + 255 * .Machine$double.eps)
>>>>>>>
>>>>>>> [1] 1002
>>>>>>>
>>>>>>>> as.integer(1000*1.003 + 256 * .Machine$double.eps)
>>>>>>>
>>>>>>> [1] 1003
>>>>>>>
>>>>>>>
>>>>>>> This does it right...
>>>>>>>
>>>>>>>> as.integer( round( 1000*1.003 ) )
>>>>>>>
>>>>>>> [1] 1003
>>>>>>>
>>>>>>> ...but this seems to always give the same answer and it is a little
>>>>>>> faster in my application:
>>>>>>>
>>>>>>>> as.integer( 1000*1.003 + .1 )
>>>>>>>
>>>>>>> [1] 1003
>>>>>>>
>>>>>>>
>>>>>>> FYI - I'm reading in a long vector of numbers from a text file with no
>>>>>>> more than three digits to the right of the decimal.  I'm converting them
>>>>>>> to
>>>>>>> integers and saving them in binary format.
>>>>>>>
>>>>>>
>>>>>> So just add 0.0001 or even .0000001 to all of them and coerce to integer.
>>>>>
>>>>>
>>>>> I don't think the original problem was stated clearly, so I'm not sure
>>>>> whether this is a solution, but it looks wrong to me.  If you want to
>>>>> round
>>>>> to the nearest integer, why not use round() (without the as.integer
>>>>> afterwards)?  Or if you really do want an integer, why add 0.1 or 0.0001,
>>>>> why not add 0.5 before calling as.integer()?  This is the classical way to
>>>>> implement round().
>>>>>
>>>>> To state the problem clearly, I'd like to know what result is expected for
>>>>> any real number x.  Since R's numeric type only approximates the real
>>>>> numbers we might not be able to get a perfect match, but at least we could
>>>>> quantify how close we get.  Or is the input really character data?  The
>>>>> original post mentioned reading numbers from a text file.
>>>>
>>>>
>>>>
>>>> Maybe you'd like to know what I'm really doing.  I have 1600 text files
>>>> each
>>>> with up to 16,000 lines with 3100 numbers per line, delimited by a single
>>>> space.  The numbers are between 0 and 2, inclusive, and they have up to
>>>> three digits to the right of the decimal.  Every possible value in that
>>>> range will occur in the data.  Some example numbers: 0 1 2 0.325 1.12 1.9.
>>>> I want to multiply by 1000 and store them as 16-bit integers (uint16).
>>>>
>>>> I've been reading in the data like so:
>>>>
>>>>> data <- scan( file=FILE, what=double(), nmax=3100*16000)
>>>>
>>>>
>>>> At first I tried making the integers like so:
>>>>
>>>>> ptm <- proc.time() ; ints <- as.integer( 1000 * data ) ; proc.time()-ptm
>>>>
>>>>    user  system elapsed
>>>>   0.187   0.387   0.574
>>>>
>>>> I decided I should compare with the result I got using round():
>>>>
>>>>> ptm <- proc.time() ; ints2 <- as.integer( round( 1000 * data ) ) ;
>>>>> proc.time()-ptm
>>>>
>>>>    user  system elapsed
>>>>   1.595   0.757   2.352
>>>>
>>>> It is a curious fact that only a few of the values from 0 to 2000 disagree
>>>> between the two methods:
>>>>
>>>>> table( ints2[ ints2 != ints ] )
>>>>
>>>>
>>>>  1001  1003  1005  1007  1009  1011  1013  1015  1017  1019  1021  1023
>>>> 35651 27020 15993 11505  8967  7549  6885  6064  5512  4828  4533  4112
>>>>
>>>> I understand that it's all about the problem of representing decimal
>>>> numbers
>>>> in binary, but I still find some of the results a little surprising, like
>>>> that list of numbers from the table() output.  For another example:
>>>>
>>>>> 1000+3 - 1000*(1+3/1000)
>>>>
>>>> [1] 1.136868e-13
>>>>
>>>>> 3 - 1000*(0+3/1000)
>>>>
>>>> [1] 0
>>>>
>>>>> 2000+3 - 1000*(2+3/1000)
>>>>
>>>> [1] 0
>>>>
>>>> See what I mean?  So there is something special about the numbers around
>>>> 1000.
>>>>
>>>> Back to the question at hand:  I can avoid use of round() and speed things
>>>> up
>>>> a little bit by just adding a small number after multiplying by 1000:
>>>>
>>>>> ptm <- proc.time() ; R3 <- as.integer( 1000 * data + .1 ) ;
>>>>> proc.time()-ptm
>>>>
>>>>    user  system elapsed
>>>>   0.224   0.594   0.818
>>>>
>>>> You point out that adding .5 makes sense.  That is probably a better idea
>>>> and I should take that approach under most conditions, but in this case we
>>>> can add anything between 2e-13 and about 0.99999999999 and always get the
>>>> same answer.  We also have to remember that if a number might be negative
>>>> (not a problem for me in this application), we need to subtract 0.5 instead
>>>> of adding it.
>>>>
>>>> Anyway, right now this is what I'm actually doing:
>>>>
>>>>> con <- file( paste0(FILE, ".uint16"), "wb" )
>>>>> ptm <- proc.time() ; writeBin( as.integer( 1000 * scan( file=FILE,
>>>>> what=double(), nmax=3100*16000 ) + .1 ), con, size=2 ) ; proc.time()-ptm
>>>>
>>>> Read 48013406 items
>>>>    user  system elapsed
>>>>  10.263   0.733  10.991
>>>>>
>>>>> close(con)
>>>>
>>>>
>>>> By the way, writeBin() is something that I learned about here, from you,
>>>> Duncan.  Thanks for that, too.
>>>>
>>>> Mike
>>>>
>>>> --
>>>> Michael B. Miller, Ph.D.
>>>> University of Minnesota
>>>> http://scholar.google.com/citations?user=EV_phq4AAAAJ
>>>>
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
> -------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
> Date: 01-Jan-2015  Time: 21:28:22
> This message was sent by XFMail
> -------------------------------------------------
>