[R] Format integer

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue May 13 08:12:39 CEST 2008


This is one of those problems where the fine details matter.

1) The version of R.  I optimized sprintf() for long inputs and a single
format in R 2.7.0; the differences are mainly for multiple inputs and
where coercion is needed.  See also below.
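
A sketch of the three input shapes mentioned above: a long input with one format, an input that needs coercion, and multiple formats. It assumes a current version of R rather than 2.7.0, and the vector mirrors the one in Phil Spector's example further down. No timings are shown because they depend on the R version and the machine.

```r
# Sketch only: assumes a current R version; behaviour in R 2.7.0 may differ.
set.seed(1)
x_int <- sample(1:100, 650000, replace = TRUE)   # integer vector
x_dbl <- as.numeric(x_int)                        # same values stored as doubles

# Long input, single format: the one format is recycled over every element.
ids_int <- sprintf("%011d", x_int)

# Coercion needed: %d accepts a double vector only because every value is
# a whole number; R coerces those values to integer before formatting.
ids_dbl <- sprintf("%011d", x_dbl)

# Multiple formats: fmt is vectorized too and recycled against the data.
mixed <- sprintf(c("%03d", "%05d"), 7L)

print(head(ids_int, 3))
print(mixed)   # "007" "00007"
```

Both calls produce the same strings here; only the work done internally differs.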

2) The system.  My home system with an Intel Core 2 Duo is usually about 
the same speed as my office desktop with dual Opterons.  But not here:

Home:

> system.time(a<-formatC(x,digits=10,flag='0'))
    user  system elapsed
   9.705   0.088   9.810
> system.time(b<-sprintf("%011d",x))
    user  system elapsed
   0.283   0.000   0.283

Office:

> system.time(a<-formatC(x,digits=10,flag='0'))
    user  system elapsed
  15.851   0.125  16.007
> system.time(b<-sprintf("%011d",x))
    user  system elapsed
   0.816   0.001   0.818

and my Windows laptop is similar to the office machine.  So the 95x
speed-up reported below seems atypical.
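
To check the ratio on a given machine, a small timing helper along these lines could be used. `compare_padding` is a hypothetical name, and the sketch uses `width = 11` with formatC (an assumption, instead of the `digits=10` used in the thread) so that both calls should return identical 11-character strings. Expect the ratio to vary with the R version and hardware, as the figures above show.

```r
# Hypothetical helper: time formatC() and sprintf() on the same integer
# vector and report whether the zero-padded results agree.
compare_padding <- function(n = 650000, width = 11) {
  x <- sample(1:100, n, replace = TRUE)
  t_formatC <- system.time(a <- formatC(x, width = width, flag = "0"))
  t_sprintf <- system.time(b <- sprintf(paste0("%0", width, "d"), x))
  list(same    = identical(a, b),
       formatC = t_formatC[["elapsed"]],
       sprintf = t_sprintf[["elapsed"]],
       # guard against a zero elapsed time on very fast runs
       ratio   = t_formatC[["elapsed"]] / max(t_sprintf[["elapsed"]], 1e-3))
}

set.seed(1)
res <- compare_padding()
print(res)
```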

On Mon, 12 May 2008, Phil Spector wrote:

> I guess "little" means different things to different people:
>
>> x = sample(1:100,650000,replace=TRUE)
>> system.time(a<-formatC(x,digits=10,flag='0'))
>   user  system elapsed
> 32.854   0.444  34.813
>> system.time(b<-sprintf("%011d",x))
>   user  system elapsed
>  0.352   0.012   0.363
>
> If you look at the definitions of the functions, you'll see
> that formatC is written in R, and sprintf uses a single call
> to an .Internal function.   I

Not really: the meat of formatC() is a .C call.  In this case it is 
calling format.default(), also a .Internal.  But profiling shows that most 
of the time here is spent in paste(), another function which was optimized 
in 2.7.0.  (On this example, formatC() is 1.7x faster in 2.7.0 than in 2.6.2.)
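
A profile like the one described can be obtained with base R's sampling profiler, Rprof() and summaryRprof(). The sketch below is an illustration under assumptions: the input size, repeat count and sampling interval are arbitrary, and a current R version may show different functions at the top than 2.7.0 did, since formatC() has changed since.

```r
# Sketch: profile formatC() with base R's sampling profiler.
set.seed(1)
x <- sample(1:100, 650000, replace = TRUE)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start sampling the call stack
for (i in 1:3) a <- formatC(x, digits = 10, flag = "0")
Rprof(NULL)                         # stop profiling

# If the run is too fast to record any samples, summaryRprof() may fail,
# so fall back to NULL rather than stopping.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.total, 10))
```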

But although sprintf is more flexible, on most problems it will be 
substantially faster.
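
For the original question, both functions give the 11-character zero-padded string when the field width is set explicitly. The sketch below assumes an integer input; `make_ids` is a hypothetical helper illustrating sprintf's flexibility, since its format string can carry fixed text such as an "ID-" prefix when generating one identifier per item.

```r
# Zero-pad 13 to 11 characters, as asked in the original question.
formatC(13L, width = 11, flag = "0")   # "00000000013"
sprintf("%011d", 13L)                  # "00000000013"

# Hypothetical helper: one prefixed, zero-padded ID per item.
make_ids <- function(n, digits = 8) {
  sprintf(paste0("ID-%0", digits, "d"), seq_len(n))
}
make_ids(3)   # "ID-00000001" "ID-00000002" "ID-00000003"
```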




>                                       - Phil Spector
> 					 Statistical Computing Facility
> 					 Department of Statistics
> 					 UC Berkeley
> 					 spector at stat.berkeley.edu
>
>
>
> On Mon, 12 May 2008, Anh Tran wrote:
>
>> Yea, thanks all. I checked back and I had mistyped a few things.
>> The array has 650,000 elements and it took 25 seconds :p. It's acceptable.
>> It's just that I had too many variables at the time I ran it.
>> 
>> Also, it seems like sprintf is a little faster.
>> 
>> Thanks all.
>> 
>> Anh Tran
>> 
>> 
>> On Mon, May 12, 2008 at 2:55 PM, Uwe Ligges
>> <ligges at statistik.tu-dortmund.de> wrote:
>> 
>>> 
>>> 
>>> Anh Tran wrote:
>>> 
>>>> Thanks. formatC(flag) works.
>>>> 
>>>> But it's awfully slow. I'm trying to do that for 65000 numbers (generating
>>>> an ID for each item) and it seems like forever.
>>>> 
>>> 
>>> On my not that recent laptop:
>>> 
>>>> system.time(formatC(1:65000, width=10, flag="0"))
>>>   user  system elapsed
>>>   1.92    0.00    1.94
>>> 
>>> 
>>> I think 2 seconds is less than "forever".
>>> 
>>> Uwe Ligges
>>> 
>>>
>>>> Is there any faster way?
>>>>
>>>> Thanks all.
>>>> 
>>>> Anh Tran
>>>> 
>>>> On Mon, May 12, 2008 at 2:36 PM, Uwe Ligges
>>>> <ligges at statistik.uni-dortmund.de> wrote:
>>>> 
>>>> 
>>>>> Anh Tran wrote:
>>>>>
>>>>>> Hi,
>>>>>> What's one way to convert an integer to a string with preceding 0's,
>>>>>> such that
>>>>>> '13' becomes '00000000013',
>>>>>> to be put into a string?
>>>>>>
>>>>>> I've tried formatC, but it removes all the zeros and replaces them
>>>>>> with blanks.
>>>>>>
>>>>> Not so for me:
>>>>>
>>>>> formatC(13, digits=10, flag="0")
>>>>>
>>>>> Uwe Ligges
>>>>> 
>>>>> 
>>>>>
>>>>>> Thanks
>>>>>>
>>>> 
>>>> 
>> 
>> 
>> -- 
>> Regards,
>> Anh Tran
>>
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


