[R] algorithm to create unique identifiers

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Sep 5 09:12:07 CEST 2008


For a much simpler solution that does always work for numbers, see 
unique's methods for matrices and data frames.
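
A minimal sketch of that approach, assuming a made-up data frame `df` with three numeric columns (the names `df`, `a`, `b`, `c` and the values are hypothetical, not from the original question). unique() returns the distinct rows, and matching each row against them gives one integer ID per row:

```r
# Hypothetical example data: three numeric columns, some rows repeated.
df <- data.frame(a = c(1, 2, 1, 3),
                 b = c(10, 20, 10, 20),
                 c = c(0.5, 1.5, 0.5, 1.5))

# Distinct rows, via the data frame method of unique().
u <- unique(df)

# Build one key string per row so rows can be looked up among the
# distinct ones. The "\r" separator is an assumption: it must not
# occur in the data itself.
row_key <- function(d) do.call(paste, c(unname(as.list(d)), sep = "\r"))

# Integer ID: position of each row among the distinct rows.
id <- match(row_key(df), row_key(u))
```

Note that paste() formats numbers to roughly 15 significant digits, so values that differ only beyond that would share a key in this sketch.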

On Thu, 4 Sep 2008, Henrik Bengtsson wrote:

> On Thu, Sep 4, 2008 at 8:44 PM, Ralph S. <ruffel1 at hotmail.com> wrote:
>>
>> Hi all,
>>
>> I am trying to create a unique identifier for each row, combining 
>> numbers from three columns.
>>
>> Do you know if there is a general formula to do this (or some manual 
>> where I can read about this)?
>>
>> I figure I can use the numeric entries of the columns as "coordinates" 
>> and multiply them with different coefficients (different magnitudes) to 
>> get the unique ID - but it would be nice to read about such algorithms 
>> in general.
>
> What are your numbers?  Are they in a fixed range?  Integers or reals?
> If fixed-range integers, it is easy.  Think of regular positional
> number representations, e.g. binary, octal, decimal and hexadecimal.
>
> For a more generic solution that works with any data types, see e.g.
> MD5 [http://en.wikipedia.org/wiki/MD5].  It is not guaranteed to
> generate unique codes, but it is extremely rare that two different
> inputs give the same MD5 code.  MD5 (and others) are implemented in
> the 'digest' package, e.g.
>
>> library(digest)
>> digest(list(a=1, b=list(1:10, c=letters)))
> [1] "73e0ae066a97bfff7f79d41c65b55fde"
>
> My $.02
>
> /Henrik
>
>
>>
>> Any links/input would be great -
>>
>> Ralph
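
For the fixed-range integer case mentioned in the quoted reply, one possible sketch is to treat each column as a "digit" of a mixed-radix number. The helper `encode_id`, its `sizes` argument, and the example ranges are all hypothetical, and the sketch assumes column k holds integers in 0 .. sizes[k] - 1:

```r
# Hypothetical helper: combine integer columns into one ID by treating
# each column as a digit in a mixed-radix number.
# Assumes column k takes integer values in 0 .. sizes[k] - 1.
encode_id <- function(cols, sizes) {
  stopifnot(length(cols) == length(sizes))
  id <- numeric(length(cols[[1]]))
  for (k in seq_along(cols)) {
    # Refuse values outside the declared range; they would break uniqueness.
    stopifnot(all(cols[[k]] >= 0), all(cols[[k]] < sizes[k]))
    id <- id * sizes[k] + cols[[k]]
  }
  id
}

# Made-up data: x in 0..9, y in 0..4, z in 0..99.
x <- c(3, 3, 7)
y <- c(1, 4, 1)
z <- c(42, 42, 0)
id <- encode_id(list(x, y, z), sizes = c(10, 5, 100))
```

Distinct rows get distinct IDs as long as every value lies in its declared range. The IDs are stored as doubles, so they stay exact only while prod(sizes) is below 2^53.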

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


