[R] Number of replications of a term

Thomas Lumley tlumley at u.washington.edu
Wed Jan 25 01:44:38 CET 2006


On Wed, 25 Jan 2006, Ray Brownrigg wrote:

> There's an even faster one, which nobody seems to have mentioned yet:
>
> rep(l <- rle(ids)$lengths, l)

I considered this but it wasn't clear to me from the initial post that 
each ID occupied a contiguous section of the vector.
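
To illustrate the concern, here is a minimal sketch on a made-up vector 
(not the poster's data) in which "ID2" and "ID3" each appear in more than 
one run. The run-length approach then counts run lengths rather than total 
occurrences, while the table lookup is unaffected:

```r
# Hypothetical data: "ID2" and "ID3" are not contiguous.
ids <- c("ID1", "ID2", "ID3", "ID2", "ID3", "ID3")

# Run-length approach: each element gets the length of its own run.
run_len <- rle(ids)$lengths
by_run <- rep(run_len, run_len)

# Table lookup: each element gets the total count of its ID.
by_table <- as.vector(table(ids)[ids])

by_run    # 1 1 1 1 2 2
by_table  # 1 2 3 2 3 3
```

If the IDs are known to be sorted or otherwise grouped, the two agree.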

Also, lazy evaluation makes code like this
    rep(l <- rle(ids)$lengths, l)
a bit worrying. It relies on rep() using the first argument before it uses 
the second one.  In this case, clearly, it works, but it is not a style I 
would encourage and it's easy to construct functions where it fails.
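
A sketch of one such function follows. times_first() is a hypothetical 
wrapper, not part of R, that happens to touch its second argument before 
its first; the variable is named run_len here (rather than l) on the 
assumption that no object of that name already exists in the workspace. 
With the assignment buried in the first argument the call fails, or worse, 
silently picks up a stale value if such an object does exist:

```r
ids <- c("ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID5")

# Hypothetical wrapper that evaluates `times` before `x`.
times_first <- function(x, times) {
  n <- sum(times)        # forces `times`; `x` has not been evaluated yet
  out <- rep(x, times)
  stopifnot(length(out) == n)
  out
}

# The assignment lives inside the first argument, which is still an
# unevaluated promise when `times` is forced, so run_len is not yet defined.
outcome <- tryCatch(
  times_first(run_len <- rle(ids)$lengths, run_len),
  error = function(e) conditionMessage(e)
)
is.character(outcome)    # TRUE: the call raised an error

# Assigning first does not depend on argument evaluation order.
run_len <- rle(ids)$lengths
safe <- times_first(run_len, run_len)
safe                     # 1 2 2 3 3 3 1
```

Doing the assignment on its own line costs nothing and works with any 
function.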

 	-thomas



> Timing on my 2.8GHz NetBSD system shows:
>
>> length(ids)
> [1] 45150
>> # Gabor:
>> system.time(for (i in 1:100) ave(as.numeric(factor(ids)), ids, FUN = length))
> [1] 3.45 0.06 3.54 0.00 0.00
>> # Barry (and others I think):
>> system.time(for (i in 1:100) table(ids)[ids])
> [1] 2.13 0.05 2.20 0.00 0.00
>> # Me:
>> system.time(for (i in 1:100) rep(l <- rle(ids)$lengths, l))
> [1] 1.60 0.00 1.62 0.00 0.00
>
> Of course the difference between 21 milliseconds and 16 milliseconds is
> not great, unless you are doing this a lot.
>
> Ray Brownrigg
>
>> From: Gabor Grothendieck <ggrothendieck at gmail.com>
>>
>> Nice.  I timed it and it's much faster than mine too.
>>
>> On 1/24/06, Barry Rowlingson <B.Rowlingson at lancaster.ac.uk> wrote:
>>> Laetitia Marisa wrote:
>>>> Hello,
>>>>
>>>> Is there a simple and fast function that returns a vector of the number
>>>> of replications for each object of a vector ?
>>>> For example :
>>>> I have a vector of IDs :
>>>> ids <- c( "ID1", "ID2", "ID2", "ID3", "ID3","ID3", "ID5")
>>>>
>>>>  I want the function to return the following vector, where each term is
>>>> the number of replicates for the given ID:
>>>> c( 1, 2, 2, 3,3,3,1 )
>>>
>>> One-liner:
>>>
>>> > table(ids)[ids]
>>> ids
>>> ID1 ID2 ID2 ID3 ID3 ID3 ID5
>>>   1   2   2   3   3   3   1
>>>
>>>  'table(ids)' computes the counts, then the subscripting [ids] looks it
>>> all up.
>>>
>>>  Now try it on your 40,000-long vector!
>>>
>>> Barry

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle



