[R] Lookups in R

Peter Dalgaard p.dalgaard at biostat.ku.dk
Thu Jul 5 01:04:49 CEST 2007


Michael Frumin wrote:
> i wish it were that simple.  unfortunately the logic i have to do on 
> each transaction is substantially more complicated, and involves 
> referencing the existing values of the user table through a number of 
> conditions.
>
> any other thoughts on how to get better-than-linear performance time?  
> is there a recommended binary searching/sorting (i.e. BTree) module that 
> I could use to maintain my own index?
>   
The point remains: To do anything efficient in R, you need to get rid of 
that for loop and use something vectorized. Notice that you can expand 
values from the user table into the transaction table by indexing with 
transactions$userid, or you can use a merge operation.
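
As a minimal sketch of both routes, assuming a users table with columns id and amt and a transactions table with columns userid and amount (the toy values and the column name user_amt are made up for illustration):

```r
# Toy data; column names follow the thread, the values are invented.
users <- data.frame(id = c(101, 102, 103), amt = c(10, 20, 30))
transactions <- data.frame(userid = c(102, 101, 102, 103),
                           amount = c(5, 1, 2, 7))

# Route 1: match() gives, for each transaction, the row number of its
# user; indexing with that vector expands the user values in one step.
idx <- match(transactions$userid, users$id)
transactions$user_amt <- users$amt[idx]

# Route 2: merge() performs the same join by key.
# Note that merge() sorts the result by the key, so row order may change.
merged <- merge(transactions[, c("userid", "amount")], users,
                by.x = "userid", by.y = "id")
```

Once the user values sit next to each transaction, the per-transaction conditions can usually be written as vectorized expressions (ifelse(), logical indexing) rather than inside a loop. A userid with no match in users$id gives NA in idx, which is worth checking before going further.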

> thanks,
> mike
>
> Peter Dalgaard wrote:
>   
>> mfrumin wrote:
>>     
>>> Hey all; I'm a beginner++ user of R, trying to use it to do some
>>> processing of data sets of over 1M rows, and running into a snafu.
>>> imagine that my input is a huge table of transactions, each linked to
>>> a specific user id.  as I run through the transactions, I need to
>>> update a separate table for the users, but I am finding that the
>>> traditional ways of doing a table lookup are way too slow to support
>>> this kind of operation.
>>>
>>> i.e:
>>>
>>> for(i in 1:1000000) {
>>>    userid <- transactions$userid[i]
>>>    amt <- transactions$amount[i]
>>>    # R has no += operator; write the update out in full
>>>    hit <- users$id == userid
>>>    users[hit, 'amt'] <- users[hit, 'amt'] + amt
>>> }
>>>
>>> I assume this is a linear lookup through the users table (in which
>>> there are tens of thousands of rows), when really what I need is
>>> O(constant time), or at worst O(log(# users)).
>>>
>>> is there any way to manage a list of IDs (be they numeric, string,
>>> etc) and have them efficiently mapped to some other table index?
>>>
>>> I see the CRAN package for SQLite hashes, but that seems to be going
>>> a bit too far.
>>>   
>>>       
>> Sometimes you need a bit of lateral thinking. I suspect that you could 
>> do it like this:
>>
>> tbl <- with(transactions, tapply(amount, userid, sum))
>> # index by name, not by position: tbl is named by userid
>> users$amt <- users$amt + tbl[as.character(users$id)]
>>
>> one catch is that there could be users with no transactions, in which 
>> case you may need to replace userid by factor(userid, 
>> levels=users$id). None of this is tested, of course.
>>     
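
A minimal sketch of the factor() variant mentioned above, assuming the same column names as in the thread (the toy values are invented) and assuming that users without transactions should simply keep their current amt:

```r
# Toy data: user 103 has no transactions.
users <- data.frame(id = c(101, 102, 103), amt = c(10, 20, 30))
transactions <- data.frame(userid = c(102, 101, 102),
                           amount = c(5, 1, 2))

# levels = users$id yields one cell per user, in the order of users.
tbl <- with(transactions,
            tapply(amount, factor(userid, levels = users$id), sum))

# Users without transactions come out as NA; treat them as zero.
tbl[is.na(tbl)] <- 0

# tbl is already aligned with users, so a plain vector addition works.
users$amt <- users$amt + as.vector(tbl)
```

Transactions whose userid does not appear in users$id fall outside the factor levels and are silently dropped from the sums, so it may be worth checking for those separately.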


