[R] Indexes on dataframe columns?

Duncan Murdoch murdoch at stats.uwo.ca
Thu Oct 25 16:10:37 CEST 2007


On 10/25/2007 9:27 AM, Ranjan Bagchi wrote:
> Hi --
> 
> I'm working with some data frames with fairly high nrows (call it 8 
> columns, by 20,000 rows).  Are there any indexes on these columns?
> 
> When I do a df[df$foo == 42,] [which I think is idiomatic], am I doing a linear 
> search or something better?  If the column contents are ordered, I'd like 
> to at least be doing a naive binary search.

You're not doing a search at all: you are calculating a vector of TRUE 
and FALSE values, then selecting the rows corresponding to TRUE values. 
No optimization is done, so it doesn't matter if the values are unique 
or sorted.
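
To make the two steps concrete, here is a minimal sketch. The data frame and its bar column are made up for illustration; only the foo column name comes from the question.

```r
# Hypothetical data; only the column name `foo` comes from the question.
df <- data.frame(foo = c(7, 42, 13, 42),
                 bar = c("a", "b", "c", "d"))

# Step 1: the comparison is evaluated for every element of the column,
# giving a logical vector with one entry per row.
mask <- df$foo == 42

# Step 2: the rows whose mask entry is TRUE are selected.
hits <- df[mask, ]

print(mask)
print(hits)
```

Because the comparison visits every element, the cost grows with the number of rows whether or not foo happens to be sorted.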

20,000 rows is not a particularly large number nowadays, so this may be 
reasonable.  I believe you'll get a fast search if the foo column is 
used as row names (which requires its values to be unique), but you'll 
need to time it to be sure.  Then the indexing would be df["42", ].
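
One rough way to do that timing is sketched below. The data are simulated, the x column and the repetition count of 200 are arbitrary choices, and foo is assumed to hold unique values. No particular result is claimed here; the timings depend on your R version and machine.

```r
set.seed(1)
n <- 20000

# Simulated data: foo is a random permutation of 1..n, so it is unique.
df <- data.frame(foo = sample(n), x = rnorm(n))

# Row names are character strings and must be unique.
rownames(df) <- as.character(df$foo)

# The two ways of selecting the row where foo is 42.
by_mask <- df[df$foo == 42, ]
by_name <- df["42", ]

# Repeat each lookup so the elapsed times are large enough to compare.
t_mask <- system.time(for (i in 1:200) df[df$foo == 42, ])
t_name <- system.time(for (i in 1:200) df["42", ])

print(t_mask)
print(t_name)
```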

If it's still too slow, I'd advise against using data frames.  Matrix 
indexing is much faster, though a matrix requires all of its columns 
to be of the same type.
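
A sketch of the same lookup on a matrix, with a timing comparison against the data frame version. The data are simulated and all columns are numeric so that they fit in one matrix; the x column and the repetition count are arbitrary.

```r
set.seed(1)
n <- 20000

# A numeric matrix with named columns, plus the same data as a data frame.
m  <- cbind(foo = sample(n), x = rnorm(n))
df <- as.data.frame(m)

# Same logical-mask idea on the matrix; drop = FALSE keeps a one-row
# result as a matrix instead of collapsing it to a vector.
m_hits <- m[m[, "foo"] == 42, , drop = FALSE]

# Repeat each lookup so the elapsed times are large enough to compare.
t_df <- system.time(for (i in 1:200) df[df$foo == 42, ])
t_m  <- system.time(for (i in 1:200) m[m[, "foo"] == 42, , drop = FALSE])

print(t_df)
print(t_m)
```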

Duncan Murdoch


