[R] Fastest way to repeatedly subset a data frame?

Iestyn Lewis ilewis at pharm.emory.edu
Fri Apr 20 22:01:04 CEST 2007


Good tip - an Rprof trace over my real data set resulted in a file 
filled with:

pmatch [.data.frame [ FUN lapply
pmatch [.data.frame [ FUN lapply
pmatch [.data.frame [ FUN lapply
pmatch [.data.frame [ FUN lapply
pmatch [.data.frame [ FUN lapply
...
with very few other calls in there.  pmatch is R's partial string 
matching function, so I'm guessing the row names are being matched 
without hashing, or with not very good hashing.
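
For anyone who wants to reproduce this kind of trace, here is a minimal
sketch. The data frame, its size, and the names df, keys and prof_file are
made up for illustration; my real data set is different. summaryRprof()
aggregates the raw stack lines by function, which is easier to read than
the file itself.

```r
# Hypothetical data: a data frame with unique character row names.
n <- 20000L
df <- data.frame(x = rnorm(n), y = rnorm(n),
                 row.names = paste0("id", seq_len(n)))
keys <- sample(rownames(df), 5000)

# Profile repeated single-row subsetting by row name.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.005)
res <- lapply(keys, function(k) df[k, ])
Rprof(NULL)  # stop profiling

# Summarise the trace: time attributed to each function on the stack.
prof <- summaryRprof(prof_file)
print(head(prof$by.total))
```

Which functions dominate will depend on the R version and on the shape of
the data, so treat the result as a pointer rather than a verdict.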

I'll let you know how the environment option works - the Bioconductor 
project seems to make extensive use of it, so I'm guessing it's the way 
to go.
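
Roughly what I have in mind, as an untested-on-real-data sketch: build an
environment once, keyed by row name, and then look rows up with get().
The names row_env and lookup, and the toy data frame, are hypothetical,
and storing each row as a plain list is my assumption, not something
taken from Bioconductor.

```r
# Hypothetical data with unique character row names.
n <- 1000L
df <- data.frame(x = rnorm(n), y = rnorm(n),
                 row.names = paste0("id", seq_len(n)))

# Build the lookup table once: one hashed environment entry per row name.
row_env <- new.env(hash = TRUE, parent = emptyenv(), size = n)
for (k in rownames(df)) {
  assign(k, as.list(df[k, ]), envir = row_env)
}

# Each lookup is a hashed name lookup and returns a plain list,
# so no new data frame is created per access.
lookup <- function(key) get(key, envir = row_env, inherits = FALSE)

keys <- sample(rownames(df), 200)
res <- lapply(keys, lookup)
```

The cost of building the environment is paid once up front, so this only
helps if the same rows are looked up many times.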

Iestyn

hadley wickham wrote:
>> But... it's not any faster, which is worrisome to me because it seems
>> like your code uses rownames and would take advantage of the hashing
>> potential of named items.
>
> I'm pretty sure it will use a hash to access the specified rows.
> Before you pursue an environment based solution, you might want to
> profile the code to check that the hashing is actually the slowest
> part - I suspect creating all new data.frames is taking the most time.
>
> Hadley
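
One way to check Hadley's suspicion is to time the same lookups with and
without creating a data frame per row. This is only a sketch on made-up
data; df, keys and the column names are hypothetical, and match() on the
row names stands in for whatever lookup the real code does.

```r
# Hypothetical data with unique character row names.
n <- 5000L
df <- data.frame(x = rnorm(n), y = rnorm(n),
                 row.names = paste0("id", seq_len(n)))
keys <- sample(rownames(df), 1000)

# Variant 1: one new single-row data frame per lookup.
t_df <- system.time(a <- lapply(keys, function(k) df[k, ]))

# Variant 2: resolve all row positions once, then index plain vectors.
idx <- match(keys, rownames(df))
x <- df$x
y <- df$y
t_vec <- system.time(b <- lapply(idx, function(i) list(x = x[i], y = y[i])))

cat("data frames:", t_df[["elapsed"]], "s\n")
cat("vectors:    ", t_vec[["elapsed"]], "s\n")
```

If the second variant is much faster, the time is going into building
data frames and matching row names one at a time rather than into the
lookup itself.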


