[R] speeding up functions for large datasets
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Aug 6 10:07:32 CEST 2004
On Fri, 6 Aug 2004 Freja.Vamborg at astrazeneca.com wrote:
> Dear R-helpers,
> I'm dealing with large datasets, say tables of 60 000 times 12 or so, and
> some of the functions are (too) slow and I'm therefore trying to find ways
> to speed them up.
> I've found that for instance for-loops are slow in R (both by testing and by
> searching through mail archives etc.)
I don't think that is really true, but it is the case that using
row-by-row operations in your situation would be slow *if they are
unnecessary*. It is a question of choosing the right algorithmic approach,
not whether it is implemented by for-loops or lapply or ....
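
A minimal sketch of this point, assuming a hypothetical 60000 x 12 numeric
matrix `x` (the data and the function names `row_sums_by_row` and
`row_sums_by_col` are illustrative and do not come from the original
question). Both hand-written versions use a for-loop; what differs is
whether the loop runs over 60000 rows or over 12 columns.

```r
# Hypothetical data sized like the tables in the question.
set.seed(1)
x <- matrix(rnorm(60000 * 12), nrow = 60000, ncol = 12)

# Row-by-row: 60000 iterations, each summing only 12 numbers.
row_sums_by_row <- function(m) {
  out <- numeric(nrow(m))
  for (i in seq_len(nrow(m))) out[i] <- sum(m[i, ])
  out
}

# Column-by-column: still a for-loop, but only 12 iterations,
# each one a vectorized addition over all 60000 rows.
row_sums_by_col <- function(m) {
  out <- numeric(nrow(m))
  for (j in seq_len(ncol(m))) out <- out + m[, j]
  out
}

a <- row_sums_by_row(x)
b <- row_sums_by_col(x)
s <- rowSums(x)  # built-in whole-object equivalent

# All three approaches should agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(a, b)), isTRUE(all.equal(a, s)))

# Timings depend on the machine and the R version, so none are quoted.
system.time(row_sums_by_row(x))
system.time(row_sums_by_col(x))
system.time(rowSums(x))
```

On most setups the row-by-row version would be expected to be the slowest
of the three, although the size of the gap depends on the R version.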
> Are there any more well known arguments that are slow in R, maybe at data
> representation level, code-writing, reading in the data.
> I've also tried incorporating C-code, which works well, but I'd also like to
> find other, maybe more "shortcut" ways.
`S Programming' (see the R FAQ) has a whole chapter on this sort of thing,
with examples. More generally you want to take a `whole object' view and
use indexing and other vectorized operations.
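
As a rough illustration of the `whole object' view, here is a sketch that
assumes a hypothetical data frame `d` with columns `dose` and `response`
(invented for this example, not taken from the question). It contrasts an
element-by-element loop with the equivalent vectorized call, then shows
logical indexing used for subsetting and for replacement.

```r
# Hypothetical data frame with 60000 rows.
set.seed(2)
n <- 60000
d <- data.frame(dose = runif(n, 0, 10), response = rnorm(n))

# Element-by-element: one comparison and one assignment per row.
flag_loop <- character(n)
for (i in seq_len(n)) {
  flag_loop[i] <- if (d$dose[i] > 5) "high" else "low"
}

# Whole-object: a single vectorized comparison over the entire column.
flag_vec <- ifelse(d$dose > 5, "high", "low")
stopifnot(identical(flag_loop, flag_vec))

# Logical indexing: select a subset without writing a loop.
high_mean <- mean(d$response[d$dose > 5])

# Logical indexing on the left-hand side: replace all matches at once.
d$response[d$response < 0] <- 0
stopifnot(all(d$response >= 0))
```

The same pattern applies to most column-wise transformations: build a
logical or integer index once, then use it to subset or assign in one step.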
Note also that what is slow does change with the version of R and
especially how much memory you have installed. The first step is to get
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595