[R] Resources for optimizing code
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Nov 5 19:04:55 CET 2004
On Fri, 5 Nov 2004, Janet Elise Rosenbaum wrote:
>
> I want to eliminate certain observations in a large dataframe (21000x100).
> I have written code which does this using a binary vector (0=delete obs,
> 1=keep), but it uses for loops, and so it's slow and in the extreme it
> causes R to hang for indefinite time periods.
>
> I'm looking for one of two things:
> 1. A document which discusses how to avoid for loops and situations in
> which it's impossible to avoid for loops.
`S Programming': see the FAQ.
But for problems at the level of your example below, see chapter 2 of MASS4
(again, the FAQ has the details).
> or
>
> 2. A function which can do the above better than mine.
>
> My code is pasted below.
>
> Thanks so much,
>
> Janet
>
> # asst is a binary vector of length= nrow(DATAFRAME).
> # 1= observations you want to keep. 0= observation to get rid of.
How about DATAFRAME[asst == 1, ] ?
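A minimal sketch of that one-line logical row indexing on a tiny made-up data frame. The names `df` and `keep` are hypothetical and only for illustration:

```r
# Toy data frame and a 0/1 keep-vector (both made up for illustration)
df <- data.frame(x = 1:5, y = letters[1:5])
keep <- c(1, 0, 1, 1, 0)

# A logical vector in the row position selects the rows where it is TRUE.
# No loop is needed, and each column keeps its own type.
kept <- df[keep == 1, ]
kept
```

This returns rows 1, 3 and 4 with both columns intact.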
I am not sure whether asst contains NAs, but if it does, your code will stop
with an error at
  if (asst[i]==1)
and if it does not, you don't need na.rm=T.
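If asst might contain NAs, the two common idioms behave differently. Plain logical indexing turns each NA into an all-NA row, while which() returns only the TRUE positions. This is a sketch on hypothetical toy data. Using which() here is an editorial suggestion, not something from the thread:

```r
df <- data.frame(x = 1:5)
keep <- c(1, NA, 1, 0, 1)   # hypothetical keep-vector with one missing flag

# Logical indexing: the NA in 'keep' produces an extra all-NA row
with_na <- df[keep == 1, , drop = FALSE]

# which() keeps only positions where the comparison is TRUE, dropping NAs
no_na <- df[which(keep == 1), , drop = FALSE]

with_na   # 4 rows, one of them NA
no_na     # rows 1, 3 and 5
```

`drop = FALSE` keeps the one-column result a data frame rather than a vector.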
For example, in an R session:
  > DF <- as.data.frame(matrix(rnorm(21000*100), ncol = 100))
  > asst <- rbinom(21000, 1, 0.7)
  > DF2 <- DF[asst==1,]
where the subsetting took less than a second for me.
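To check the timing on your own machine, you can wrap the subsetting in system.time(). This sketch rebuilds data of the same size as above. The set.seed() call is added only so the run is reproducible, and the timings it prints will vary from machine to machine:

```r
set.seed(1)   # added for reproducibility only
DF <- as.data.frame(matrix(rnorm(21000 * 100), ncol = 100))
asst <- rbinom(21000, 1, 0.7)

# The assignment inside system.time() still creates DF2 in this environment
timing <- system.time(DF2 <- DF[asst == 1, ])
print(timing)   # elapsed time depends on the machine

# Every kept row corresponds to a 1 in asst
nrow(DF2) == sum(asst)   # should be TRUE
```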
Note that your code converts DATAFRAME to a matrix. If that is reasonable
(e.g. it is all numeric), then matrix indexing will be faster.
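Here is a sketch of the matrix route, assuming every column is numeric. If any column is not numeric, as.matrix() coerces everything to a common type such as character. Whether and how much this is faster depends on the data and the R version, so time it yourself:

```r
set.seed(1)   # added for reproducibility only
DF <- as.data.frame(matrix(rnorm(21000 * 100), ncol = 100))
asst <- rbinom(21000, 1, 0.7)

# Convert once; only sensible when all columns are numeric
M <- as.matrix(DF)

# Row subsetting on the numeric matrix; drop = FALSE keeps it 2-D
M2 <- M[asst == 1, , drop = FALSE]

# Convert back only if a data frame is really needed downstream
DF2 <- as.data.frame(M2)
```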
> remove.xtra.f <- function(asst, DATAFRAME) {
>   n <- sum(asst, na.rm=T)
>   newdata <- matrix(nrow=n, ncol=ncol(DATAFRAME))
>   j <- 1
>   # loop over asst (was 1:length(data), which does not scan asst)
>   for(i in 1:length(asst)) {
>     if (asst[i]==1) {
>       # unlist() keeps the matrix numeric (assumes all-numeric columns)
>       newdata[j,] <- unlist(DATAFRAME[i,])
>       j <- j+1
>     }
>   }
>   newdata.f <- as.data.frame(newdata)
>   names(newdata.f) <- names(DATAFRAME)
>   return(newdata.f)
> }
> --
> Janet Rosenbaum jerosenb at fas.harvard.edu
> PhD Candidate in Health Policy, Harvard GSAS
> Harvard Injury Control Research Center, Harvard School of Public Health
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
>
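Below is a sketch of a loop-free version of the function in the question. It keeps the name remove.xtra.f and the argument order so it can be dropped in. Using which() to skip NAs and resetting the row names to mimic the loop's 1..n numbering are editorial choices, not part of the original thread:

```r
# Loop-free version of remove.xtra.f(): same arguments, same kind of result.
# which() skips NA entries in asst instead of failing on them.
remove.xtra.f <- function(asst, DATAFRAME) {
  out <- DATAFRAME[which(asst == 1), , drop = FALSE]
  rownames(out) <- NULL   # renumber rows 1..n, as the loop version did
  out
}

# Quick check on a small made-up example with mixed column types
df <- data.frame(a = c(10, 20, 30), b = c("x", "y", "z"))
res <- remove.xtra.f(c(1, 0, 1), df)
res
```

Unlike the loop, this never goes through a matrix, so non-numeric columns survive unchanged.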
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595