[R] Loops and dataframes

Liaw, Andy andy_liaw at merck.com
Fri Feb 25 12:33:51 CET 2005


An addendum:  If you must use a data frame (e.g., you have mixed data
types), the following might help:

> df <- list(start=st, end=ed)
> system.time({for (i in 1:length(df[[1]])) df$start[i] <- df$end[i];
+              df <- as.data.frame(df)}, gcFirst=TRUE)
[1] 0.14 0.01 0.15   NA   NA

I.e., keep it as a list until all manipulations are done, then coerce to
data frame.
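
For the mixed-type case, the same pattern might look like the sketch below.
The column names and the character `id` column are made up for illustration
and are not from the original question; `stringsAsFactors = FALSE` is only
there to keep the character column as character on older R versions.

```r
# Hypothetical mixed-type data; all names here are illustrative only.
n <- 1e4
cols <- list(id    = paste0("row", seq_len(n)),
             start = rep(1, n),
             end   = rep(2, n))

# Work on the plain list: each element is an ordinary vector, so the
# element-wise assignment does not go through the data frame
# replacement methods.
for (i in seq_along(cols$start)) {
  cols$start[i] <- cols$end[i]
}

# Coerce once, after all manipulations are done.
df <- as.data.frame(cols, stringsAsFactors = FALSE)
```

The loop body touches only plain vectors, so the data frame overhead is
paid a single time in the final as.data.frame() call.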


Andy 


> From: Liaw, Andy
> 
> You are discovering part of the overhead of using a data frame.
> The way you specify the subset of the data frame to replace matters
> somewhat:
> 
> > st <- rep(1,1e4)
> > ed <- rep(2,1e4)
> > df <- data.frame(start=st, end=ed)
> > system.time(for (i in 1:dim(df)[1]) df[i,1] <- df[i,2], gcFirst=TRUE)
> [1] 35.96  0.10 36.37    NA    NA
> > df <- data.frame(start=st, end=ed)
> > system.time(for (i in 1:dim(df)[1]) df[[1]][i] <- df[[2]][i], gcFirst=TRUE)
> [1] 22.63  0.17 22.88    NA    NA
> > df <- data.frame(start=st, end=ed)
> > system.time(for (i in 1:dim(df)[1]) df$start[i] <- df$end[i], gcFirst=TRUE)
> [1] 19.29  0.13 19.46    NA    NA
> 
> 
> If you have all numeric data, you might as well use a matrix instead
> of a data frame:
> 
> > m <- cbind(start=st, end=ed)
> > str(m)
>  num [1:10000, 1:2] 2 2 2 2 2 2 2 2 2 2 ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : NULL
>   ..$ : chr [1:2] "start" "end"
> > system.time(for (i in 1:nrow(m)) m[i,1] <- m[i,2], gcFirst=TRUE)
> [1] 0.06 0.00 0.08   NA   NA
> 
> 
> Andy
> 
> 
> > From: Firas Swidan
> > 
> > Hi,
> > I am experiencing a long delay when using data frames inside
> > loops and was wondering if this is a bug or not.
> > Example code:
> > 
> > > st <- rep(1,100000)
> > > ed <- rep(2,100000)
> > > for(i in 1:length(st)) st[i] <- ed[i] # works fine
> > > df <- data.frame(start=st,end=ed)
> > > for(i in 1:dim(df)[1]) df[i,1] <- df[i,2] # takes forever
> > 
> > R: R 2.0.0 (2004-10-04)
> > OS: Linux, Fedora Core 2
> > kernel: 2.6.10-1.14_FC2
> > cpu: AMD Athlon XP 1600.
> > mem: 500MB.
> > 
> > The example above is only to illustrate the problem. I need 
> > loops to apply
> > some functions on pairs (not necessarily successive) of rows in a
> > dataframe.
> > 
> > Thanks for any advice,
> > Firas.
> > 
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html
> > 
> >
> 



