[R] Loops and dataframes
Liaw, Andy
andy_liaw at merck.com
Fri Feb 25 12:33:51 CET 2005
An addendum: If you must use a data frame (e.g., you have mixed data
types), the following might help:
> df <- list(start=st, end=ed)
> system.time({for (i in 1:length(df[[1]])) df$start[i] <- df$end[i];
+ df <- as.data.frame(df)}, gcFirst=TRUE)
[1] 0.14 0.01 0.15 NA NA
I.e., keep it as a list until all manipulations are done, then coerce it to
a data frame.
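Here is the same list-first idea applied to a mixed-type case. This is a minimal sketch, not a benchmark. The `label` column and the rule that updates it are hypothetical, added only to show that a character column survives the final coercion. Timings on a current R version will likely differ from the 2005 figures quoted in this thread.

```r
# Minimal sketch of "keep it as a list, coerce at the end".
# The 'label' column and its update rule are hypothetical.
n <- 1e4
cols <- list(start = rep(1, n),
             end   = rep(2, n),
             label = rep("a", n))

# Element-wise updates on plain list components, so the R-level
# data frame methods `$<-.data.frame` / `[<-.data.frame` are not
# called on every iteration.
for (i in seq_len(n)) {
  cols$start[i] <- cols$end[i]
  if (cols$start[i] > 1) cols$label[i] <- "b"
}

# Coerce once, after all manipulations are done.
df <- as.data.frame(cols, stringsAsFactors = FALSE)
str(df)
```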
Andy
> From: Liaw, Andy
>
> You are discovering part of the overhead of using a data
> frame. The way you specify the subset of the data frame to
> replace matters somewhat:
>
> > st <- rep(1,1e4)
> > ed <- rep(2,1e4)
> > df <- data.frame(start=st, end=ed)
> > system.time(for (i in 1:dim(df)[1]) df[i,1] <- df[i,2], gcFirst=TRUE)
> [1] 35.96 0.10 36.37 NA NA
> > df <- data.frame(start=st, end=ed)
> > system.time(for (i in 1:dim(df)[1]) df[[1]][i] <- df[[2]][i], gcFirst=TRUE)
> [1] 22.63 0.17 22.88 NA NA
> > df <- data.frame(start=st, end=ed)
> > system.time(for (i in 1:dim(df)[1]) df$start[i] <- df$end[i], gcFirst=TRUE)
> [1] 19.29 0.13 19.46 NA NA
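As a sanity check, here is a sketch showing that the three assignment forms above leave the modified column with the same contents, so the choice between them affects only speed. The small `n` is an arbitrary choice that keeps the check fast. To compare the overhead on your own R version, wrap any of the loops in `system.time()`.

```r
# Sketch: the three assignment styles give the same contents.
# n = 100 is arbitrary, chosen so the check runs quickly.
n  <- 100
st <- rep(1, n)
ed <- rep(2, n)

df1 <- data.frame(start = st, end = ed)
for (i in seq_len(nrow(df1))) df1[i, 1] <- df1[i, 2]      # [row, col] indexing

df2 <- data.frame(start = st, end = ed)
for (i in seq_len(nrow(df2))) df2[[1]][i] <- df2[[2]][i]  # column by [[ ]], then element

df3 <- data.frame(start = st, end = ed)
for (i in seq_len(nrow(df3))) df3$start[i] <- df3$end[i]  # column by $, then element

# Compare the modified column directly; all three should be all 2s.
stopifnot(identical(df1$start, df2$start),
          identical(df2$start, df3$start),
          all(df1$start == 2))
```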
>
>
> If you have all numeric data, you might as well use a matrix
> instead of a data frame:
>
> > m <- cbind(start=st, end=ed)
> > str(m)
> num [1:10000, 1:2] 2 2 2 2 2 2 2 2 2 2 ...
> - attr(*, "dimnames")=List of 2
> ..$ : NULL
> ..$ : chr [1:2] "start" "end"
> > system.time(for (i in 1:nrow(m)) m[i,1] <- m[i,2], gcFirst=TRUE)
> [1] 0.06 0.00 0.08 NA NA
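If the data are already in a data frame whose columns are all numeric, you can apply the matrix advice by converting once, looping, and converting back. The following is a minimal sketch that assumes all columns are numeric. Calling `as.matrix()` on a data frame with a character column would produce a character matrix, so the sketch checks for that case.

```r
# Sketch, assuming every column of the data frame is numeric.
n  <- 1e4
df <- data.frame(start = rep(1, n), end = rep(2, n))

m <- as.matrix(df)        # convert once; column names are kept
stopifnot(is.numeric(m))  # a character column would have produced a character matrix

# Loop over the matrix, where element assignment is cheap.
for (i in seq_len(nrow(m))) m[i, "start"] <- m[i, "end"]

df <- as.data.frame(m)    # convert back only at the end
str(df)
```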
>
>
> Andy
>
>
> > From: Firas Swidan
> >
> > Hi,
> > I am experiencing a long delay when using dataframes inside
> > loops and was wondering if this is a bug or not.
> > Example code:
> >
> > > st <- rep(1,100000)
> > > ed <- rep(2,100000)
> > > for(i in 1:length(st)) st[i] <- ed[i] # works fine
> > > df <- data.frame(start=st,end=ed)
> > > for(i in 1:dim(df)[1]) df[i,1] <- df[i,2] # takes forever
> >
> > R: R 2.0.0 (2004-10-04)
> > OS: Linux, Fedora Core 2
> > kernel: 2.6.10-1.14_FC2
> > cpu: AMD Athlon XP 1600.
> > mem: 500MB.
> >
> > The example above is only to illustrate the problem. I need
> > loops to apply some functions to pairs (not necessarily
> > successive) of rows in a dataframe.
> >
> > Thankful for any advice,
> > Firas.
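For the pairs-of-rows use case described above, this sketch combines both suggestions from the thread. It extracts the needed columns from the data frame once, loops over plain vectors, and stores results in a preallocated vector. The `row_pairs` index matrix and `pair_fun()` are hypothetical placeholders for whatever pairing and computation you actually need.

```r
# Hypothetical setup: 'row_pairs' and pair_fun() are placeholders.
n  <- 1e5
df <- data.frame(start = rep(1, n), end = rep(2, n))

set.seed(1)
row_pairs <- cbind(a = sample(n, 1000), b = sample(n, 1000))  # arbitrary, non-successive pairs

pair_fun <- function(start_a, end_a, start_b, end_b) {
  (end_a - start_a) + (end_b - start_b)  # placeholder computation on two rows
}

# Pull the columns out once, so the loop only touches plain vectors.
start_col <- df$start
end_col   <- df$end
result    <- numeric(nrow(row_pairs))  # preallocate the output

for (k in seq_len(nrow(row_pairs))) {
  a <- row_pairs[k, "a"]
  b <- row_pairs[k, "b"]
  result[k] <- pair_fun(start_col[a], end_col[a], start_col[b], end_col[b])
}

# This placeholder pair_fun() is vectorized, so the loop can also be
# replaced by a single call on whole index vectors.
a <- row_pairs[, "a"]
b <- row_pairs[, "b"]
result_vec <- pair_fun(start_col[a], end_col[a], start_col[b], end_col[b])
stopifnot(identical(result, result_vec))
```

Because this placeholder uses only vectorized arithmetic, the loop can be dropped entirely. A real function that can only handle one pair at a time would still need the loop, or something like `mapply()`.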
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
> >
> >
>