[R] speeding up a loop

jim holtman jholtman at gmail.com
Fri Oct 18 16:14:19 CEST 2013


You might want to use the profiler (Rprof) on a subset of your code to
see where the time is being spent.  Find a subset that runs for a minute
or so and enable profiling for that test.  Look at which functions are
taking the time; this will be a start.  You can also watch the task
monitor while the application is running to see how much CPU and memory
it is using.  If you are going around a loop a number of times, you can
put in some monitoring 'cat' statements that periodically print out the
memory and CPU used.  These are some of the techniques to start looking
at things in your program.

Also, data.frames are very costly to 'index' into.  You might want to
consider converting to a matrix where possible (all columns of a matrix
have to have the same mode).  This can provide a significant improvement.
You will be able to see this when you use the profiling tool, since it
will probably show a lot of time in the functions that handle
data frames.
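
As a rough sketch of the suggestions above (not a fix for the posted loop),
something like the following could be tried.  The data and the names 'df',
'm' and 'loop_sum' are made up for illustration and assume all columns are
numeric so that as.matrix() is safe.  It profiles a row-by-row loop over a
data.frame with Rprof, prints a periodic 'cat' line with CPU seconds and
memory in use, and then times the same loop on a matrix.

```r
# Illustrative data only; 'df', 'm' and 'loop_sum' are made-up names.
set.seed(1)
n  <- 20000
df <- data.frame(a = runif(n), b = runif(n))
m  <- as.matrix(df)  # possible here because every column is numeric

# Sum a[i] * b[i], indexing one element at a time as a row-wise loop would.
loop_sum <- function(x, report_every = 5000) {
  total <- 0
  for (i in seq_len(nrow(x))) {
    total <- total + x[i, "a"] * x[i, "b"]
    if (i %% report_every == 0) {
      # Monitoring statement: CPU seconds used so far and Mb currently in use.
      cpu <- sum(proc.time()[c("user.self", "sys.self")])
      mem <- sum(gc()[, 2])
      cat("row", i, "cpu seconds:", cpu, "memory (Mb):", mem, "\n")
    }
  }
  total
}

# Profile the data.frame version and look at where the time goes.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)   # start profiling
res_df <- loop_sum(df)
Rprof(NULL)        # stop profiling
prof <- summaryRprof(prof_file)
print(head(prof$by.self))

# Time the same loop on the data.frame and on the matrix.
t_df <- system.time(loop_sum(df))[["elapsed"]]
t_m  <- system.time(res_m <- loop_sum(m))[["elapsed"]]
cat("data.frame:", t_df, "s   matrix:", t_m, "s\n")
```

If the profile is dominated by functions such as `[.data.frame`, that is a
hint the indexing itself is the cost, and the matrix timing gives an idea of
how much could be gained.  The actual numbers will depend on your machine
and data.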

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Fri, Oct 18, 2013 at 9:23 AM, Ye Lin <yelin at lbl.gov> wrote:
> Thanks for your help David!
>
> I was running the same code the other day and it worked fine, although it
> took a while as well. You are right that dff should be df1, and maybe because
> the sample is only a portion of my data it gives the error of length 0.
>
> About CPU usage: I checked it with Ctrl+Alt+Delete and it showed that CPU
> usage is really high. Is there any way to figure out why R is taxing my
> system?
>
> Thanks!
>
> Ye
>
> On Thursday, October 17, 2013, David Winsemius wrote:
>
>>
>> On Oct 17, 2013, at 2:56 PM, Ye Lin wrote:
>>
>> > Hey R professionals,
>> >
>> > I have a large dataset and I want to run a loop on it, basically creating a
>> > new column which gathers information from another reference table.
>> >
>> > When I run the code, R just freezes and still does not respond after 30 min,
>> > which is really unusual. I tried sapply as well, but it does not improve at
>> > all.
>> >
>> > I am running R 3.0.2 on Windows 7.  I checked the system: when I run the
>> > code, my CPU usage is about 25%-30%, which is taxing my desktop.
>>
>> A guess: It's not your CPU use ... it's your RAM use. You've probably
>> exhausted your RAM and your system has paged out to virtual memory.
>> >
>> > Here is my code:
>> >
>> > #df1 is the data set I want to add a new column#
>> > #b is the reference table#
>> >
>> > for (i in (1:nrow(df1))) {
>> >  begin=which(b$Time2==df1$start[i] & b$Date==df1$Date[i])
>> >  date=unlist(strsplit(as.character(dff$end[i])," "))[1]
>> >   end=ifelse(date=="2013-10-17",
>> >   which(b$Time2==df1$end[i] & b$Date==df1$Date[i]),
>> >   which(b$Time2==df1$end[i]-3600*24 & b$Date==as.Date(df1$Date[i])+1))
>> >    df1$new[i] <- sum(b[begin:end,]$Power)
>> > }
>> >
>>
>> I get:
>> Error in strsplit(as.character(dff$end[i]), " ") : object 'dff' not found
>>
>> If I change the dff to df1, I get:
>> Error in begin:end : argument of length 0
>>
>> --
>> David.
>> > And here is a mimic sample of df1 & b:
>> >
>> > df1 <- structure(list(Date = structure(c(1369699200, 1369699200,
>> > 1369699200,
>> > 1369699200, 1369699200), tzone = "UTC", class = c("POSIXct",
>> > "POSIXt")), start = structure(c(1381991205, 1381990247, 1382010454,
>> > 1382007281, 1381992288), tzone = "UTC", class = c("POSIXct",
>> > "POSIXt")), end = structure(c(1381992405, 1381993727, 1382010694,
>> > 1382007461, 1381992468), tzone = "UTC", class = c("POSIXct",
>> > "POSIXt"))), .Names = c("Date", "start", "end"), row.names = c(NA,
>> > -5L), class = "data.frame")
>> >
>> >
>> > b <- structure(list(Date = structure(c(1369699200, 1369699200, 1369699200,
>> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200), tzone = "UTC",
>> > class = c("POSIXct",
>> > "POSIXt")), Time2 = structure(c(1381989634, 1381989694, 1381989754,
>> > 1381989814, 1381989874, 1381989934, 1381989994, 1381990054, 1381990114,
>> > 1381990174, 1381990234, 1381990294, 1381990354, 1381990414, 1381990474,
>> > 1381990534, 1381990594, 1381990654, 1381990714, 1381990774, 1381990834,
>> > 1381990894, 1381990954, 1381991014, 1381991074, 1381991134, 1381991194,
>> > 1381991254, 1381991314, 1381991374, 1381991434, 1381991494, 1381991554,
>> > 1381991614, 1381991674, 1381991734, 1381991794, 1381991854, 1381991914,
>> > 1381991974, 1381992034, 1381992094, 1381992154, 1381992214, 1381992274,
>> > 1381992334, 1381992394, 1381992454, 1381992514, 1381992574), tzone = "UTC",
>> > class = c("POSIXct",
>> > "POSIXt")), Power = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
>> > 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
>> > 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
>> > 45, 46, 47, 48, 49, 50)), .Names = c("Date", "Time2", "Power"
>> > ), row.names = c(NA, -50L), class = "data.frame")
>> >
>> > Thanks for your help!
>> >
>> >       [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius
>> Alameda, CA, USA
>>
>>
>


