[R] runtime on ising model

Thu Oct 28 18:58:40 CEST 2010

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius
> Sent: Thursday, October 28, 2010 9:20 AM
> To: Michael D
> Cc: r-help at r-project.org
> Subject: Re: [R] runtime on ising model
> 
> 
> On Oct 28, 2010, at 11:52 AM, Michael D wrote:
> 
> > Mike, I'm not sure what you mean about removing foo, but I think the
> > method is sound in diagnosing a program issue and the results speak
> > for themselves.
> >
> > I did invert my if statement at the suggestion of a CS professor (who
> > also suggested recoding in C, but I'm in an applied math program and
> > haven't had the time to take programming courses, which I know would
> > be helpful).
> >
> > Anyway, with the statement as:
> >
> > if( !(k %in% c(10^4,10^5,10^6,10^7)) ){
> > #do nothing
> > } else {
> > q <- q+1
> > Out[[q]] <- M
> > }
> >
> > run times were back to around 20 minutes.

Did that one change really make a difference?
R does not evaluate anything in the if or else
clauses of an if statement before evaluating
the condition.
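
You can check this at the prompt. The following is only a small
sketch; the vector `calls` and the helper `touch` are made up for the
demonstration and are not part of the original program:

```r
# Hypothetical helper: records the label of every branch that is evaluated.
calls <- character()
touch <- function(label) {
    calls <<- c(calls, label)
    invisible(label)
}

# The condition is FALSE, so only the else clause is evaluated.
if (FALSE) touch("if-branch") else touch("else-branch")

# Inverting the test swaps which clause runs; the other is still skipped.
if (!FALSE) touch("if-branch") else touch("else-branch")

print(calls)   # one entry per if statement, never two
```

If R "worked ahead" on the untaken clause, `calls` would have four
entries instead of two.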

> Have you tried replacing all of those 10^x operations with their
> integer equivalents, c(10000L, 100000L, 1000000L)? Each time through
> the loop you are unnecessarily calling the "^" function 4 times. You
> could also omit the last one, 10^7, during testing, since M at the
> last iteration (k=10^7) would be the final value and you could just
> assign the state of M at the end. So we have eliminated 4*10^7
> unnecessary "^" calls and 10^7 unnecessary comparisons. (The CS
> professor is perhaps used to having the C compiler do all thinking of
> this sort for him.)

%in% is a relatively expensive function.  Use == if you
can.  E.g., compare the following 2 ways of stashing
something at times 1e4, 1e5, and 1e6:

>  set <- c(1e4, 1e5, 1e6)
>  system.time({z <- integer()
                for(k in seq_len(1e6))
                   if(k %in% set) z[length(z)+1]<-k
                print(z)})
[1]   10000  100000 1000000
   user  system elapsed
 46.790   0.023  46.844

> system.time({z <- integer()
               nextCheckPoint <- 10^4
               for(k in seq_len(1e6))
                   if( k == nextCheckPoint ) {
                       nextCheckPoint <- nextCheckPoint * 10
                       z[length(z)+1]<-k
                   }
               print(z)})
[1]   10000  100000 1000000
   user  system elapsed
  4.529   0.013   4.545
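
Applied to storing the state of a matrix, the same idea might look
like the sketch below. The 4x4 matrix, the random single-spin flip,
the shortened checkpoints, and the name `marks` are stand-ins chosen
to keep the example quick; they are not Michael's actual Ising update.

```r
set.seed(1)
M <- matrix(1L, 4, 4)                 # stand-in lattice, not the real model
checkPoints <- c(10, 100, 1000)       # shortened from 1e4, 1e5, ... for a quick run
marks <- c(checkPoints, Inf)          # Inf sentinel: nothing left to match at the end
Out <- vector("list", length(checkPoints))   # preallocate the storage
q <- 1L
nextCheckPoint <- marks[q]

for (k in seq_len(1000)) {
    # placeholder update: flip one randomly chosen spin
    i <- sample.int(4L, 1L)
    j <- sample.int(4L, 1L)
    M[i, j] <- -M[i, j]

    if (k == nextCheckPoint) {        # one scalar comparison per iteration
        Out[[q]] <- M                 # save a copy of the current state
        q <- q + 1L
        nextCheckPoint <- marks[q]
    }
}

str(Out)
```

The sentinel is there only so that the comparison stays well defined
if the loop runs past the last checkpoint.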

With such a large number of iterations it pays to
remove unneeded function calls in arithmetic expressions.
R does not optimize them out - it is up to you to
do that.  E.g.,

  > system.time(for(i in seq_len(1e6)) sign(pi)*(-1))
     user  system elapsed
    6.802   0.014   6.818
  > system.time(for(i in seq_len(1e6)) -sign(pi))
     user  system elapsed
    3.896   0.011   3.911
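
The same goes for a vector of constants that is rebuilt on every pass
through a loop. Below is a rough sketch with a shortened loop; the
function names `slow` and `fast` are invented for the comparison. I
give no timings because they depend on the machine and the R version,
and a byte-compiling R may fold such constants itself, so the gap can
be small.

```r
n <- 1e5   # much shorter than the 1e7 of the original problem

# Rebuilds the vector (four "^" calls plus c()) on every iteration.
slow <- function(n) {
    hits <- 0L
    for (k in seq_len(n))
        if (k %in% c(10^2, 10^3, 10^4, 10^5)) hits <- hits + 1L
    hits
}

# Builds the vector once, before the loop starts.
fast <- function(n) {
    marks <- c(1e2, 1e3, 1e4, 1e5)
    hits <- 0L
    for (k in seq_len(n))
        if (k %in% marks) hits <- hits + 1L
    hits
}

stopifnot(slow(n) == fast(n))   # same answer either way
print(system.time(slow(n)))
print(system.time(fast(n)))
```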

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> -- 
> David
> 
> > So as best I can tell something happens in the if statement causing
> > the computer to work ahead, as the professor suggests. I'm no expert
> > on R (and have no desire to try looking at the R source code (it
> > would only confuse me)) but if anyone can offer guidance on how the
> > if statement works (Does R try to work ahead? Under what conditions
> > does it try to "work ahead" so I can try to exploit this behavior) I
> > would greatly appreciate it.
> > If it would require too much knowledge of the computer system to
> > understand I doubt I would be able to make use of it, but maybe
> > someone else could benefit.
> >
> > On Tue, Oct 26, 2010 at 3:24 PM, Mike Marchywka <marchywka at hotmail.com> wrote:
> >
> >> ----------------------------------------
> >>> Date: Tue, 26 Oct 2010 12:53:14 -0400
> >>> From: mike409 at gmail.com
> >>> To: jim at bitwrit.com.au
> >>> CC: r-help at r-project.org
> >>> Subject: Re: [R] runtime on ising model
> >>>
> >>> I have an update on where the issue is coming from.
> >>>
> >>> I commented out the code for "pos[k+1] <- M[i,j]" and the if
> >>> statement for time = 10^4, 10^5, 10^6, 10^7 and the storage and
> >>> everything ran fast(er). Next I added back in the "pos" statements
> >>> and still runtimes were good (around 20 minutes).
> >>>
> >>> So I'm left with something is causing problems in:
> >>
> >> I haven't looked at this since some passing interest in magnetics
> >> decades ago, something about 8-tracks and cassettes, but you have
> >> to be careful with conclusions like " I removed foo and problem
> >> went away therefore problem was foo." Performance issues are often
> >> caused by memory, not CPU limitations. Removing anything with a big
> >> memory footprint could speed things up. IO can be a real bottleneck.
> >> If you are talking about things on minute timescales, look at task
> >> manager and see if you are even CPU limited. Look for page faults
> >> or IO etc. If you really need performance and have a task which
> >> is relatively simple, don't ignore c++ as a way to generate data
> >> points and then import these into R for analysis.
> >>
> >> In short, just because you are focusing on math it doesn't mean
> >> the computer is limited by that.
> >>
> >>
> >>>
> >>> ## Store state at time 10^4, 10^5, 10^6, 10^7
> >>> if( k %in% c(10^4,10^5,10^6,10^7) ){
> >>> q <- q+1
> >>> Out[[q]] <- M
> >>> }
> >>>
> >>> Would there be any reason R is executing the statements inside the
> >>> "if" before getting to the logical check?
> >>> Maybe R is written to hope for the best outcome (TRUE) and will
> >>> just throw out its work if the logic comes up FALSE?
> >>> I guess I can always break the for loop up into four parts and
> >>> store the state at the end of each, but that's an unsatisfying
> >>> solution to me.
> >>>
> >>>
> >>> Jim, I like the suggestion of just pulling one big sample, but
> >>> since I can get the runtimes under 30 minutes just by removing the
> >>> storage piece I doubt I would see any noticeable changes by pulling
> >>> large sample vectors.
> >>>
> >>> Thanks,
> >>> Michael
> >>>
> >>> On Tue, Oct 26, 2010 at 6:22 AM, Jim Lemon  wrote:
> >>>
> >>>> On 10/26/2010 04:50 PM, Michael D wrote:
> >>>>
> >>>>> So I'm in a stochastic simulations class and I'm having issues
> >>>>> with the amount of time it takes to run the Ising model.
> >>>>>
> >>>>> I usually don't like to attach the code I'm running, since it
> >>>>> will probably make me look like a fool, but I figure it's the
> >>>>> best way to find any bits where I can speed up run time.
> >>>>>
> >>>>> As for the goals of the exercise:
> >>>>> I need the state of the system at time=1, 10k, 100k, 1mill, and
> >>>>> 10mill and the percentage of vertices with positive spin at all t
> >>>>>
> >>>>> Just to be clear, I'm not expecting anyone to tell me how to
> >>>>> program this model, because I know what I have works for this
> >>>>> exercise, but it takes far too long to run and I'd like to speed
> >>>>> it up by replacing slow operations wherever possible.
> >>>>>
> >>>> Hi Michael,
> >>>> One bottleneck is probably the sampling. If it doesn't grab too
> >>>> much memory, setting up a vector of the samples (maybe a million
> >>>> at a time if 10 million is too big - might be able to rewrite your
> >>>> sample vector when you store the state) and using k (and an offset
> >>>> if you don't have one big vector) to index it will give you some
> >>>> speed.
> >>>>
> >>>> Jim
> >>>>
> >>>>
> >>>
> >>> [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> 
> David Winsemius, MD
> West Hartford, CT
> 