[R] runtime on ising model

Thu Oct 28 20:34:37 CEST 2010

----------------------------------------
> Date: Thu, 28 Oct 2010 09:58:40 -0700
> From: wdunlap at tibco.com
> To: dwinsemius at comcast.net; mike409 at gmail.com
> CC: r-help at r-project.org
> Subject: Re: [R] runtime on ising model
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius
> > Sent: Thursday, October 28, 2010 9:20 AM
> > To: Michael D
> > Cc: r-help at r-project.org
> > Subject: Re: [R] runtime on ising model
> >
> >
> > On Oct 28, 2010, at 11:52 AM, Michael D wrote:
> >
> > > Mike, I'm not sure what you mean about removing foo, but I think the
> > > method is sound in diagnosing a program issue and the results speak
> > > for themselves.

Agreed on the first part but not the second: empirical debugging rarely
produces compelling results in isolation. As a collection of symptoms,
fine, but not conclusive. If you learn C++ you will find out about all
kinds of things, like memory corruption, that never make sense :) Here,
the big concern is memory, since you never determined whether you were
CPU limited, although based on others' comments you likely are in any case.

> > >
> > > I did invert my if statement at the suggestion of a CS professor (who
> > > also suggested recoding in C, but I'm in an applied math program and
> > > haven't had the time to take programming courses, which I know would
> > > be helpful).
> > >
> > > Anyway, with the statement as:
> > >
> > > if( !(k %in% c(10^4,10^5,10^6,10^7)) ){
> > > #do nothing
> > > } else {
> > > q <- q+1
> > > Out[[q]] <- M
> > > }
> > >
> > > run times were back to around 20 minutes.
>
> Did that one change really make a difference?
> R does not evaluate anything in the if or else
> clauses of an if statement before evaluating
> the condition.
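A minimal way to check this in R itself, using only base R: give each branch a visible side effect and count how often each one runs. The counter names and the 1000-iteration loop are illustrative, not taken from the original simulation.

```r
# Count how often each branch of an if/else runs over 1000 iterations.
taken_if <- 0L
taken_else <- 0L

for (k in seq_len(1000L)) {
  if (k %in% c(10L, 100L, 1000L)) {
    taken_if <- taken_if + 1L      # evaluated only when the condition is TRUE
  } else {
    taken_else <- taken_else + 1L  # evaluated only when the condition is FALSE
  }
}
cat("if branch:", taken_if, "else branch:", taken_else, "\n")

# A branch that would fail if evaluated is never touched when it is not taken.
res <- if (FALSE) stop("this branch is never evaluated") else "else branch ran"
print(res)
```

The counts come out as 3 and 997, and stop() is never called, which is consistent with the point that R evaluates the condition first and then only the chosen branch.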

What is at issue here? The OP claimed that inverting the polarity of the
test sped things up, suggesting that the branch mattered. AFAIK he never
actually established which branch was taken. This could imply many
things or nothing: one branch may be slow or cause a page fault, or the
test may fail fast but succeed slowly (testing a huge array for
equality, for example).

>
> > Have you tried replacing all of those 10^x operations with their
> > integer equivalents, c(10000L, 100000L, 1000000L)? Each time through
> > the loop you are unnecessarily calling the "^" function 4 times. You
> > could also omit the last one, 10^7, during testing, since M at the
> > last iteration (k=10^7) would be the final value and you could just
> > assign the state of M at the end. So we have eliminated 4*10^7
> > unnecessary "^" calls and 10^7 unnecessary comparisons. (The CS
> > professor is perhaps used to having the C compiler do all the thinking
> > of this sort for him.)
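A sketch of that suggestion, under stated assumptions: update_state() is a hypothetical stand-in for one Ising update step, and the loop length and checkpoints are scaled down so the example runs quickly. The checkpoint vector is built once, as integers, before the loop, and the final checkpoint is dropped because the state after the last iteration is simply M once the loop ends.

```r
# Hypothetical stand-in for one Monte Carlo update of the lattice M.
update_state <- function(M) {
  M[1L] <- -M[1L]  # placeholder: flip one spin
  M
}

n_iter <- 1000L
checkpoints <- c(10L, 100L)  # computed once, outside the loop; n_iter itself is left out
M <- matrix(1L, nrow = 4L, ncol = 4L)
Out <- vector("list", length(checkpoints) + 1L)
q <- 0L

for (k in seq_len(n_iter)) {
  M <- update_state(M)
  if (k %in% checkpoints) {
    q <- q + 1L
    Out[[q]] <- M
  }
}
Out[[q + 1L]] <- M  # state at the final iteration, k == n_iter
```

No "^" call happens inside the loop, and the last stored state costs one assignment instead of n_iter extra comparisons against 10^7.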
>
> %in% is a relatively expensive function. Use == if you
> can. E.g., compare the following 2 ways of stashing
> something at times 1e4, 1e5, and 1e6:
>
> > set <- c(1e4, 1e5, 1e6)
> > system.time({z <- integer()
> for(k in seq_len(1e6))
> if(k %in% set) z[length(z)+1]<-k
> print(z)})
> [1] 10000 100000 1000000
> user system elapsed
> 46.790 0.023 46.844
>
> > system.time({z <- integer()
> nextCheckPoint <- 10^4
> for(k in seq_len(1e6))
> if( k == nextCheckPoint ) {
> nextCheckPoint <- nextCheckPoint * 10
> z[length(z)+1]<-k
> }
> print(z)})
> [1] 10000 100000 1000000
> user system elapsed
> 4.529 0.013 4.545
>
> With such a large number of iterations it pays to
> remove unneeded function calls in arithmetic expressions.
> R does not optimize them out - it is up to you to
> do that. E.g.,
>
> > system.time(for(i in seq_len(1e6)) sign(pi)*(-1))
> user system elapsed
> 6.802 0.014 6.818
> > system.time(for(i in seq_len(1e6)) -sign(pi))
> user system elapsed
> 3.896 0.011 3.911
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> >
> > --
> > David
> >
> > > So as best I can tell, something happens in the if statement causing
> > > the computer to work ahead, as the professor suggests. I'm no expert on
> > > R (and have no desire to try looking at the R source code; it would
> > > only confuse me), but if anyone can offer guidance on how the if
> > > statement works (Does R try to work ahead? Under what conditions does
> > > it try to "work ahead", so I can try to exploit this behavior?) I would
> > > greatly appreciate it. If it would require too much knowledge of the
> > > computer system to understand, I doubt I would be able to make use of
> > > it, but maybe someone else could benefit.
> > >
> > > On Tue, Oct 26, 2010 at 3:24 PM, Mike Marchywka wrote:
> > >
> > >> ----------------------------------------
> > >>> Date: Tue, 26 Oct 2010 12:53:14 -0400
> > >>> From: mike409 at gmail.com
> > >>> To: jim at bitwrit.com.au
> > >>> CC: r-help at r-project.org
> > >>> Subject: Re: [R] runtime on ising model
> > >>>
> > >>> I have an update on where the issue is coming from.
> > >>>
> > >>> I commented out the code for "pos[k+1] <- M[i,j]" and the if
> > >>> statement for time = 10^4, 10^5, 10^6, 10^7 and the storage, and
> > >>> everything ran fast(er). Next I added back in the "pos" statements
> > >>> and runtimes were still good (around 20 minutes).
> > >>>
> > >>> So I'm left with something causing problems in:
> > >>
> > >> I haven't looked at this since some passing interest in magnetics
> > >> decades ago, something about 8-tracks and cassettes, but you have
> > >> to be careful with conclusions like "I removed foo and the problem
> > >> went away, therefore the problem was foo." Performance issues are
> > >> often caused by memory, not CPU, limitations. Removing anything with
> > >> a big memory footprint could speed things up. IO can be a real
> > >> bottleneck. If you are talking about things on minute timescales,
> > >> look at task manager and see if you are even CPU limited. Look for
> > >> page faults or IO etc. If you really need performance and have a task
> > >> which is relatively simple, don't ignore C++ as a way to generate data
> > >> points and then import these into R for analysis.
> > >>
> > >> In short, just because you are focusing on math it doesn't mean
> > >> the computer is limited by that.
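Inside R, a rough analogue of checking the task manager is to compare CPU time with wall-clock time from system.time(): if elapsed time is much larger than user plus system time, the process spent much of its time waiting (for example on paging or IO) rather than computing. The workload below is an arbitrary placeholder to be swapped for the simulation loop, and the ratio is only a heuristic (multithreaded code can push it above 1).

```r
# Placeholder workload; replace with the code being diagnosed.
timing <- system.time({
  x <- 0
  for (i in seq_len(200000L)) x <- x + sqrt(i)
})

cpu_seconds <- timing[["user.self"]] + timing[["sys.self"]]
elapsed_seconds <- timing[["elapsed"]]

# A ratio well below 1 suggests the process was often waiting rather than computing.
cpu_ratio <- if (elapsed_seconds > 0) cpu_seconds / elapsed_seconds else NA_real_
print(timing)
cat("CPU time / elapsed time:", format(cpu_ratio, digits = 3), "\n")
```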
> > >>
> > >>
> > >>>
> > >>> ## Store state at time 10^4, 10^5, 10^6, 10^7
> > >>> if( k %in% c(10^4,10^5,10^6,10^7) ){
> > >>> q <- q+1
> > >>> Out[[q]] <- M
> > >>> }
> > >>>
> > >>> Would there be any reason R is executing the statements inside the
> > >>> "if" before getting to the logical check? Maybe R is written to hope
> > >>> for the best outcome (TRUE) and will just throw out its work if the
> > >>> logic comes up FALSE? I guess I can always break the for loop up into
> > >>> four parts and store the state at the end of each, but that's an
> > >>> unsatisfying solution to me.
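The "break the for loop up into four parts" alternative might look like the sketch below. update_state() is again a hypothetical placeholder for one update step, and the segment boundaries are scaled-down stand-ins for 10^4 through 10^7. No checkpoint test runs inside the inner loop; the state is stored once at the end of each segment.

```r
# Hypothetical stand-in for one Monte Carlo update of the lattice M.
update_state <- function(M) {
  M[1L] <- -M[1L]  # placeholder: flip one spin
  M
}

ends <- c(10L, 100L, 1000L, 10000L)  # scaled-down stand-ins for 10^4, ..., 10^7
M <- matrix(1L, nrow = 4L, ncol = 4L)
Out <- vector("list", length(ends))
prev_end <- 0L

for (s in seq_along(ends)) {
  # Run from just after the previous boundary up to this one.
  for (k in seq.int(prev_end + 1L, ends[s])) {
    M <- update_state(M)
  }
  Out[[s]] <- M      # state at time ends[s]
  prev_end <- ends[s]
}
```

Whether this is noticeably faster than a cheap per-iteration test (such as an == comparison against the next checkpoint) is an empirical question; it mainly removes the test from the hot loop.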
> > >>>
> > >>>
> > >>> Jim, I like the suggestion of just pulling one big sample, but since I
> > >>> can get the runtimes under 30 minutes just by removing the storage
> > >>> piece, I doubt I would see any noticeable changes by pulling large
> > >>> sample vectors.
> > >>>
> > >>> Thanks,
> > >>> Michael
> > >>>
> > >>> On Tue, Oct 26, 2010 at 6:22 AM, Jim Lemon wrote:
> > >>>
> > >>>> On 10/26/2010 04:50 PM, Michael D wrote:
> > >>>>
> > >>>>> So I'm in a stochastic simulations class and I'm having issues with
> > >>>>> the amount of time it takes to run the Ising model.
> > >>>>>
> > >>>>> I usually don't like to attach the code I'm running, since it will
> > >>>>> probably make me look like a fool, but I figure it's the best way to
> > >>>>> find any bits where I can speed up run time.
> > >>>>>
> > >>>>> As for the goals of the exercise:
> > >>>>> I need the state of the system at time=1, 10k, 100k, 1mill, and
> > >>>>> 10mill, and the percentage of vertices with positive spin at all t.
> > >>>>>
> > >>>>> Just to be clear, I'm not expecting anyone to tell me how to program
> > >>>>> this model, because I know what I have works for this exercise, but
> > >>>>> it takes far too long to run and I'd like to speed it up by replacing
> > >>>>> slow operations wherever possible.
> > >>>>>
> > >>>> Hi Michael,
> > >>>> One bottleneck is probably the sampling. If it doesn't grab too much
> > >>>> memory, setting up a vector of the samples (maybe a million at a time
> > >>>> if 10 million is too big; you might be able to rewrite your sample
> > >>>> vector when you store the state) and using k (and an offset if you
> > >>>> don't have one big vector) to index it will give you some speed.
> > >>>>
> > >>>> Jim
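A sketch of the chunked-sampling idea, assuming each step needs one uniformly chosen site on an n-by-n lattice. The thread does not show the original sampling code, so n, chunk, n_iter and the visits counter are hypothetical. One sample.int() call draws a whole block of site indices; k minus an offset indexes into the block, and the block is redrawn when it runs out.

```r
n <- 32L            # hypothetical lattice side length
chunk <- 100000L    # site indices drawn per call (trades memory for speed)
n_iter <- 250000L

sites <- sample.int(n * n, chunk, replace = TRUE)  # linear indices into an n x n matrix
offset <- 0L        # iterations covered by earlier blocks
visits <- integer(n * n)

for (k in seq_len(n_iter)) {
  idx <- k - offset
  if (idx > chunk) {
    # Current block exhausted: draw the next one and shift the offset.
    sites <- sample.int(n * n, chunk, replace = TRUE)
    offset <- offset + chunk
    idx <- k - offset
  }
  site <- sites[idx]
  visits[site] <- visits[site] + 1L  # placeholder for the spin update at this site
}
```

With chunk equal to the total number of iterations this reduces to the "one big sample" version; smaller chunks cap the memory used by the pre-drawn indices.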
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>> ______________________________________________
> > >>> R-help at r-project.org mailing list
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >>> and provide commented, minimal, self-contained, reproducible code.
> > >>
> > >
> >
> > David Winsemius, MD
> > West Hartford, CT
> >
> >
>