[R] Competing risk regression with CRR slow on large datasets?

Max Gordon dr.max.gordon at gmail.com
Wed Jul 20 22:18:53 CEST 2011


I posted this question on stats.stackexchange.com 3 days ago, but the
answer didn't really address my question about the speed of
competing risk regression. I hope you don't mind me asking it on this
list as well.

I’m doing a registry-based study with almost 200 000 observations and
I want to perform a competing risk analysis. My problem is that the
running time of crr() in the cmprsk package seems to increase
exponentially with the number of observations. I therefore wrote a
simulation to try different approaches and check how factors, data
frames and matrices affect performance, so that I could choose the
most efficient combination.
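
For readers without access to the original simulation code, here is a
minimal sketch of this kind of timing experiment. It is not the code
from the original question: the helper name time_fit, the simulated
data (exponential times, random status codes, normal covariates) and
the sample sizes are all assumptions chosen for illustration. Only
crr() from cmprsk and system.time() are real functions.

```r
# Hypothetical helper (not from the original post): time a fitting
# function on growing subsamples of one simulated data set.
time_fit <- function(fit_fun, sizes, n_cov = 3, seed = 1) {
  set.seed(seed)
  n_max <- max(sizes)
  # Simulated data (assumption): status 0 = censored,
  # 1 = event of interest, 2 = competing event.
  ftime   <- rexp(n_max)
  fstatus <- sample(0:2, n_max, replace = TRUE)
  cov     <- matrix(rnorm(n_max * n_cov), ncol = n_cov)
  colnames(cov) <- paste0("x", seq_len(n_cov))
  secs <- vapply(sizes, function(n) {
    idx <- seq_len(n)
    # Elapsed wall-clock seconds for one fit on the first n rows
    unname(system.time(
      fit_fun(ftime[idx], fstatus[idx], cov[idx, , drop = FALSE])
    )["elapsed"])
  }, numeric(1))
  data.frame(n = sizes, seconds = secs)
}

# Run against crr() only if cmprsk is installed.
if (requireNamespace("cmprsk", quietly = TRUE)) {
  timings <- time_fit(cmprsk::crr, sizes = c(500, 1000, 2000))
  print(timings)
  # Slope of log(seconds) on log(n): a slope k means time grows
  # roughly like n^k (polynomial rather than exponential).
  ok <- timings[timings$seconds > 0, ]
  if (nrow(ok) >= 2) {
    print(coef(lm(log(seconds) ~ log(n), data = ok))[["log(n)"]])
  }
}
```

Passing the covariates as a plain numeric matrix keeps data-frame and
factor handling out of the timing; swapping in other input types is
how their effect on performance could be compared.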

I have a one-year-old computer with 8 GB of RAM, and still it didn’t
finish 70 000 observations when I left it running overnight.

My main questions:

 -  Is there a faster way of performing competing risk analysis?
 -  Why does Win7 perform 4 times better? Is it the 64-bit version
that improves the performance?
 -  Can I do something to speed things up?
 -  Is the simulation similarly slow on your computer? (see simulation
code at the end)

Thanks,
Max

To see the output and the simulation code see the original question at
