[R-SIG-Finance] Distribution fitting to loss data - Operational Risk

Christophe Dutang dutangc at gmail.com
Mon Jul 27 10:40:55 CEST 2015


Hi,

The Pareto II distribution does a great job of fitting your data. See below.

plot(ecdf(mydat))
plot(ecdf(mydat), log="x", xlim=range(mydat)+1)

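# Identical values? List the loss amounts that occur more than 10 times.
table(mydat)[table(mydat) > 10]

# Add a little Gaussian noise to break the ties.
mydat <- mydat + rnorm(length(mydat), sd=.1)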
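
To make the tie check and the jitter step concrete, here is a minimal, self-contained sketch in base R. The short vector x is only an illustrative subset of amounts that appear in the posted data, and the seed is an arbitrary choice for reproducibility; neither is part of the original analysis.

```r
# Small illustrative sample with repeated round amounts
# (an assumption for this sketch, not the full data set).
x <- c(1000, 2000, 2000, 4000, 4000, 4000, 829.53, 6000, 2000, 1000)

ties <- table(x)
ties[ties > 1]                  # amounts that occur more than once

set.seed(42)                    # arbitrary seed, only for reproducibility
x_jit <- x + rnorm(length(x), sd = 0.1)  # noise is tiny relative to the losses

anyDuplicated(x_jit) == 0       # TRUE when every value is now distinct
max(abs(x_jit - x))             # size of the largest perturbation
```

With sd=0.1 the perturbation is negligible next to losses in the hundreds or thousands, so the shape of the sample is essentially unchanged while the repeated values become distinct.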

library(actuar)
library(fitdistrplus)

# Burr fit by quantile matching on the three quartiles
f1 <- fitdist(mydat, "burr", start=list(shape1=2, shape2=2, rate=0.0005),
              method="qme", lower=c(1, 1, 0.1), probs=c(1/4, 1/2, 3/4))
# Pareto II fit by maximum likelihood
f2 <- fitdist(mydat, "pareto", start=list(shape=2, scale=1000),
              method="mle", lower=c(0.5, 0))

cdfcomp(list(f1, f2), do.points=FALSE, xlogscale=TRUE, legendtext=c("Burr", "Pareto 2"))
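
Once a fit is accepted, the simulation step described in the original question (draw many losses, read off the 99.9% quantile) can be sketched in base R as follows. This is only a sketch under stated assumptions: the shape and scale values are made up for illustration and are not the fitted estimates (in practice one would take them from coef(f2)), and qpareto2 is a hypothetical helper that inverts the Pareto II distribution function F(x) = 1 - (scale/(x + scale))^shape, which is the parameterisation used by actuar.

```r
# Made-up Pareto II parameters for illustration only;
# in practice take them from the fitted object, e.g. coef(f2).
shape <- 1.5
scale <- 5000

# Hypothetical helper: Pareto II quantile function,
# the inverse of F(x) = 1 - (scale / (x + scale))^shape.
qpareto2 <- function(p, shape, scale) scale * ((1 - p)^(-1 / shape) - 1)

set.seed(1)                     # arbitrary seed, only for reproducibility
n_sim <- 1e6                    # number of simulated loss amounts

# Inverse-transform sampling: push uniform draws through the quantile function.
sim_losses <- qpareto2(runif(n_sim), shape, scale)

q_sim   <- unname(quantile(sim_losses, 0.999))  # simulated 99.9% quantile
q_exact <- qpareto2(0.999, shape, scale)        # closed-form 99.9% quantile

c(simulated = q_sim, exact = q_exact)
```

Because the Pareto II quantile function has a closed form, the simulated quantile can be checked against the exact one; the simulation becomes necessary mainly when losses are aggregated or combined with a frequency model.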

Regards, CD
---------------------------------------
Christophe Dutang
LMM, UdM, Le Mans, France
web: http://dutangc.free.fr

On 22 Jul 2015, at 11:07, Amelia Marsh via R-SIG-Finance <r-sig-finance at r-project.org> wrote:

> Hello!
> 
> I work in risk management and deal with operational risk. As part of the Basel II guidelines, we need to arrive at the capital charge that banks must set aside to cover operational risk losses, should they occur. As part of the Loss Distribution Approach (LDA), we need to collate past loss events and use these loss amounts. The usual process, as practised in the industry, is as follows - 
> 
> Using these historical loss amounts and various statistical tests and diagnostics (KS test, AD test, PP plot, QQ plot, etc.), we try to identify the continuous statistical distribution that best fits the historical loss data. Then, using the estimated parameters of that distribution, we simulate, say, 1 million loss amounts and take an appropriate percentile (say 99.9%) to arrive at the capital charge.
> 
> However, the loss data are often such that fitting a distribution is not possible. The data may be multimodal or highly variable, making it impossible to fit a distribution. Can someone guide me on how to deal with such data, and on what can be done in R to simulate losses from this historical loss data?
> 
> My data is as follows - 
> 
> mydat <- c(829.53,4000,6000,1000,1063904,102400,22000,4000,4200,2000,10000,400, 459006, 7276,4000,100,4000,10000,613803.36, 825,1000,5000,4000,3000,84500,200, 2000,68000,97400,6267.8, 49500,27000,2100,10489.92,2200,2000,2000,1000,1900, 6000,5600,100,4000,14300,100,94100,1200,7000,2000,3000,1100,6900,1000,18500,6000,2000,4000,8400,11200,1000,15100,23300,4000,13100,4500,200,2000,50000,3900,3200,2000,2000,67000,2000,500,2000,1000,1900,10400,1900,2000,3200,6500,10000,2900,1000,14300,1000,2700,1500,12000,40000,25000,2800,5000,15000,4000,1000,21000,15000,16000,54000,1500,19200,2000,2000,1000,39000,5000,1100,18000,10000,3500,1000,10000,5000,14000,1800,4000,1000,300,4000,1000,100,1000,4400,2000,2000,12000,200,100,1000,1000,2000,1600,2000,4000,14000,4000,13500,1000,200,200,1000,18000,23000,41400,60000,500,3000,21000,6900,14600,1900,4000,4500,1000,2000,2000,1000,4100,2000,1000,2000,8000,3000,1500,2000,2000,3500,2000,2000,1000,3800,30000,55000,500,1000,1000,2000,62400,2000,3000,200,
> 2000,3500,2000,2000,500,3000,4500,1000,10000,2000,3000,3600,1000,2000,2000,5000,23000,2000,1900,2000,60000,2000,60000,20000,2000,2000,4600,1000,2000,1000,18000,6000,62000,68000,26800,50000,45900,16900,21500,2000,22700,2000,2000,32000,10000,5000,138000,159700,13000,2000,17619,2000,1000,4000,2000,1500,4000,20000,158900,74100,6000,24900,60000,500,1000,40000,10000,50000,800,4000,4900,6500,5000,400,500,3000,32300,24000,300,11500,2000,5000,1000,500,5000,5500,17450,56800,2000,1000,21400,22000,60000,3000,7500,3000,1000,1000,2000,1500,83700,2000,4000,170005,70000,6700,1500,3500,2000,10563.97,1500,25000,2000,2000,2267.57,1100,3100,2000,3500,10000,2000,6000,1500,200,20000,4000,46400,296900,150000,3700,7500,20000,48500,3500,12000,2500,4000,8500,1000,14500,1000,11000,2000,2000,120000,20000,7600,3000,2000,8000,1600,40000,2000,5000,34187.67,279100,9900,31300,814000,43500,5100,49500,4500,6262.38,100,10400,2400,1500,5000,2500,15000,40000,32500,41100,358600,109600,514300,258200,225900,402700,
> 274300,75000,1000,56000,10000,4100,1000,15000,100,40000,7900,5000,105000,
> 15100,2000,1100,2900,1500,600,500,1300,100,5000,5000,10000,10100,7000,40000,10500,5000,9500,1000,15200,2000,10000,10000,100,7800,3500,189900,58000,345000,151700,11000,6000,7000,15700,6000,3000,5000,10000,2000,1000,36000,1000,500,8000,9000,6000,2000,26500,6000,5000,97200,2000,5100,17000,2500,25500,24000,5400,90000,41500,6200,7500,5000,7000,41000,25000,1500,40000,5000,10000,21500,100,32000,32500,70000,500,66400,21000,5000,5000,12600,3000,6200,38900,10000,1000,60000,41100,1200,31300,2500,58000,4100,58000,42500)
> 
> Sorry for the inconvenience. I do understand that fitting a distribution to such data is not a foolproof method, but this is the procedure that has been followed in the risk management industry. Please note that my question does not pertain to operational risk as such. My question is: if no distribution fits a particular data set, how do we proceed to simulate data based on it?
> 
> Regards 
> 
> Amelia Marsh
> 


