[R-sig-Finance] [R-sig-finance] Fitting Distribution Mixs with Incomplete Data

p.kostov at Queens-Belfast.AC.UK p.kostov at Queens-Belfast.AC.UK
Fri May 19 10:34:30 CEST 2006


Why don't you try to use the mmlcr package, where you can specify cnorm 
(censored normal) with the appropriate min and max (known) censoring points.

Hope this helps
Philip

On May 18 2006, Lorenzo Isella wrote:

> Dear All,
> Suppose you are given some data to fit to a mixture of Gaussian
> distributions (a weighted sum of three distributions).
> You can do that using e.g. the mix package
> (http://www.math.mcmaster.ca/peter/mix/mix.html)
> which gave me very good results on some data I generated artificially.
> See e.g. the code:
> 
> # R code used to fit a sum of three normal distributions
> rm(list=ls())
> library(mixdist)
> set.seed(123)
> NN<-10000
> three<-3 #number of gaussian sequences I want to generate
> mysample<-matrix(nrow=NN,ncol=three)
> sdvec<-seq(1:three)
> sdvec[1]<-4
> sdvec[2]<-3
> sdvec[3]<-3.5
> meanvec<-seq(1:three)
> meanvec[1]<-20
> meanvec[2]<-50
> meanvec[3]<-70
> 
> for (i in 1:three)
> {
>  mysample[ ,i] <-rnorm(NN,meanvec[i],sdvec[i])
> }
> dim(mysample)<-c(3*NN,1)
> plot(density(mysample,kern="gaussian"),lwd=2,col=300)
> ## The commented lines are parts of the 1st version of this script
> 
> #x<-matrix(ncol=2,nrow=three*NN)
> #x[ ,1]<-mysample[]
> #x[ ,2]<-1
> y<-as.data.frame(mysample)
> mixgroup(y)
> z<-mixgroup(y,breaks=100)
> plot(z)
> #ini<-read.table("gaupar.TXT")
> ini<-mixparam(mu = c(24, 45, 73), sigma = c(5, 4, 4),pi=c(0.4,0.3,0.3))
> mymix<-mix(z,ini,"norm")
> summary(mymix)
> plot(mymix)
> print("So far so good")
> 
> 
> However, you can be in a situation in which you cannot sample the
> distribution that easily (e.g. you do not have stock quotations
> below/above a certain price) and then the same tool does not work that
> well. Consider for instance the same case as above but removing all
> the data below 15 and above 65:
> 
> # R code used to fit a sum of three normal distributions
> rm(list=ls())
> library(mixdist)
> set.seed(123)
> NN<-10000
> three<-3 #number of gaussian sequences I want to generate
> mysample<-matrix(nrow=NN,ncol=three)
> sdvec<-seq(1:three)
> sdvec[1]<-4
> sdvec[2]<-3
> sdvec[3]<-3.5
> meanvec<-seq(1:three)
> meanvec[1]<-20
> meanvec[2]<-50
> meanvec[3]<-70
> 
> for (i in 1:three)
> {
>  mysample[ ,i] <-rnorm(NN,meanvec[i],sdvec[i])
> }
> dim(mysample)<-c(3*NN,1)
> ##  now I am going to remove some data and see if I can still
> meaningfully optimize the
> ## distribution
> N3=3*NN
> mycount=0
> for (i in 1:N3)
> {if (mysample[i]>22 & mysample[i]<67)
> {
> mycount=mycount+1
> if (mycount ==1 )
> {
> mysample2<-cbind(mysample[i])
> }
> else
> {
> mysample2<-cbind(mysample2,mysample[i])
> }
> 
> }
> 
> }
> 
> plot(density(mysample,kern="gaussian"),lwd=2,col=300)
> ## The commented lines are parts of the 1st version of this script
> 
> #x<-matrix(ncol=2,nrow=three*NN)
> #x[ ,1]<-mysample[]
> #x[ ,2]<-1
> # y<-as.data.frame(mysample)
> mysample3<-mysample2[1:15000]
> y<-as.data.frame(mysample3)
> mixgroup(y)
> z<-mixgroup(y,breaks=100)
> plot(z)
> #ini<-read.table("gaupar.TXT")
> ini<-mixparam(mu = c(24, 45, 73), sigma = c(5, 4, 4),pi=c(0.4,0.3,0.3))
> mymix<-mix(z,ini,"norm")
> summary(mymix)
> plot(mymix)
> print("So far so good")
> 
> 
> Then the mixdist completely misses the means of the 2 external modes.
> I am sure that, apart from the combination of three Gaussians, this
> situation cannot be "new" at all.
> I am thinking about some iteration technique (e.g. you fit the data as
> well as you can with mixdist to start with, then according to your
> fitted parameters you generate the "missing"  results, then you re-fit
> the whole new set of data (real ones and those freshly generated) and
> so on hoping to get some convergent results).
> But it seems a bit cumbersome (furthermore, how many of them should I
> generate?) plus I am sure that solving this problem from scratch is
> re-inventing the wheel.
> Sorry for the long email.
> Kind Regards
> 
> Lorenzo Isella
> 
> _______________________________________________
> R-sig-finance at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>



More information about the R-SIG-Finance mailing list