[R] how to estimate distribution?

Mike Lawrence Mike.Lawrence at dal.ca
Mon Mar 9 14:08:32 CET 2009


I know it only causes a shift and scale transformation of the
distribution, but it always gets my goat when I see people using the
sum as a statistic of interest when the mean has a more general
application (i.e. some other researcher obtains a different number of
observations per subject) and a more meaningful interpretation (i.e.
the mean is the answer you'd expect, on average, from the participant
on any given question; the sum is ... ?)

An acceptable method of fitting empirical data to theoretical
distributions is maximum likelihood estimation. Sometimes there are
analytic solutions, sometimes you have to use a search algorithm like
optimize() (for 1 parameter distributions) or optim() (for multiple
parameter distributions) to find the solution.

Say you hypothesized that the red data came from an exponential
distribution, you would define a function that obtains the sum log
likelihood of the data given a specific rate parameter, then search a
reasonable range of rate parameters for that which maximuzes the sum
log liklihood:

exp.sll = function(rate,x){
	log.lik = dexp(x,rate,log=TRUE)
	sum.log.lik = sum(log.lik)
	print(c(rate,sum.log.lik))
	return(sum.log.lik)
}

a=rexp(150,rate=10)
hist(a,probability=TRUE)

best.rate = optimize(
	exp.sll
	,x=a
	,lower=0
	,upper=1/mean(a)*10
	,maximum=TRUE
)$maximum

curve(dexp(x,rate=best.rate),col='red',add=T)


On Wed, Mar 4, 2009 at 12:01 PM, Simone Gabbriellini
<simone.gabbriellini at gmail.com> wrote:
> Dear R-Experts,
>
> I have an empirical dataset with 150 subjects for 24 observations.
> In each observation, each subject can have a score in the range 0:3.
>
> I made then a simple index making the sum of the values in each row,
> so each subject have a score between 0 and 72.
>
> I was thinking about what kind of theoretical distribution such an
> index should have, so I try to make things random like:
>
> data<-array(0,c(150,24))
> data2<-array(0, c(150, 100))
> for (prova in 1:100){
>        for (i in 1:24){
>                for (q in 1:150){
>                        data[q,i]<-sample(0:3, 1)
>                        }
>                }
>        for (riga in 1:150){
>                data2[riga,prova]<-sum(data[riga,])
>                }
>        }
>
> now here you can find the plotted theoretical values (black) against
> the empirical ones (red):
>
> http://www.digitaldust.it/papers/indice.png
>
> How can I estimate the density of both distribution? because the red
> one looks like a pareto distribution, but I am not an expert...
>
> many thanks,
> Simone
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tinyurl.com/mikes-public-calendar

~ Certainty is folly... I think. ~




More information about the R-help mailing list