[R] how to estimate distribution?
Mike Lawrence
Mike.Lawrence at dal.ca
Mon Mar 9 14:08:32 CET 2009
I know it only causes a shift and scale transformation of the
distribution, but it always gets my goat when I see people using the
sum as a statistic of interest when the mean has a more general
application (i.e. some other researcher obtains a different number of
observations per subject) and a more meaningful interpretation (i.e.
the mean is the answer you'd expect, on average, from the participant
on any given question; the sum is ... ?)
An acceptable method of fitting empirical data to theoretical
distributions is maximum likelihood estimation. Sometimes there are
analytic solutions, sometimes you have to use a search algorithm like
optimize() (for 1 parameter distributions) or optim() (for multiple
parameter distributions) to find the solution.
Say you hypothesized that the red data came from an exponential
distribution, you would define a function that obtains the sum log
likelihood of the data given a specific rate parameter, then search a
reasonable range of rate parameters for that which maximuzes the sum
log liklihood:
exp.sll = function(rate,x){
log.lik = dexp(x,rate,log=TRUE)
sum.log.lik = sum(log.lik)
print(c(rate,sum.log.lik))
return(sum.log.lik)
}
a=rexp(150,rate=10)
hist(a,probability=TRUE)
best.rate = optimize(
exp.sll
,x=a
,lower=0
,upper=1/mean(a)*10
,maximum=TRUE
)$maximum
curve(dexp(x,rate=best.rate),col='red',add=T)
On Wed, Mar 4, 2009 at 12:01 PM, Simone Gabbriellini
<simone.gabbriellini at gmail.com> wrote:
> Dear R-Experts,
>
> I have an empirical dataset with 150 subjects for 24 observations.
> In each observation, each subject can have a score in the range 0:3.
>
> I made then a simple index making the sum of the values in each row,
> so each subject have a score between 0 and 72.
>
> I was thinking about what kind of theoretical distribution such an
> index should have, so I try to make things random like:
>
> data<-array(0,c(150,24))
> data2<-array(0, c(150, 100))
> for (prova in 1:100){
> for (i in 1:24){
> for (q in 1:150){
> data[q,i]<-sample(0:3, 1)
> }
> }
> for (riga in 1:150){
> data2[riga,prova]<-sum(data[riga,])
> }
> }
>
> now here you can find the plotted theoretical values (black) against
> the empirical ones (red):
>
> http://www.digitaldust.it/papers/indice.png
>
> How can I estimate the density of both distribution? because the red
> one looks like a pareto distribution, but I am not an expert...
>
> many thanks,
> Simone
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University
Looking to arrange a meeting? Check my public calendar:
http://tinyurl.com/mikes-public-calendar
~ Certainty is folly... I think. ~
More information about the R-help
mailing list