[R] Please help me interpret these results (fitting distributions to real data)

Thu Sep 25 23:35:10 CEST 2008

I just thought of a useful metaphor for the problem I face.  I am dealing
with a problem in business finance, with two kinds of related events. 
However, imagine you have a known amount of carbon (so many kilograms), but
you do not know what fraction is C14 (and thus radioactive).  Only the C14
will give decay events (and once that event has occurred, the atom that
decayed will never decay again).  C12 will never decay.  What you want to
know is a) what is the ratio of C12 to C14 at time 0, and b) how many decay
events will happen between time x and time y, or how many decay events will
happen after time z.  That integral is, IIRC, quite simple.
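
For concreteness, here is a minimal sketch of that integral, assuming each
C14 atom decays independently at a constant rate (exponential lifetimes).
The names n0, lambda and expected_decays, and the numbers plugged in, are
made up for illustration; they are not estimates from my data.

```r
# Expected number of decay events in (from, to], assuming n0 atoms at
# time 0, each with an exponential lifetime of rate lambda.
expected_decays <- function(n0, lambda, from, to = Inf) {
  n0 * (exp(-lambda * from) - exp(-lambda * to))
}

n0     <- 1000   # hypothetical number of C14 atoms at time 0
lambda <- 0.05   # hypothetical decay rate per unit time

between_x_and_y <- expected_decays(n0, lambda, from = 10, to = 20)
after_z         <- expected_decays(n0, lambda, from = 30)

# Cross-check the closed form against numerical integration of the
# exponential density over the same interval.
numeric_check <- n0 * integrate(dexp, lower = 10, upper = 20,
                                rate = lambda)$value
```

Question a) then amounts to estimating n0, since the total amount of carbon
is known.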

The data you get from your equipment will be a number of decay events in
time period n (could be a specific week or a specific day).  How would you
get this data into R so that you can use, say, fitdistr(MASS) to estimate
the decay rate, and then proceed to answer the questions of interest?
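
One possible way to set this up, sketched on simulated data only: treat the
count in each period as Poisson with mean n0 times the probability that an
atom decays in that period, and maximise the likelihood with optim().  The
Poisson assumption, the unit-length periods, and all names (cell_prob,
negloglik, true_n0, true_lambda) are assumptions of this sketch, not
something taken from my real data.

```r
# Sketch: estimate n0 (initial C14 atoms) and lambda (decay rate) from
# counts of decay events per period. All data here are simulated.
set.seed(1)
true_n0     <- 5000
true_lambda <- 0.05
periods     <- 1:40                 # period i covers the interval (i-1, i]

# Probability that a given atom decays during each period
cell_prob <- function(lambda, periods) {
  exp(-lambda * (periods - 1)) - exp(-lambda * periods)
}

counts <- rpois(length(periods), true_n0 * cell_prob(true_lambda, periods))

# Negative Poisson log-likelihood; parameters are on the log scale so
# that the optimiser cannot move to negative n0 or lambda.
negloglik <- function(par, counts, periods) {
  n0     <- exp(par[1])
  lambda <- exp(par[2])
  -sum(dpois(counts, n0 * cell_prob(lambda, periods), log = TRUE))
}

start <- c(log(2 * sum(counts)), log(0.1))   # crude starting values
fit   <- optim(start, negloglik, counts = counts, periods = periods)

n0_hat     <- exp(fit$par[1])
lambda_hat <- exp(fit$par[2])
```

The reason for fitting both parameters at once is that the atoms which have
not yet decayed by the end of the observation window never show up in the
counts, so the observed events alone are not a complete sample.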

Anyway, in my early tests (before I figured out which distribution is most
appropriate in this case), I got the following results (this is for one
week's data, but other weeks' results are similar).

==========curious results=================
> ex15 = fitdistr(x15,"exponential")
> str(ex15)
List of 4
 $ estimate: Named num 0.0653
  ..- attr(*, "names")= chr "rate"
 $ sd      : Named num 0.00356
  ..- attr(*, "names")= chr "rate"
 $ n       : int 337
 $ loglik  : num -1256
 - attr(*, "class")= chr "fitdistr"
> ge15 = fitdistr(x15,"geometric")
> str(ge15)
List of 4
 $ estimate: Named num 0.0613
  ..- attr(*, "names")= chr "prob"
 $ sd      : Named num 0.00324
  ..- attr(*, "names")= chr "prob"
 $ n       : int 337
 $ loglik  : num -1257
 - attr(*, "class")= chr "fitdistr"
> po15 = fitdistr(x15,"poisson")
> str(po15)
List of 4
 $ estimate: Named num 15.3
  ..- attr(*, "names")= chr "lambda"
 $ sd      : Named num 0.213
  ..- attr(*, "names")= chr "lambda"
 $ n       : int 337
 $ loglik  : num -2721
 - attr(*, "class")= chr "fitdistr"
> nb15 = fitdistr(x15,"negative binomial")
Warning messages:
1: In dnbinom(x, size, prob, log) : NaNs produced
2: In dnbinom(x, size, prob, log) : NaNs produced
3: In dnbinom(x, size, prob, log) : NaNs produced
> str(nb15)
List of 4
 $ estimate: Named num [1:2]  0.973 15.309
  ..- attr(*, "names")= chr [1:2] "size" "mu"
 $ sd      : Named num [1:2] 0.0786 0.8719
  ..- attr(*, "names")= chr [1:2] "size" "mu"
 $ loglik  : num -1267
 $ n       : int 337
 - attr(*, "class")= chr "fitdistr"
> AIC(ex15)
[1] 2514.952
> AIC(ge15)
[1] 2516.273
> AIC(po15)
[1] 5444.62
> AIC(nb15)
[1] 2538.385
>
=========end curious results=================================

Notice that the AIC values for the exponential and geometric distributions
are almost identical, and that for the negative binomial is not much
different.
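
As a sanity check on those numbers, the AIC values can be roughly recomputed
by hand as 2k - 2*logLik, where k is the number of estimated parameters.
This sketch uses the rounded log-likelihoods shown by str(), so it only
agrees with AIC() to within about one unit; the vector names are mine.

```r
# Recompute AIC = 2k - 2*logLik from the rounded log-likelihoods
# printed by str(); k is the number of estimated parameters.
loglik <- c(exponential = -1256, geometric = -1257,
            poisson = -2721, negbin = -1267)
k      <- c(exponential = 1, geometric = 1, poisson = 1, negbin = 2)

aic_approx <- 2 * k - 2 * loglik
aic_approx
```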

This now makes some sense; the geometric being a discrete equivalent of the
exponential, as well as being a special case of the negative binomial. 
Right?  With such relationships among them, it would not be surprising to
see them give similar values of AIC.  Right?
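
These relationships can be checked numerically.  The sketch below assumes
the sample mean of x15 is about 15.309 (the "mu" from the negative binomial
fit) and that the geometric distribution is parameterised on 0, 1, 2, ...
as in dgeom(); the variable names are only illustrative.

```r
x <- 0:50
p <- 0.0613                       # geometric 'prob' estimate from above

# 1. The geometric is the negative binomial with size = 1
same_pmf <- all.equal(dgeom(x, p), dnbinom(x, size = 1, prob = p))

# 2. The geometric is a discretised exponential: if T is exponential
#    with rate r, then floor(T) is geometric with prob = 1 - exp(-r)
r <- 0.0653                       # exponential 'rate' estimate from above
same_cdf <- all.equal(pgeom(x, 1 - exp(-r)), pexp(x + 1, rate = r))

# 3. Both maximum-likelihood estimates are functions of the sample mean
xbar     <- 15.309                # assumed sample mean (NB 'mu' estimate)
rate_hat <- 1 / xbar              # exponential MLE, about 0.0653
prob_hat <- 1 / (1 + xbar)        # geometric MLE, about 0.0613
```

The fitted negative binomial size of 0.973 is also close to 1, which is
consistent with the data looking nearly geometric.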

Thanks

Ted
-- 
View this message in context: http://www.nabble.com/Please-help-me-interpret-these-results-%28fitting-distributions-to-real-data%29-tp19678782p19678782.html
Sent from the R help mailing list archive at Nabble.com.