[R] practical help ... solving a system...

Mon Jun 20 18:02:50 CEST 2005

	  The problem is that "size" in "dbinom" must be an integer:

 > dbinom(x=2, size=4.5, prob=.5)
[1] NaN
Warning message:
NaNs produced in: dbinom(x, size, prob, log)

	  Consider the following:
 > library(MASS)
 > set.seed(123)
 > x <- rbinom(100, 10, 0.4)
 > nx <- max(x)
 > fit <- fitdistr(x, dbinom, start=list(prob=0.5),
+ size = nx)
Warning message:
one-diml optimization by Nelder-Mead is unreliable: use optimize in: 
optim(start, mylogfn, x = x, hessian = TRUE, ...)
 > attributes(fit)
$names
[1] "estimate" "sd"

$class
[1] "fitdistr"
#### From this, we learn that fit$est is the MLE for prob given size
#### Try different a range of values for "size"
 > lglk <- rep(NA, 12)
#### Start with nx, since it can't be less than that.
 > names(lglk) <- nx:(nx+11)
 > for(i in 1:12){
+ fit <- fitdistr(x, dbinom, start=list(prob=0.5),
+ size = nx+i-1)
+ lglk[i] <- sum(dbinom(x, size=nx+i-1,
+ prob=fit$estimate, log=TRUE))
+ }
There were 15 warnings (use warnings() to see them)
 > qchisq(0.95, 1)
[1] 3.841459
 > 2*(max(lglk)-lglk)
           8           9          10          11          12          13
1.314439853 0.000000000 0.004998298 0.434260975 1.000371029 1.594030787
          14          15          16          17          18          19
2.170873826 2.713847684 3.217152046 3.680715184 4.106493564 4.497508846

# 2*log(likelihood ratio) is approximately chi-square,
# so a 95% confidence interval for "size" is 8:17.

	  If I needed this for a scientific research paper, I'd do more 
research on how to estimate "size" for the binomial -- if I really 
thought it should be estimated.  If I needed an answer today, I'd 
probably go with what I just did here.

	  hope this helps,
	  spencer graves

##################
Carsten Steinhoff wrote:
 > Hello Spencer,
 >
 > thank you for your reply. As I've seen you have optimized only one 
value of
 > the binomial distribution by using Fitdistr.
 > My problem is that I have to find both size AND prob.
 > The error message always is: error in using optim. non-finite finite
 > difference value [1]
 >
 > So I thought to do a MLE "by hand" ...
 >
 > Do you have an idea what do modify on the error above?
 >
 > Greetings, Carsten

Spencer Graves wrote:
>       "PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html".  If you had provided more 
> information about what you tried, it might be easier for someone else to 
> offer effective help.  In particular, what did you try with fitdistr, 
> and why do you think it didn't work (e.g., what error message did you get)?
> 
>       The following is a minor modification of examples in the 
> "fitdistr" help page:
> 
>  > library(MASS)
>  > set.seed(123)
>  > x <- rbinom(100, 10, 0.4)
>  > (fit <- fitdistr(x, dbinom, start=list(prob=0.5), size=10))
>       prob
>   0.39804687
>  (0.01547943)
> Warning message:
> one-diml optimization by Nelder-Mead is unreliable: use optimize in: 
> optim(start, mylogfn, x = x, hessian = TRUE, ...)
>  >
>       The difference between prob=0.4 that I simulated and the 
> 0.39804687 estimated is shockingly small.  (No, I didn't run 100 
> simulations with different seeds and pick the best one.)
> 
>       If I were concerned about the warning, I could list "fitdist" and 
> read the code.  I might even copy it into a script file and walk through 
> it line by line.  When checked the code for "fitdistr" just now, I 
> learned that it calls "optim", and "optim" provides other estimation 
> methods, e.g., BFGR and CG.  The following are the results from trying 
> those two:
> 
> (fit.BFGS <- fitdistr(x, dbinom, start=list(prob=0.5),
> + size=10, method="BFGS"))
>       prob
>   0.39800028
>  (0.01547882)
> Warning messages:
> 1: NaNs produced in: dbinom(x, size, prob, log)
> 2: NaNs produced in: dbinom(x, size, prob, log)
> 3: NaNs produced in: dbinom(x, size, prob, log)
> 4: NaNs produced in: dbinom(x, size, prob, log)
> 5: NaNs produced in: dbinom(x, size, prob, log)
>  > (fit.CG <- fitdistr(x, dbinom, start=list(prob=0.5),
> + size=10, method="CG"))
>       prob
>   0.39800001
>  (0.01547881)
> Warning messages:
> 1: NaNs produced in: dbinom(x, size, prob, log)
> 2: NaNs produced in: dbinom(x, size, prob, log)
> 3: NaNs produced in: dbinom(x, size, prob, log)
> 4: NaNs produced in: dbinom(x, size, prob, log)
> 5: NaNs produced in: dbinom(x, size, prob, log)
> 
>       If I'm concerned about these warnings, I can replace "dbinom" by a 
> local copy that prints the value of "prob" each time it's called:
> 
> dbinom. <- function (x, size, prob, log = FALSE){
>     cat(prob, "")
> .Internal(dbinom(x, size, prob, log))
> }
> (fit.BFGS <- fitdistr(x, dbinom., start=list(prob=0.5),
>     size=10, method="BFGS"))
> 0.5 0.501 0.499 -407.5005 -81.10011 -15.82002 -2.764004 -0.1528009 
> 0.3694398 0.3704398 0.3684398 0.3996073 0.4006073 0.3986073 0.3980445 
> 0.3990445 0.3970445 0.2133659 0.3611088 0.3906574 0.3965671 0.3977490 
> 0.3979854 0.3989854 0.3969854 0.3980003 0.45997 0.4103942 0.4004791 
> 0.3984960 0.3980994 0.3980201 0.3980043 0.3980011 0.3980004 0.3980003 
> 0.3980003 0.3980003 0.3980003 0.3980003 0.4000003 0.3980003 0.3980003 
> 0.3960003       prob
>   0.39800028
>  (0.01547882)
> Warning messages:
> 1: NaNs produced in: dbinom(x, size, prob, log)
> 2: NaNs produced in: dbinom(x, size, prob, log)
> 3: NaNs produced in: dbinom(x, size, prob, log)
> 4: NaNs produced in: dbinom(x, size, prob, log)
> 5: NaNs produced in: dbinom(x, size, prob, log)
>  >
>       It's no wonder dbinom produced NaNs:  It does that when prob < 0.
> 
>       hope this helps.
>       spencer graves
> 
> p.s.  MASS = "Modern Applied Statistics with S" by Venables and Ripley.. 
>  Have you seen this book?  I've learned a lot from it.
> 
> Carsten Steinhoff wrote:
> 
>> Hello,
>>  
>> I want to estimate the parameters of a binomial distributed rv using MLE.
>> Other distributions will follow.
>> The equation system to solve is not very complex, but I've never done 
>> such
>> work in R and don't have any idea how to start...
>>  
>> The system is:
>>  
>> (1)     n*P = X
>>  
>> (2)    [sum {from j=0 to J-1} Y{j} /(n-j)] = -n * ln (1-X / n)
>>  
>>  
>> where    * only X is given (empirical mean)
>>              * J is maximum observed
>>              * Y is the number of observations above j
>>              * n and P are the parameters of the binomial distribution to
>> find....
>>  
>> Who could help me with an example-solution for my "first" distribution 
>> ? I
>> also need a hint how to make the sum-element.
>>  
>> Maybe there's another - more simple - way to estimate the parameters...
>> first I tried via "FITDISTR" but without success.
>>  
>> Thanx a lot for your help.
>>  
>> Carsten
>>
>>     [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide! 
>> http://www.R-project.org/posting-guide.html
> 
>