[R] Best practices for handling very small numbers?

Duncan Murdoch murdoch.duncan at gmail.com
Wed Oct 19 18:24:15 CEST 2011


On 19/10/2011 12:02 PM, Seref Arikan wrote:
> Thanks Duncan,
> Indeed, I have found a problem in my likelihood, and I'll work on the log
> approach again. Regarding your comment: do you think it would make sense to
> tackle a data set of a few million observations for parameter estimation using
> MCMC? In my case we are talking about Griddy Gibbs sampling.
>
> One relevant idea is to break the observations into smaller chunks and use
> sequential updating, taking the posterior of one step as the prior of the next.
> However, I don't feel confident that this would work, since it is not the same
> situation as with conjugate priors, where you have an analytic form of the
> posterior.
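
A minimal sketch of that chunked-updating idea on a discrete grid (only an
illustration, assuming a one-dimensional grid over the mean with sigma known;
the grid, chunk size, and data below are made up):

set.seed(1)
obs       <- rnorm(1e5, 8, 3)            # illustrative "large" data set
sigma     <- 3                           # assume sigma known for simplicity
mu_grid   <- seq(0, 16, length.out = 401)
log_prior <- rep(0, length(mu_grid))     # flat prior on the grid, in log form

chunk_size <- 1e4
chunks <- split(obs, ceiling(seq_along(obs) / chunk_size))
for (chunk in chunks) {
  # log-likelihood of this chunk at every grid point
  log_lik   <- vapply(mu_grid,
                      function(m) sum(dnorm(chunk, m, sigma, log = TRUE)),
                      numeric(1))
  # posterior of this step becomes the prior of the next step
  log_prior <- log_prior + log_lik
  log_prior <- log_prior - max(log_prior)   # rescale so exp() stays representable
}
posterior <- exp(log_prior) / sum(exp(log_prior))
mu_grid[which.max(posterior)]              # should land near the true mean of 8

For a flat prior on a fixed grid this chunked pass gives the same grid posterior
as a single pass over all the data, so it mainly illustrates the bookkeeping
rather than settling the scaling question.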

I think you would do better to estimate the posterior directly.  It may 
be slow, but that just means that you need more computing power.

Duncan Murdoch

> Any pointers (papers, books) on large-scale data processing with MCMC, and
> Gibbs sampling in particular, would be much appreciated.
>
> Kind regards
> Seref
>
>
> On Tue, Oct 18, 2011 at 11:12 AM, Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
>
> >  On 11-10-18 4:30 AM, Seref Arikan wrote:
> >
> >>  Hi Dan,
> >>  I've tried the log-likelihood, but it reaches zero again if I work with,
> >>  say, 1000 samples.
> >>  I need an approach that would scale to quite large sample sizes. Surely I
> >>  can't be the first one to encounter this problem, and I'm sure I'm missing
> >>  an option that is embarrassingly obvious.
> >>
> >
> >  I think you are calculating the log likelihood incorrectly.  Don't
> >  calculate the likelihood and take the log; work out the formula for the log
> >  of the likelihood, and calculate that.  (If the likelihood contains a sum of
> >  terms, as in a mixture model, this takes some thinking, but it is still
> >  worthwhile.)
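
For the normal model in the original message (quoted further down), that might
look like the following; the function name here is just illustrative:

# Log of the unnormalised likelihood, computed directly:
#   log L = -n*log(sigma) - sum((x - mean)^2) / (2 * sigma^2)
joint_log_likelihood <- function(observations, mean, sigma) {
  n <- length(observations)
  -n * log(sigma) - 0.5 * sum((observations - mean)^2) / sigma^2
}

x <- rnorm(1e5, 8, 3)
joint_log_likelihood(x, 8, 3)      # large negative, but finite
sum(dnorm(x, 8, 3, log = TRUE))    # cross-check; differs only by the -n/2 * log(2*pi) constant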
> >
> >  With most models, it is just about impossible to cause the log likelihood
> >  to underflow if it is calculated carefully.
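
For the mixture case mentioned above, the usual device is to keep each term in
the log domain and combine them with log-sum-exp; a rough sketch (the mixture
parameters are invented):

# log(sum(exp(v))) without underflow/overflow
logsumexp <- function(v) {
  m <- max(v)
  m + log(sum(exp(v - m)))
}

# Log-likelihood of a K-component normal mixture: one log-sum-exp per observation
mixture_log_likelihood <- function(x, weights, means, sds) {
  sum(sapply(x, function(xi)
    logsumexp(log(weights) + dnorm(xi, means, sds, log = TRUE))))
}

y <- c(rnorm(5000, 0, 1), rnorm(5000, 10, 2))
mixture_log_likelihood(y, weights = c(0.5, 0.5), means = c(0, 10), sds = c(1, 2))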
> >
> >  Duncan Murdoch
> >
> >
> >>  Regards
> >>  Seref
> >>
> >>  On Mon, Oct 17, 2011 at 6:15 PM, Nordlund, Dan (DSHS/RDA)
> >>  <NordlDJ at dshs.wa.gov> wrote:
> >>
> >>   -----Original Message-----
> >>>>  From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> >>>>  On Behalf Of Seref Arikan
> >>>>  Sent: Monday, October 17, 2011 9:11 AM
> >>>>  To: r-help at r-project.org
> >>>>  Subject: [R] Best practices for handling very small numbers?
> >>>>
> >>>>  Greetings
> >>>>  I have been experimenting with sampling from posterior distributions using R.
> >>>>  Assume that I have the following observations from a normal distribution, with
> >>>>  an unscaled joint likelihood function:
> >>>>
> >>>>  normsamples = rnorm(1000,8,3)
> >>>>
> >>>>  joint_likelihood = function(observations, mean, sigma){
> >>>>      # unnormalised normal likelihood: sigma^(-n) * exp(-sum((x - mean)^2) / (2 * sigma^2))
> >>>>      return((sigma ^ (-length(observations))) *
> >>>>             exp(-0.5 * sum((observations - mean) ^ 2) / (sigma ^ 2)));
> >>>>  }
> >>>>
> >>>>  The joint likelihood omits the normalising constant (2*pi)^(-n/2), which is
> >>>>  what I want, because I've been experimenting with some crude sampling methods.
> >>>>  The problem is, with the samples above, the joint likelihood becomes 0 very
> >>>>  quickly.
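
(With the function above, the underflow is easy to reproduce; for example:)

x <- rnorm(1000, 8, 3)
joint_likelihood(x, 8, 3)         # 0: sigma^(-1000) alone is below the smallest double
log(joint_likelihood(x, 8, 3))    # -Inf, so taking the log afterwards does not help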
> >>>>  I wanted to experiment with tens of thousands of observations, but without a
> >>>>  transformation that can handle very small values, my sampling algorithms would
> >>>>  not work.
> >>>>
> >>>>  This is an attempt to use some sampling algorithms for Bayesian parameter
> >>>>  estimation. I do not want to resort to conjugacy, since I am developing this
> >>>>  to handle non-conjugate scenarios; I just wanted to test it on a conjugate
> >>>>  scenario, but I quickly realized that I'm in trouble because the likelihood
> >>>>  reaches 0 so fast.
> >>>>
> >>>>  Your feedback would be appreciated. It makes me wonder how JAGS/BUGS handles
> >>>>  this problem.
> >>>>
> >>>>  Best regards
> >>>>  Seref
> >>>>
> >>>>
> >>>  Maybe you should work with the log-likelihood?
> >>>
> >>>
> >>>  Hope this is helpful,
> >>>
> >>>  Dan
> >>>
> >>>  Daniel J. Nordlund
> >>>  Washington State Department of Social and Health Services
> >>>  Planning, Performance, and Accountability
> >>>  Research and Data Analysis Division
> >>>  Olympia, WA 98504-5204
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >
> >
>


