Dear friends,
Over the past weeks I have been asking a lot of questions about how to
use R for Bayesian analysis. I am brand new to R, but I am very pleased with
it. I started with WinBUGS, which is not a bad program but has several
limitations. By contrast, R gives the analyst a huge set of tools for
tackling almost any kind of analysis. I love R. I also want to thank all the
people who have lent a hand without hesitation; I have received prompt
answers from many people on many questions. At the moment I am a novice in
R, but hopefully I will become an expert and be able to help other users the
same way you have helped me.
I want to give back to the R-help list the log of all the replies I
received to my questions. I think this log may help the next person who has
questions about Bayesian analysis and, in particular, stochastic cost
frontier analysis. I also offer to share the little I know about it with
anybody interested.
Jorge
-----Original Message-----
From: Icabalceta, Jorge L.
Sent: Thursday, February 05, 2004 1:53 PM
To: 'r-help@lists.R-project.org'
Subject: rgamma question
I was trying to generate random numbers from a gamma distribution. In R the
function is:
rgamma(n, shape, rate = 1, scale = 1/rate). My question is: if
X~gamma(alpha, beta) and I want to generate one random number, where do I
plug alpha and beta into rgamma? And what is the meaning and use of rate?
Thanks for your attention,
Jorge
Did you look at the help? From ?rgamma:
Details:
If 'scale' is omitted, it assumes the default value of '1'.
The Gamma distribution with parameters 'shape' = a and 'scale' = s
has density
f(x)= 1/(s^a Gamma(a)) x^(a-1) e^-(x/s)
for x > 0, a > 0 and s > 0. The mean and variance are E(X) = a*s
and Var(X) = a*s^2.

Then, depending on how you define "alpha" and "beta", use the above to
figure out how to use rgamma.
-sundar
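[To make the help-page parameterization concrete, here is a quick simulation check; the values of a and s are illustrative, not from the thread:]

```r
# Gamma(shape = a, scale = s) has mean a*s and variance a*s^2,
# exactly as the ?rgamma Details section states.
set.seed(1)
a <- 3
s <- 2
x <- rgamma(1e5, shape = a, scale = s)
mean(x)   # close to a * s   = 6
var(x)    # close to a * s^2 = 12
```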
-----Original Message-----
From: Peter Dalgaard [mailto:p.dalgaard@biostat.ku.dk]
Sent: Thursday, February 05, 2004 2:54 PM
To: Icabalceta, Jorge L.
Cc: 'r-help@lists.R-project.org'
Subject: Re: [R] rgamma question
"Icabalceta, Jorge L." < Icabalceta_j@wlf.state.la.us
> writes:
> I was trying to generate random numbers with a gamma distribution. In R the
> function is:
> rgamma(n, shape, rate = 1, scale = 1/rate). My question is that if
> X~gamma(alpha, beta) and I want to generate one random number where do I
> plug alpha and beta in rgamma? and, what is the meaning and use of rate?
Well, it depends on your definition of alpha and beta.... You need to
match up your notation for the gamma density with that given on
help(rgamma), which will also tell you what to do with them.
The "rate" argument just allows you to specify the scale as its
inverse. A large rate corresponds to a narrow distribution. I suspect
this is popular notation for interarrival distributions in queuing
theory.
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - ( p.dalgaard@biostat.ku.dk )
FAX: (+45) 35327907
-----Original Message-----
From: Thomas Lumley [mailto:tlumley@u.washington.edu]
Sent: Thursday, February 05, 2004 3:26 PM
To: Icabalceta, Jorge L.
Cc: 'r-help@lists.R-project.org'
Subject: Re: [R] rgamma question
On Thu, 5 Feb 2004, Icabalceta, Jorge L. wrote:
It depends on what you mean by gamma(alpha,beta). It could be
rgamma(1,alpha,beta)
or
rgamma(1,alpha,1/beta)
since both of these parameterisations are used.
If you think the mean of gamma(alpha,beta) is alpha*beta, use the second
one, if you think it is alpha/beta use the first one.
-thomas
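[The two conventions Thomas describes can be seen side by side in a quick sketch; alpha and beta here are illustrative values:]

```r
# rgamma's third positional argument is 'rate'; pass 'scale' by name
# to get the other parameterization.
set.seed(42)
alpha <- 4
beta  <- 2
y.rate  <- rgamma(1e5, alpha, rate  = beta)  # mean = alpha/beta = 2
y.scale <- rgamma(1e5, alpha, scale = beta)  # mean = alpha*beta = 8
mean(y.rate)
mean(y.scale)
```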
-----Original Message-----
From: Rolf Turner [mailto:rolf@math.unb.ca]
Sent: Friday, February 06, 2004 1:40 PM
To: Icabalceta_j@wlf.state.la.us
Subject: RE: [R] rgamma question
> Sorry to bother you. I am sort of confused with the random number
> generation from a gamma distribution. For some reason, if a variable
> X~gamma(shape, scale), I have been told that the mean mu can be either
> mu=shape/scale or mu=shape*scale. Why is that so? This is making me
> have doubts about the random generation I am using:
For crying out loud READ the messages you have already received.
These have made it abundantly clear: It all DEPENDS on how you
***parameterize*** the gamma distribution. If you can't understand
this simple fact then you shouldn't be doing this stuff.
The help for the gamma distribution also makes this parameterization
issue quite clear. READ THE HELP!
> 1.- The first variable I have is the precision (h) (the inverse of
> the variance), distributed as:
>
> h ~ gamma((T-2)/2, SSE/2), where T is the number of observations and SSE
> is the sum of squared errors. How do I draw random numbers in R with
> this information? How do I plug this into rgamma(n, shape, scale)?
As you have been told ***repeatedly*** it depends on how the
parameters of the gamma distribution are to be interpreted.
If you have ``h ~ gamma(a,b)'' does this mean that
E(h) = a*b
or
E(h) = a/b ?
(These correspond to the two common parameterizations of the
gamma family.)
If the first applies then a = shape and b = scale.
If the second applies then a = shape and b = rate.
So R allows for both possibilities.
> 2.- The second variable is L^-1 ~ gamma(T+1, vi-ln(r*)^-1) (please,
> don't mind my lack of explanation for the terms used in the shape and
> scale parameters). Again, how do I plug this into
> rgamma(n, shape, scale)?
Same answer.
> 3.- I am having a problem putting the results of each iteration from
> for(i in 1:11000) into samp. For some reason I get 10000 identical
> values for each column. What am I doing wrong? To see what the
> problem is you can try to run the program. Attached are the data I am
> using and the program, with and without comments.
I haven't bothered to read through the incredibly long-winded load of
rubbish that you enclosed. If you are trying to get an 11000 x 10000
matrix, each of whose rows is a random sample of size 10000
(constructed in some arcane --- probably unnecessarily arcane ---
fashion) then you might do something like:
samp <- matrix(0, 11000, 10000)
for(i in 1:11000) {
    ## ... construct a random sample of size 10000, called "xxx" ...
    samp[i, ] <- xxx
}
There is probably a much more efficient way of accomplishing this;
using a for loop is likely to be unnecessary and slow. However,
it will (eventually) work.
cheers,
Rolf Turner
rolf@math.unb.ca
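[When each row really is just an i.i.d. draw, the loop can be replaced by a single call. A sketch with small dimensions so it runs in a blink; the real problem would use 11000 x 10000, and the shape/rate values here are placeholders:]

```r
# Fill the whole matrix in one vectorized rgamma call instead of a loop.
set.seed(5)
n.iter <- 100
n.samp <- 50
samp <- matrix(rgamma(n.iter * n.samp, shape = 2, rate = 1),
               nrow = n.iter, ncol = n.samp)
dim(samp)   # n.iter rows, n.samp columns
```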
-----Original Message-----
From: Thomas Lumley [mailto:tlumley@u.washington.edu]
Sent: Friday, February 06, 2004 2:09 PM
To: Icabalceta, Jorge L.
Cc: 'r-help@lists.R-project.org'
Subject: RE: [R] rgamma question
On Fri, 6 Feb 2004, Icabalceta, Jorge L. wrote:
> Sorry to bother you. I am sort of confused with the random number generation
> from a gamma distribution. For some reason, if a variable X~gamma(shape,
> scale), I have been told that the mean mu can be either mu=shape/scale or
> mu=shape*scale. Why is that so?
It isn't. The mean is always shape*scale. However, the notation
Gamma(alpha,beta) is ambiguous. Some people use beta=scale (in which
case mean=alpha*beta) and others use beta=1/scale (in which case
mean=alpha/beta). A few people even use alpha=mean, beta=variance,
but they usually know who they are.
-thomas
Thomas Lumley, Assoc. Professor, Biostatistics
tlumley@u.washington.edu, University of Washington, Seattle
-----Original Message-----
From: Rolf Turner [mailto:rolf@math.unb.ca]
Sent: Friday, February 06, 2004 2:15 PM
To: Icabalceta_j@wlf.state.la.us
Subject: RE: [R] rgamma question
One more thing --- I don't know, but I ***suspect*** that the first
parameterization is what you want. I.e. the one in which the mean is
a*b. This is (by a considerable margin) the ***more*** common of the
two common parameterizations of the gamma distribution.
cheers,
Rolf Turner
-----Original Message-----
From: Duncan Murdoch [mailto:dmurdoch@pair.com]
Sent: Thursday, February 12, 2004 11:28 AM
To: Icabalceta, Jorge L.
Cc: 'r-help@lists.R-project.org'
Subject: Re: [R] How do you create a "MCMC" object?
On Thu, 12 Feb 2004 10:06:41 -0600, "Icabalceta, Jorge L."
<Icabalceta_j@wlf.state.la.us> wrote:
>I have been running a Gibbs Sampler to estimate levels of efficiency in the
>Louisiana Shrimp Industry. I created a matrix (samp) where I stored the
>results of each iteration for 86 variables. I ran 10,000 iterations. So, the
>matrix samp is 10,000 x 86. I want to use the Gelman-Rubin test to check for
>convergence. To do that, I need at least two chains. If I run a second chain
>with different starting values and seed, I could save it to the matrix 'samp2'.
>So, I will have two matrices of 10,000 x 86. I want to use the function
>gelman.diag(x, confidence = 0.95, transform=FALSE), where x is an 'mcmc.list'
>object with more than one chain, and with starting values that are
>overdispersed with respect to the posterior distribution. How do I create an
>mcmc object from these matrices? I need to create an mcmc object with the
>two chains I have stored in 'samp' and 'samp2'.
>Thanks for your help and attention.
I think you're asking about functions from the coda package.
The normal way to create an MCMC object is to read BUGS output, but
it's probably not hard to create one from your own data. There's a
function called "as.mcmc" which looks like it should do what you need;
if not, you might want to write to the coda maintainer, Martyn Plummer
< plummer@iarc.fr >.
Duncan Murdoch
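[A minimal sketch of Duncan's suggestion, assuming the coda package is installed; the small random matrices below are stand-ins for the real 10,000 x 86 chain matrices `samp` and `samp2`:]

```r
library(coda)
# Stand-in chains (the real ones come from the Gibbs sampler):
samp  <- matrix(rnorm(1000 * 3), nrow = 1000, ncol = 3)
samp2 <- matrix(rnorm(1000 * 3), nrow = 1000, ncol = 3)
# Wrap each matrix as an mcmc object, then combine into an mcmc.list:
chains <- mcmc.list(mcmc(samp), mcmc(samp2))
# Gelman-Rubin convergence diagnostic on the two chains:
gelman.diag(chains, confidence = 0.95, transform = FALSE)
```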
-----Original Message-----
From: Icabalceta, Jorge L.
To: 'r-help@lists.R-project.org'
Sent: 2/16/2004 5:05 PM
Subject: [R] How do we obtain Posterior Predictive (Bayesian) P-values in R
(asking a second time)
Dear Friends,
According to Gelman et al. (2003), "...Bayesian P-values are defined as
the probability that the replicated data could be more extreme than the
observed data, as measured by the test quantity p = Pr[T(y_rep, theta) >=
T(y, theta) | y]...", where p = Bayesian P-value, T = test statistic,
y_rep = data from the replicated experiment, y = data from the original
experiment, and theta = the function of interest. My question is: how do I
calculate p (the Bayesian P-value) in R from the chain I obtained from the
Gibbs sampler? I have a matrix 'samp' [10,000 x 86] where I stored the
result of each of the 10,000 iterations for the 86 variables of interest.
Something I want to add is that Gelman also states that "...in practice,
we usually compute the posterior predictive distribution using simulation.
If we already have L simulations from the posterior density of theta, we
just draw one y_rep from the predictive distribution for each simulated
theta; we now have L draws from the joint posterior distribution,
p(y_rep, theta | y). The posterior predictive check is the comparison
between the realized test quantities, T(y, theta_L), and the predictive
test quantities, T(y_rep, theta_L). The estimated P-value is just the
proportion of these L simulations for which the test quantity equals or
exceeds its realized value; that is, for which
T(y_rep, theta_L) >= T(y, theta_L)..."
Does this mean that the usual p-value applies? i.e., pv <- 1 -
chosen_CDF(parameters)?
Can anybody clarify that for me please?
Thanks for your help.
Jorge
-----Original Message-----
From: Davis, Sean (NIH/NHGRI) [mailto:sdavis2@mail.nih.gov]
Sent: Monday, February 16, 2004 2:42 PM
To: 'Icabalceta, Jorge L. '
Subject: RE: [R] How do we obtain Posterior Predictive (Bayesian)
P-values in R (asking a second time)
If you have a matrix as you describe with rows corresponding to replicates
and columns corresponding to variables, then you can get the p-value as:
m <- rbind(orig,sim)
p <- apply(m,2,function(x) {(sum(x[2:length(x)]>x[1])/(length(x)-1))})
where orig is the vector of "observed values" for the 86 variables and sim
is the matrix of 10000 x 86. This will count the number of times that the
simulated replicate exceeded the observed value and then divide by (in your
specific case) 10000 to get the proportion of replicates in which the value
was greater than the observed value (the p-value).
I didn't test this (not in front of a computer with R installed), so try it
and see if you get what you want (should be a vector of length 86 with
values between 0 and 1.) Of course, you may need to alter the function to
compute the correct p-value; as given, it is a one-sided test.
Is this what you had in mind, or am I misinterpreting your email?
Sean
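[An equivalent, slightly more direct form of Sean's calculation, shown with toy stand-ins; in the real problem `orig` has length 86 and `sim` is 10,000 x 86:]

```r
# colMeans over a logical matrix gives, per column, the proportion of
# replicates exceeding the observed value (the one-sided p-value).
set.seed(3)
orig <- c(1, 2, 3)
sim  <- matrix(rgamma(100 * 3, shape = 2), nrow = 100, ncol = 3)
p <- colMeans(sweep(sim, 2, orig, ">"))
p   # one p-value per variable, each between 0 and 1
```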
To: "Icabalceta, Jorge L." < Icabalceta_j@wlf.state.la.us
>
Subject: Re: [R] How do we obtain Posterior Predictive (Bayesian) P-values
in R (a sking a second time)
References: <
FF01C406D3A336489C58B9D0AE8E8E3702A0B0C2@wlfnt1.wlf.state.la.us
>
Content-Type: text/plain; charset=IBM850; format=flowed
Content-Transfer-Encoding: 7bit
Icabalceta, Jorge L. wrote:
> Dear Friends,
> According to Gelman et al. (2003), "...Bayesian P-values are defined as
> the probability that the replicated data could be more extreme than the
> observed data, as measured by the test quantity p = Pr[T(y_rep, theta) >=
> T(y, theta) | y]...", where p = Bayesian P-value, T = test statistic,
> y_rep = data from the replicated experiment, y = data from the original
> experiment, and theta = the function of interest. My question is: how do I
> calculate p (the Bayesian P-value) in R from the chain I obtained from the
> Gibbs sampler? I have a matrix 'samp' [10,000 x 86] where I stored the
> result of each of the 10,000 iterations for the 86 variables of interest.
> Something I want to add is that Gelman also states that "...in practice,
> we usually compute the posterior predictive distribution using simulation.
> If we already have L simulations from the posterior density of theta, we
> just draw one y_rep from the predictive distribution for each simulated
> theta; we now have L draws from the joint posterior distribution,
> p(y_rep, theta | y). The posterior predictive check is the comparison
> between the realized test quantities, T(y, theta_L), and the predictive
> test quantities, T(y_rep, theta_L). The estimated P-value is just the
> proportion of these L simulations for which the test quantity equals or
> exceeds its realized value; that is, for which
> T(y_rep, theta_L) >= T(y, theta_L)..."
> Does this mean that the usual p-value applies? i.e., pv <- 1 -
> chosen_CDF(parameters)?
Or "pv <- chosen_CDF(parameters)" depending on which tail you're testing.
> Can anybody clarify that for me please?
Yes, it really is this simple, assuming your sampler is working well.
If you have MCMC draws for theta in a vector Post, and you want to find
the probability that theta > A, then you can use:
PV <- sum(Post > A)/length(Post)
You can create confidence limits in a similar way, using quantiles.
I'll leave that as an exercise for the reader.
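[A concrete version of this recipe, using a stand-in posterior sample in place of real MCMC draws; the threshold A is illustrative:]

```r
# Tail probability and equal-tailed credible interval from posterior draws.
set.seed(7)
Post <- rgamma(10000, shape = 2, rate = 1)   # stand-in for real MCMC draws
A <- 4
PV <- mean(Post > A)                         # estimated Pr(theta > A | y)
ci <- quantile(Post, c(0.025, 0.975))        # 95% credible interval
```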
A couple of words of warning:
1. MCMC samplers won't visit the extremes of the distribution very
often, so the approximation gets worse the further out into the tail you
go. Smoothing the density might help, as long as you're not too far
into the tail.
2. A lot of Bayesians (and others!) frown on significance testing. At
worst, such tests are useless (because the null hypothesis is silly); at
next worst, you're just testing whether you have enough data to distinguish
an effect from zero (because a priori the null hypothesis is merely
unlikely to be true).
Bob
--
Bob O'Hara
Dept. of Mathematics and Statistics
P.O. Box 4 (Yliopistonkatu 5)
FIN-00014 University of Helsinki
Finland
Telephone: +358-9-191 23743
Mobile: +358 50 599 0540
Fax: +358-9-191 22 779
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: http://www.jnr-eeb.org