[R-sig-ME] Question about truncated poisson vs. normal poisson, "minus one" & Co. in winbugs or glmer

Highland Statistics Ltd highstat at highstat.com
Tue Oct 16 11:28:36 CEST 2012


Message: 5
Date: Tue, 16 Oct 2012 10:56:32 +0200
From: Ulf Köther <ukoether at uke.de>
To: <r-sig-mixed-models at r-project.org>
Subject: [R-sig-ME] Fwd: Question about truncated poisson vs. normal
	poisson "minus one" & Co. in winbugs or glmer
Message-ID: <507D2140.6000004 at uke.de>
Content-Type: text/plain

Yesterday I sent this email from the wrong email address, so it was not
confirmed. Sorry for the possible double posting!

Original message:

Hello everyone!

I have one major question and one minor one which I couldn't solve
for myself yet, and I have to admit that both of these are really
becoming a pain in the neck. I am not a statistician, so I do not
come from a background where such "normal" problems get discussed a
lot; accordingly, I am a little bit stuck here. Any help would be really
appreciated. And sorry in advance for the long text that follows... :-(

1.) (Major Q.) I want to fit a (bayesian) hurdle poisson model (a glmm
with no false zeros possible in this outcome variable), making use of
winbugs or openbugs via two separate models (one binomial, one truncated
poisson),

AFZ: Why not a ZIP? The hurdle model is based on the following question:
what drives presence/absence... and once present, what drives the numbers?
It is a more limited question than the one underlying a ZIP.

but as far as I can see, the implementation via openbugs
(which allows truncation; winbugs does not, I think?!?) with a truncated
poisson distribution (coded as RESPONSES[i] ~ dpois(mu[i])T(1,36) )
brings the simulation time to almost something where I feel I am faster
by hand (5134 sec. for a 10000-iteration burn-in).

AFZ: Try implementation via the zero trick, and see whether it is any better.
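In case it helps, the zeros trick replaces the unavailable truncated density by a Poisson "observation" of an artificial all-zero data vector, with rate equal to minus the log-likelihood plus a large constant. A minimal sketch for the zero-truncated poisson part only (the names Y, mu, Npos and the constant C are my assumptions; this is untested model code, not a drop-in solution):

```
# zeros[] = rep(0, Npos) and the positive counts Y[] are passed as data
C <- 10000                              # keeps the Poisson rate positive
for (i in 1:Npos) {
  # log-likelihood of Y[i] under a poisson(mu[i]) truncated at zero
  logL[i] <- Y[i] * log(mu[i]) - mu[i]
             - logfact(Y[i]) - log(1 - exp(-mu[i]))
  zeros[i] ~ dpois(C - logL[i])
}
```

As far as I know, logfact() is available in both winbugs/openbugs and JAGS.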

  I think that I may be making some mistakes regarding the uninformative
priors (maybe too wide?) or the starting values (maybe too far away?), or
is the "T(x,x)" coding wrong?

AFZ: That could be a cause. It is also crucial to center or normalize the continuous covariates.

Implementation via winbugs and openbugs just as a
poisson glmm (ignoring the truncation at one) runs fast and yields
nearly the same estimates as glmer for a poisson glmm (the winbugs coding
here is nearly the same as in chapter 4 of zuur et al 2012 for a poisson
glmm with one random group factor).
Problem: Am I going to have a type-I-error increase because of ignoring
the truncation? Would it be statistically okay to just incorporate the

AFZ: If the majority of the observed values are close to 0, yes.

  response variable for the poisson model as (RESPONSES - 1), pushing the
whole variable one value lower, or does this introduce something like
noise into the mean and sd estimates, or into something else I cannot see
from my "non-statistician" perspective? Do I overestimate something at
an overall level, because I manually separate the data into a full data
set with RESPONSES coded as zero vs. one, and a smaller data set where
the "ones" of the former data set are fitted as a truncated poisson?
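To see numerically why the "minus one" shortcut changes the model: for a zero-truncated poisson, the mean of RESPONSES - 1 is not the poisson mean lambda, so fitting an ordinary poisson to the shifted counts targets the wrong quantity. A quick check in Python (a sketch I added, not from the original thread):

```python
import math

def zt_pois_mean(lam):
    """Mean of a zero-truncated Poisson: E[Y | Y > 0] = lam / (1 - exp(-lam))."""
    return lam / (1.0 - math.exp(-lam))

lam = 2.0
mean_shifted = zt_pois_mean(lam) - 1.0  # the mean that (RESPONSES - 1) actually has
print(round(mean_shifted, 3))           # about 1.313, not lam = 2.0
```

The discrepancy is largest for small lambda (i.e. when many observations are ones), which is exactly the truncated-data situation; for large lambda the truncation hardly matters and the shift mostly just biases the mean downward by one.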
Do I have to aim for the master goal, strictly speaking, fitting the
whole model at once in one piece of code, making use of the
zeros-ones trick (an approach which might also give the opportunity to
implement a CAR structure (time) later on for the whole data set)?
I tried to work out how to set up something like a (literally speaking)
"hurdle" model as the one in the pscl package, but also accounting for a
random intercept (a glm-to-glmm "upgrade"), which is something I could not
find anything similar to except via Gibbs sampling... I consulted zuur 2012,
kery 2010, ntzoufras 2009, neelon 2010 (journal article in "Statistical
Modelling"), gelman 2007 and more, but to summarize, I think many of
these books and many of the things I found on the net or in the mailing
list posts do not consider such an amateur question like mine, so I
didn't find an answer... Could anybody give me a hint? Neelon et al.
seem to incorporate everything at once, but they are using a response
variable that I think is a kind of proportion, and I haven't found out
yet how to handle my own data in such a "coding environment"...
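For what it is worth, one way such an all-in-one hurdle glmm is often written in BUGS/JAGS syntax combines a bernoulli part for zero vs. non-zero with the zeros trick for the truncated count part, each part with its own random intercept. The sketch below is my own untested outline (all names — PRESENT, RESPONSES, group, J, and the priors — are assumptions), not the Neelon et al. parameterization:

```
model {
  C <- 10000
  for (i in 1:N) {
    # part 1: presence/absence; PRESENT[i] = step(RESPONSES[i] - 1) passed as data
    PRESENT[i] ~ dbern(p[i])
    logit(p[i]) <- alpha0 + a[group[i]]

    # part 2: zero-truncated poisson log-likelihood, switched off for zeros
    log(mu[i]) <- beta0 + b[group[i]]
    logL[i] <- PRESENT[i] * (RESPONSES[i] * log(mu[i]) - mu[i]
                 - logfact(RESPONSES[i]) - log(1 - exp(-mu[i])))
    zeros[i] ~ dpois(C - logL[i])     # zeros[i] = 0 passed as data
  }
  alpha0 ~ dnorm(0, 0.001)
  beta0 ~ dnorm(0, 0.001)
  for (j in 1:J) {                    # random intercepts per group
    a[j] ~ dnorm(0, tau.a)
    b[j] ~ dnorm(0, tau.b)
  }
  tau.a ~ dgamma(0.001, 0.001)
  tau.b ~ dgamma(0.001, 0.001)
}
```

Covariates would enter the two linear predictors in the usual way. Note this sketch only handles the lower truncation at one; an upper bound like the 36 above would need an extra term in the normalizing constant.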

2.) (Minor Q.) When I try to implement a predictive check via a
replicated data set in the same model (as proposed by zuur, kery and
ntzoufras), I end up with a winbugs or openbugs crash in about 70% of
trials (everything gets stuck at gen.inits)

AFZ: Difficult to comment on without code/details.

and I really think I have to specify
inits myself for the replicates, but firstly I cannot find any examples
of this (or I am blind), and secondly I am very unsure about how to
choose good inits here... does anybody have a hint for this? Maybe it
also depends on how I choose the priors?
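Regarding inits for the replicates: a typical predictive-check block (in the style of zuur/kery/ntzoufras) adds replicated nodes inside the model, for example (again a sketch with assumed names, not tested code):

```
for (i in 1:N) {
  y.rep[i] ~ dpois(mu[i]) T(1, 36)                   # replicate with the same truncation
  e.obs[i] <- pow(RESPONSES[i] - mu[i], 2) / mu[i]   # a simple discrepancy measure
  e.rep[i] <- pow(y.rep[i] - mu[i], 2) / mu[i]
}
fit.obs <- sum(e.obs[])
fit.rep <- sum(e.rep[])
```

If gen.inits fails on the truncated y.rep nodes, supplying explicit inits for them inside the truncation bounds is a common workaround — for example a vector of ones, y.rep = rep(1, N), in the R inits list; any value in [1, 36] should do.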

Thanks to everyone for any help. I really am a bit exhausted after
reading every bit of paper I could get a grip on in the last months to
learn R, LMMs and GLMMs from scratch on my own, but at the moment I have
been stuck on these two problems for about two weeks and I do not
know where to look any further.

Kind Regards, Ulf

P.S. Would an implementation in JAGS or STAN be faster than winbugs or
openbugs regarding the slow mixing and simulation time, or does this
mainly depend on my chosen priors and inits?

AFZ: Yes....JAGS is considerably faster. The coding is nearly identical.

Good luck!

Alain Zuur



Dipl.-Psych. Ulf Köther

AG Klinische Neuropsychologie
Klinik für Psychiatrie und Psychotherapie
Universitätsklinikum Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg

Tel.: +49 (0) 40 7410 55851
Fax:  +49 (0) 40 7410 57566

ukoether at uke.de <mailto:ukoether at uke.de>
www.uke.de/neuropsych <http://www.uke.de/neuropsych>

More information about the R-sig-mixed-models mailing list