[R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
cbeleites at units.it
Fri Jan 7 19:45:45 CET 2011
On 01/07/2011 06:13 AM, Spencer Graves wrote:
> A more insidious problem, that may not affect the work of Jonah
> Lehrer, is political corruption in the way research is funded, with
> less public and more private funding of research
Maybe I'm too pessimistic, but the term _political_ corruption reminds
me that I can just as easily imagine a "funding bias"* in public
funding. And I'm not sure it is (or would be) less of a problem just
because the interests of private funding are easier to spot.
* I think of bias on both sides: the funding agency selecting the
studies to support and the researcher subconsciously complying with the
expectations of the funding agency.
On 01/07/2011 08:06 AM, Peter Langfelder wrote:
> From a purely statistical and maybe somewhat naive point of view,
> published p-values should be corrected for the multiple testing that
> is effectively happening because of the large number of published
> studies. My experience is also that people will often try several
> statistical methods to get the most significant p-value but neglect to
> share that fact with the audience and/or at least attempt to correct
> the p-values for the selection bias.
Even if the number of all the tests were known, I have the impression
that the corrected p-value would be kind of the right answer to the
wrong question. I'm not particularly interested in the probability of
arriving at the presented findings if the null hypothesis were true.
I'd rather know the probability that the conclusions are true. Switching
to the language of clinical chemistry, this is: I'm presented with the
sensitivity of a test, but I really want to know the positive predictive
value. What is still missing with the corrected p-values is the
"prevalence of good ideas" of the publishing scientist (not even known
for all scientists). And I'm not sure this is not decreasing if the
scientist generates and tests more and more ideas.
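As a rough illustration of the sensitivity versus positive predictive value analogy, here is a minimal sketch in Python. It treats the power of a study as the sensitivity and the significance level as the false positive rate. The function name and all numbers (power 0.8, alpha 0.05, the three prevalence values) are assumptions chosen for illustration, not figures from this thread.

```python
def positive_predictive_value(prevalence, power, alpha):
    """Share of 'significant' findings that are actually true (Bayes' rule).

    prevalence: assumed share of tested hypotheses that are true
                (the "prevalence of good ideas")
    power:      probability of a significant result if the hypothesis is true
    alpha:      probability of a significant result if the null is true
    """
    true_positives = prevalence * power
    false_positives = (1.0 - prevalence) * alpha
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical prevalences: from a careful to a very exploratory scientist.
    for prevalence in (0.5, 0.1, 0.01):
        ppv = positive_predictive_value(prevalence, power=0.8, alpha=0.05)
        print(f"prevalence of good ideas {prevalence:5.2f} -> PPV {ppv:.2f}")
```

Under these assumed numbers the same alpha gives very different answers to "how likely is the conclusion to be true", depending only on the prevalence, which is exactly the quantity the p-value does not contain.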
I found my rather hazy thoughts about this much better expressed in the
books of Beck-Bornholdt and Dubben (which I'm afraid are only available
Conclusion: try to be/become a good scientist: with a high prevalence of
good ideas. At least with a high prevalence of good ideas among the
tested hypotheses. Including thinking first which hypotheses are the
ones to test, and not giving in to the temptation to try out more and
more things as one gets more familiar with the experiment/data set/problem.
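The cost of trying out more and more things can be sketched with a small simulation. It assumes, as a simplification, that each additional analysis yields an independent p-value that is uniformly distributed when the null hypothesis is true; different methods applied to the same data are in reality correlated, so the numbers are only indicative. The function names, the alpha of 0.05, and the numbers of attempts are illustrative assumptions.

```python
import random


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one p < alpha among n independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate_best_p(n_tests, alpha=0.05, n_studies=20000, seed=1):
    """Simulate studies with no real effect in which only the smallest
    of n_tests p-values is reported; return the share that looks
    'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Under a true null, each p-value is uniform on [0, 1).
        best_p = min(rng.random() for _ in range(n_tests))
        if best_p < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for n_tests in (1, 5, 20):
        print(
            f"{n_tests:2d} attempts: expected {familywise_error_rate(n_tests):.3f}, "
            f"simulated {simulate_best_p(n_tests):.3f}"
        )
```

Even a handful of undisclosed attempts raises the chance of a spurious "finding" well above the nominal alpha, which is one way to see why deciding on the hypotheses first matters.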
The latter I find very difficult. Including the experience of giving a
presentation where I explicitly talked about why I did not do any
data-driven optimization of my models. Yet in the discussion I was very
prominently told I need to try in addition these other pre-processing
techniques and these other modeling techniques - even by people whom I
know to be very much aware and concerned about optimistically biased
validation results. Which were of course very valid questions (and easy
to comply with), but I conclude it is common/natural/human to have and want
to try out more ideas.
Also, after several years in the field and with the same kind of samples
of course I run the risk of my ideas being overfit to our kind of
samples - this is a cost that I have to pay for the gain due to
Some more thoughts:
- reproducibility: I'm an analytical chemist. We have huge amounts of work
going into round robin trials in order to measure the "natural"
variability of different labs on very defined systems.
- we also have huge amounts of work going into calibration transfer,
i.e. making quantitative predictive models work on a different
instrument. This is always a whole lot of work, and for some fields of
problems at the moment considered basically impossible even between two
instruments of the same model and manufacturer.
The quoted results on the mice are not very astonishing to me... ;-)
- Talking about (not so) astonishing differences between
replications of experiments:
I find myself moving from reporting ± 1 standard deviation to reporting
e.g. the 5th to 95th percentiles. Not only because my data distributions
are often not symmetric, but also because I find I'm not able to directly
perceive the real spread of the data from a standard deviation error
bar. This is all about perception; of course I can reflect on the
meaning. Such a reflection also tells me that one student having a
really unlikely number of right guesses is unlikely but not impossible.
There is no statistical law stating that unlikely events happen only
with large sample sizes/number of tests. Yet the immediate perception is
- I happily agree with the ideas of publishing findings (conclusions) as
well as the data and data analysis code I used to arrive there. But I'm
aware that part of this agreement is due to the fact that I'm quite
interested in the data analytical methods (I'd say as well as in the
particular chemical-analytical problem at hand, but rather more than my
purely experimental colleagues). This means that psychologically I'm
happy enough with my work if I can introduce a new method (variant) even
if the results for the chemical-analytical problem aren't that good
for either the standard method or the variant.
I remember a discussion between someone from a data-analysis/method
developing group with an experimental scientist. The "methods guy" was
complaining that the experimental people are so reluctant about making
their data public. However, the data sets were for him a prerequisite to
test his data analysis ideas. For the experimental guy they were the
_product_ of long work, which he felt not properly appreciated by the
On a more "practical" level, publishing code and data implies additional
documentation work. I have a large part of the documentation and meta
information hand-written in lab books (particularly of the experimental
data) and/or in German language. In order to take the effort to convert
this to electronic formats and English language, the effort must be
Ravi, I also agree with your point that we should make better use of our
experimental data. However, this is in practice very difficult in my
field (vibrational spectroscopy for medical diagnosis). We have the
mentioned problems of calibration transfer, and we do not yet have
standardized sample treatment protocols. Thus, it is difficult to
combine different series of measurements. As long as I'm not talking
about measurements of reference substances, the chances that someone
else will be able to make much use of my data are therefore rather low.
For the moment these points taken together mean that I'd happily share
my data (and code) if I'm asked, but not without. And, considering the
current publication rules in my field, I first want to play a bit more
with my data myself, before I leave it to everyone else.
My 2 ct,