[R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

Fri Jan 7 19:45:45 CET 2011

On 01/07/2011 06:13 AM, Spencer Graves wrote:
>       A more insidious problem, that may not affect the work of Jonah 
> Lehrer, is political corruption in the way research is funded, with 
> less public and more private funding of research 
Maybe I'm too pessimistic, but the term _political_ corruption reminds 
me that I can just as easily imagine a "funding bias"* in public 
funding. And I'm not sure it is (or would be) less of a problem just 
because the interests of private funding are easier to spot.

* I think of bias on both sides: the funding agency selecting the
studies to support and the researcher subconsciously complying with the
expectations of the funding agency.

On 01/07/2011 08:06 AM, Peter Langfelder wrote:
> From a purely statistical and maybe somewhat naive point of view,
> published p-values should be corrected for the multiple testing that
> is effectively happening because of the large number of published
> studies. My experience is also that people will often try several
> statistical methods to get the most significant p-value but neglect to
> share that fact with the audience and/or at least attempt to correct
> the p-values for the selection bias.
Even if the number of all the tests were known, I have the impression
that the corrected p-value would be kind of the right answer to the
wrong question. I'm not particularly interested in the probability of
arriving at the presented findings if the null hypothesis were true.
I'd rather know the probability that the conclusions are true. Switching
to the language of clinical chemistry, this is: I'm presented with a
property of the test itself (like its sensitivity or specificity), but I
really want to know the positive predictive value. What is still missing
with the corrected p-values is the "prevalence of good ideas" of the
publishing scientist (not even known for all scientists). And I'm not
sure this prevalence does not decrease as the scientist generates and
tests more and more ideas.
I found my rather hazy thoughts about this much better expressed in the 
books of Beck-Bornholdt and Dubben (which I'm afraid are only available 
in German).
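
To put rough numbers on the positive predictive value argument, here is
a minimal sketch in Python. The significance level, the power and the
three "prevalence of good ideas" values are made-up assumptions for
illustration only; none of them come from this thread or from the books
mentioned above.

```python
def positive_predictive_value(prevalence, power, alpha):
    """Share of 'significant' findings that reflect a true effect."""
    true_positives = power * prevalence
    false_positives = alpha * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical numbers, chosen only for illustration.
ALPHA = 0.05  # significance level: false positive rate under the null
POWER = 0.80  # probability of detecting an effect that really exists

ppv_by_prevalence = {
    prevalence: positive_predictive_value(prevalence, POWER, ALPHA)
    for prevalence in (0.5, 0.1, 0.01)
}

for prevalence, ppv in ppv_by_prevalence.items():
    print(f"prevalence of good ideas {prevalence:5.2f} -> PPV {ppv:.2f}")
```

Under these assumptions the positive predictive value falls from about
0.94 to 0.64 to 0.14 as the prevalence of good ideas drops from 0.5 to
0.1 to 0.01, although the significance level stays the same.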

Conclusion: try to be/become a good scientist, one with a high
prevalence of good ideas, or at least with a high prevalence of good
ideas among the tested hypotheses. This includes thinking first about
which hypotheses are the ones to test, and not giving in to the
temptation to try out more and more things as one gets more familiar
with the experiment/data set/problem.
The latter I find very difficult. One experience: I gave a presentation
where I explicitly talked about why I did not do any data-driven
optimization of my models. Yet in the discussion I was very prominently
told that I need to try these other pre-processing techniques and these
other modeling techniques in addition - even by people whom I know to be
very much aware of and concerned about optimistically biased validation
results. These were of course very valid questions (and easy to comply
with), but I conclude it is common/natural/human to have and want to try
out more ideas.
Also, after several years in the field and with the same kind of
samples, I of course run the risk of my ideas being overfit to our kind
of samples - this is a cost that I have to pay for the gain due to
experience/expertise.

Some more thoughts:
- reproducibility: I'm an analytical chemist. We have huge amounts of
work going into round robin trials in order to measure the "natural"
variability of different labs on very defined systems.
- we also have huge amounts of work going into calibration transfer,
i.e. making quantitative predictive models work on a different
instrument. This is always a whole lot of work, and for some kinds of
problems it is at the moment considered basically impossible, even
between two instruments of the same model and manufacturer.
The quoted results on the mice are not very astonishing to me... ;-)

- Talking about (not so) astonishing differences between
replications of experiments:
I find myself moving from reporting ± 1 standard deviation to reporting
e.g. the 5th to 95th percentiles. Not only because my data distributions
are often not symmetric, but also because I find I'm not able to
directly perceive the real spread of the data from a standard deviation
error bar. This is all about perception; of course I can reflect on the
meaning. Such a reflection also tells me that one student having a
really unlikely number of right guesses is unlikely but not impossible.
There is no statistical law stating that unlikely events happen only
with large sample sizes/numbers of tests. Yet the immediate perception
is completely different.
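
As a small illustration of the error bar point, here is a sketch in
Python that compares the two summaries on right-skewed data. The data
are an assumption made purely for illustration (evenly spaced quantiles
of a standard log-normal distribution), not measurements of mine.

```python
import math
import statistics

# Deterministic, right-skewed example data: evenly spaced quantiles of a
# standard log-normal distribution (hypothetical, for illustration only).
N = 1000
normal = statistics.NormalDist()
data = [math.exp(normal.inv_cdf((i + 0.5) / N)) for i in range(N)]

mean = statistics.fmean(data)
sd = statistics.stdev(data)

# quantiles(n=20) returns 19 cut points at 5 %, 10 %, ..., 95 %.
cuts = statistics.quantiles(data, n=20)
p5, median, p95 = cuts[0], cuts[9], cuts[18]

print(f"mean - 1 SD .. mean + 1 SD: {mean - sd:.2f} .. {mean + sd:.2f}")
print(f"5th .. 95th percentile:     {p5:.2f} .. {p95:.2f}")
print(f"median: {median:.2f}, minimum: {min(data):.2f}")
```

For data like these, the lower end of the ± 1 standard deviation bar
comes out negative although every single value is positive, while the
percentile range stays inside the range of the data and shows the
asymmetry around the median.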

- I happily agree with the idea of publishing findings (conclusions) as
well as the data and data analysis code I used to arrive there. But I'm
aware that part of this agreement is due to the fact that I'm quite
interested in the data analytical methods (I'd say as much as in the
particular chemical-analytical problem at hand, and rather more than my
purely experimental colleagues). This means that psychologically I'm
happy enough with my work if I can introduce a new method (variant) even
if the results for the chemical-analytical problem aren't that good for
either the standard method or the variant.
I remember a discussion between someone from a data-analysis/method
development group and an experimental scientist. The "methods guy" was
complaining that the experimental people are so reluctant to make
their data public. For him, the data sets were a prerequisite to
test his data analysis ideas. For the experimental guy they were the
_product_ of long work, which he felt was not properly appreciated by
the methods people.

On a more "practical" level, publishing code and data implies additional
documentation work. I have a large part of the documentation and meta
information hand-written in lab books (particularly for the experimental
data) and/or in German. For me to take on the effort of converting this
to electronic formats and English, that effort must be appropriately
rewarded.

Ravi, I also agree with your point that we should make better use of our
experimental data. However, this is in practice very difficult in my
field (vibrational spectroscopy for medical diagnosis). We have the
mentioned problems of calibration transfer, and we do not yet have
standardized sample treatment protocols. Thus, it is difficult to
combine different series of measurements. As long as I'm not talking
about measurements of reference substances, the chances that someone
else will be able to make much use of my data are therefore rather low.

For the moment these points taken together mean that I'd happily share 
my data (and code) if I'm asked, but not without. And, considering the 
current publication rules in my field, I first want to play a bit more 
with my data myself, before I leave it to everyone else.

My 2 ct,

Claudia