[R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
Ravi Varadhan
rvaradhan at jhmi.edu
Fri Jan 7 19:24:14 CET 2011
I think that the strategy of editors simply telling authors to "share or
perish" is a bit naïve. There are a number of practical challenges that
need to be addressed in order to create a fair and effective open-learning
environment. Eysenbach (BMJ 2001) and Vickers (2006) discuss these and some
partial solutions. We need more creative thinking that uses both carrots
and sticks. We also need more empirical experience with this. Perhaps we
can learn from fields, if there are any, that do a good job of data sharing
and open learning.
Best,
Ravi.
-------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology,
School of Medicine, Johns Hopkins University
Ph. (410) 502-2619
email: rvaradhan at jhmi.edu
-----Original Message-----
From: Spencer Graves [mailto:spencer.graves at structuremonitoring.com]
Sent: Friday, January 07, 2011 1:01 PM
To: Ravi Varadhan
Cc: 'Mike Marchywka'; r-help at r-project.org
Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias,
scientific validity
I applaud your efforts, Ravi. Regarding "Whose data is it?", I
humbly suggest that referees and editorial boards push for (demand?)
rules requiring that the raw data be made available to the referees and
released concurrently with publication.
Spencer
On 1/7/2011 8:43 AM, Ravi Varadhan wrote:
> I have just recently written about this issue (i.e. open learning and data
> sharing) in a manuscript that is currently under review in a clinical
> journal. I have argued that data hoarding is unethical. Participants in
> research studies give their time, effort, saliva and blood in the
> altruistic hope that their sacrifice will benefit humankind. If they were
> to realize that the real (ulterior) motive of the study investigators is
> only to advance their careers, they would really think hard about
> participating in the studies. The study participants should only consent
> to participate if they can get a signed assurance from the investigators
> that the investigators will make their data available for scrutiny and for
> public use (under some reasonable conditions that are fair to the study
> investigators).
> As Vickers (Trials 2006) asks, "whose data is it anyway?" I believe that
> we can achieve great progress in clinical research if and only if we make
> a concerted effort towards open learning. Stakeholders (i.e., patients,
> clinicians, policy-makers) should demand that all data potentially
> relevant to a critical clinical question be made available in an open
> learning environment. Unless we can achieve this, we cannot solve the
> problems of publication bias and inefficient, sub-optimal use of data.
>
> Best,
> Ravi.
> -------------------------------------------------------
> Ravi Varadhan, Ph.D.
> Assistant Professor,
> Division of Geriatric Medicine and Gerontology,
> School of Medicine, Johns Hopkins University
>
> Ph. (410) 502-2619
> email: rvaradhan at jhmi.edu
>
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Spencer Graves
> Sent: Friday, January 07, 2011 8:26 AM
> To: Mike Marchywka
> Cc: r-help at r-project.org
> Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias,
> scientific validity
>
> I wholeheartedly agree with the trend towards publishing datasets.
> One way to do that is as datasets in an R package contributed to CRAN.
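>
> A minimal sketch of that route (the package name "opentrial" and the
> CSV file are hypothetical):
>
>   # assumes a package skeleton "opentrial" with a data/ directory
>   trial <- read.csv("trial_results.csv")   # raw data to be shared
>   save(trial, file = "opentrial/data/trial.rda", compress = "xz")
>   # document it in opentrial/man/trial.Rd; users then load it with
>   # library(opentrial); data(trial)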
>
>
> Beyond this, there seems to be an increasing trend towards journals
> requiring authors of scientific research to publish their data as well.
> The Public Library of Science (PLoS) has such a policy, but it is not
> enforced: Savage and Vickers (2010) were able to get the raw data behind
> only one of ten published articles they tried, and that one came only
> after they reminded the author that s/he had agreed to make the data
> available as a condition of publishing in PLoS. (Four other authors
> refused to share their data in spite of their legal and moral commitment
> to do so as a condition of publishing in PLoS.)
>
>
> There are other venues for publishing data. For example, much
> astronomical data is now routinely published on the web, so anyone
> interested can test their pet algorithm on real data
> (http://sites.google.com/site/vousergroup/presentations/publishing-astronomical-data).
>
>
>
> Regarding my earlier comment, I just found a Wikipedia article on
> "scientific misconduct" that mentioned the tendency to refuse to publish
> research that proves your new drug is positively harmful. This is an
> extreme version of both types of bias I previously mentioned: (1) only
> significant results get published, and (2) private funding provides its
> own biases.
>
>
> Spencer
>
>
> #########
> Savage and Vickers (2010), "Empirical Study of Data Sharing by Authors
> Publishing in PLoS Journals", Scientific Data Sharing, added Apr. 26, 2010
> (http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-by-authors-publishing-in-plos-journals-2/).
>
>
>
>
> On 1/7/2011 4:08 AM, Mike Marchywka wrote:
>>
>>> Date: Thu, 6 Jan 2011 23:06:44 -0800
>>> From: peter.langfelder at gmail.com
>>> To: r-help at r-project.org
>>> Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias,
>>> scientific validity
>>>
>>> From a purely statistical and maybe somewhat naive point of view,
>>> published p-values should be corrected for the multiple testing that
>>> is effectively happening because of the large number of published
>>> studies. My experience is also that people will often try several
>>> statistical methods to get the most significant p-value, but neglect
>>> to share that fact with the audience or, at the very least, to
>>> correct the p-values for the selection bias.
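>>>
>>> As an illustration, once the full set of tests is disclosed, the
>>> correction itself is a one-liner in base R (p-values below are made up):
>>>
>>>   p <- c(0.001, 0.008, 0.03, 0.04, 0.2)  # all tests actually run
>>>   p.adjust(p, method = "holm")   # family-wise error rate control
>>>   p.adjust(p, method = "BH")     # Benjamini-Hochberg false discovery rate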
>> You see this everywhere in one form or another, from medical to
>> financial modelling. My solution here is simply to publish more raw
>> data in a computer-readable form, in this case of course something
>> easy to get with R, so that disinterested or adversarial parties can
>> run their own "analysis." I think there was also a push to create a
>> database for failed drug trials that may contain data of some value
>> later. The value of R with easily available data for a large cross
>> section of users could be to moderate problems like the one cited here.
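>>
>> For example, once the raw table behind a study sits at a stable URL
>> (hypothetical address below), reanalysis is a one-liner:
>>
>>   dat <- read.csv("http://example.org/trial42/raw.csv")
>>   summary(dat)   # disinterested parties can now rerun the analysis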
>>
>> I almost slammed a poster here earlier who wanted a simple rule for
>> "when do I use this test" with something like "when your mom tells you
>> to": post hoc, you do just about everything to assume you messed up and
>> missed something, but a priori you hope you have designed a good
>> hypothesis. And at the end of the day, a given p-value is one piece of
>> evidence in the overall objective of learning about some system, not
>> appeasing a sponsor. Personally, I'm a big fan of post hoc analysis on
>> biotech data in some cases, especially as more pathway or other theory
>> is published, but it is easy to become deluded if you have a conclusion
>> that you know JUST HAS TO BE RIGHT.
>> Also FWIW, in the few cases of FDA-sponsor disputes I've examined, the
>> data I've been able to get tends to make me side with the FDA. I still
>> hate the idea of any regulation or access restrictions, but it seems to
>> be the only way to keep sponsors honest to any extent. Your mileage may
>> vary, however; take a look at some rather loud disagreement with the
>> FDA over earlier DNDN panel results, possibly involving threats against
>> critics. LOL.
>>
>>> That being said, it would seem that biomedical sciences do make
>>> progress, so some of the published results are presumably correct :)
>>>
>>> Peter
>>>
>>> On Thu, Jan 6, 2011 at 9:13 PM, Spencer Graves
>>> wrote:
>>>> Part of the phenomenon can be explained by the natural
>>>> censorship in what is accepted for publication: stronger results
>>>> tend to have less difficulty getting published. Therefore, given
>>>> that a result is published, the estimated magnitude of the effect is
>>>> on average larger than it is in reality, just because weaker results
>>>> are less likely to be published. A study of the literature on this
>>>> subject might yield an interesting and valuable estimate of the
>>>> magnitude of this selection bias.
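>>>>
>>>> A toy simulation makes the inflation concrete (the true effect size
>>>> and sample size below are illustrative assumptions, not estimates):
>>>>
>>>>   set.seed(1)
>>>>   est <- replicate(1e4, mean(rnorm(20, mean = 0.2)))  # true effect 0.2
>>>>   se  <- 1 / sqrt(20)                 # standard error, known sd = 1
>>>>   pub <- est[abs(est / se) > 1.96]    # "published": significant only
>>>>   c(all = mean(est), published = mean(pub))  # published mean is inflated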
>>>>
>>>> A more insidious problem, which may not affect the work of
>>>> Jonah Lehrer, is political corruption in the way research is funded,
>>>> with less public and more private funding of research
>>>> (http://portal.unesco.org/education/en/ev.php-URL_ID=21052&URL_DO=DO_TOPIC&URL_SECTION=201.html).
>>>> For example, I've heard claims (which I cannot substantiate right
>>>> now) that cell phone companies lobbied successfully to block
>>>> funding for researchers they thought were likely to document
>>>> health problems with their products. Related claims have been made
>>>> by scientists in the US Food and Drug Administration that certain
>>>> therapies were approved on political grounds in spite of substantive
>>>> questions about the validity of the research backing the request for
>>>> approval (e.g., www.naturalnews.com/025298_the_FDA_scientists.html).
>>>> Some of these accusations of political corruption may be groundless.
>>>> However, as private funding replaces tax money for basic science, we
>>>> must expect an increase in research results that match the needs of
>>>> the funding agency while degrading the quality of published
>>>> research. This produces more research that cannot be replicated --
>>>> effects that get smaller upon replication. (My wife and I routinely
>>>> avoid certain therapies recommended by physicians, because the
>>>> physicians get much of their information on recent drugs from the
>>>> pharmaceuticals, who have a vested interest in presenting their
>>>> products in the most positive light.)
>>>>
>>