[R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

Spencer Graves spencer.graves at structuremonitoring.com
Fri Jan 7 19:01:07 CET 2011

       I applaud your efforts, Ravi.  Regarding "Whose data is it?", I 
humbly suggest that referees and editorial boards push (demand?) for 
rules requiring that the raw data be made available to the referees and,
concurrently with publication, to the public.


On 1/7/2011 8:43 AM, Ravi Varadhan wrote:
> I have just recently written about this issue (i.e. open learning and data
> sharing) in a manuscript that is currently under review in a clinical
> journal.  I have argued that data hoarding is unethical.  Participants in
> research studies give their time, effort, saliva and blood in the altruistic
> hope that their sacrifice will benefit humankind.  If they were to realize
> that the real (ulterior) motive of the study investigators is only to
> advance their careers, they would really think hard about participating in
> the studies.  The study participants should only consent to participate if
> they can get a signed assurance from the investigators that the
> investigators will make their data available for scrutiny and for public use
> (under some reasonable conditions that are fair to the study investigators).
> As Vickers (Trials 2006) says, "whose data is it anyway?"  I believe that we
> can achieve great progress in clinical research if and only if we make a
> concerted effort towards open learning. Stakeholders (i.e. patients,
> clinicians, policy-makers) should demand that all the data that is
> potentially relevant to addressing a critical clinical question should be
> made available in an open learning environment.  Unless we can achieve this,
> we cannot solve the problems of publication bias and inefficient and
> sub-optimal use of data.
> Best,
> Ravi.
> -------------------------------------------------------
> Ravi Varadhan, Ph.D.
> Assistant Professor,
> Division of Geriatric Medicine and Gerontology, School of Medicine, Johns Hopkins University
> Ph. (410) 502-2619
> email: rvaradhan at jhmi.edu
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of Spencer Graves
> Sent: Friday, January 07, 2011 8:26 AM
> To: Mike Marchywka
> Cc: r-help at r-project.org
> Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias,
> scientific validity
>         I wholeheartedly agree with the trend towards publishing datasets.
> One way to do that is to distribute them as datasets in an R package contributed to CRAN.
>         Beyond this, there seems to be an increasing trend towards journals
> requiring authors of scientific research to publish their data as well.  The
> Public Library of Science (PLOS) has such a policy, but it is not enforced:
> Savage and Vickers (2010) were able to get the raw data behind only one of
> ten published articles they tried, and that one came only after reminding
> the author that s/he had agreed to make the data available as a condition
> of publishing in PLOS.  (Four other authors refused to share their data in
> spite of their legal and moral commitment to do so as a condition of
> publishing in PLOS.)
>         There are other venues for publishing data.  For example, much
> astronomical data is now routinely web published so anyone interested can
> test their pet algorithm on real data
> (http://sites.google.com/site/vousergroup/presentations/publishing-astronomical-data).
>         Regarding my earlier comment, I just found a Wikipedia article on
> "scientific misconduct" that mentions the tendency to refuse to publish
> research showing that one's own new drug is positively harmful.  This is an
> extreme version of both types of bias I mentioned previously:  (1) only
> significant results get published, and (2) private funding introduces its
> own biases.
>         Spencer
> #########
> Savage and Vickers (2010) "Empirical Study Of Data Sharing By Authors
> Publishing In PLoS Journals", Scientific Data Sharing, added Apr. 26, 2010
> (http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-by-authors-publishing-in-plos-journals-2/).
> On 1/7/2011 4:08 AM, Mike Marchywka wrote:
>>> Date: Thu, 6 Jan 2011 23:06:44 -0800
>>> From: peter.langfelder at gmail.com
>>> To: r-help at r-project.org
>>> Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias,
>>> scientific validity
>>> From a purely statistical and maybe somewhat naive point of view,
>>> published p-values should be corrected for the multiple testing that
>>> is effectively happening because of the large number of published
>>> studies. My experience is also that people will often try several
>>> statistical methods to get the most significant p-value but neglect
>>> to share that fact with the audience, or at least to attempt to
>>> correct the p-values for the selection bias.
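Peter's point about reporting the most significant of several tests can be made concrete with a short sketch. It is written in Python only so that it stands alone. In R, stats::p.adjust(p, method = "holm") (or "bonferroni", "BH") performs the adjustment.

The function names below are my own. The simulation also makes two assumptions:

- the k tests are independent;
- the null hypothesis is true, so each p-value is uniform on (0, 1).

Several methods applied to the same data produce correlated p-values, so the real inflation is usually smaller than the independent-test figure. Reporting the minimum can never push the false-positive rate below alpha itself, though.

```python
import random


def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjustment; results are returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on; the
        # running maximum keeps adjusted values in the same order as the raw ones.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


def chance_of_false_positive(k, alpha=0.05):
    """P(smallest of k independent null p-values < alpha)."""
    return 1.0 - (1.0 - alpha) ** k


def simulate_min_p(k, alpha=0.05, trials=100_000, seed=1):
    """Monte Carlo check: under the null each p-value is Uniform(0, 1);
    keep the smallest of k and count how often it looks 'significant'."""
    rng = random.Random(seed)
    hits = sum(min(rng.random() for _ in range(k)) < alpha for _ in range(trials))
    return hits / trials
```

With five independent tries at alpha = 0.05, chance_of_false_positive(5) is about 0.23, more than four times the nominal 5%. Holm's method controls the same family-wise error rate as Bonferroni and is never more conservative, which is why it is usually preferred.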
>> You see this everywhere in one form or another, from medical to
>> financial modelling. My solution here is simply to publish more raw
>> data in a computer-readable form, in this case of course something
>> easy to get with R, so that disinterested or adversarial parties can
>> run their own "analysis."
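One way to act on Mike's suggestion is sketched below in Python, under my own assumptions. The sketch writes three files:

- the raw data as plain CSV;
- a JSON codebook describing every column;
- a SHA-256 checksum, so that an adversarial reanalyst can confirm they have exactly the published bytes.

The file names (data.csv, codebook.json, data.csv.sha256) and the publish_dataset helper are hypothetical, not any journal's or CRAN's required format. In R, the CSV loads with read.csv("data.csv").

```python
import csv
import hashlib
import json
from pathlib import Path


def publish_dataset(rows, fieldnames, codebook, out_dir):
    """Hypothetical helper: write rows to data.csv, describe each column in
    codebook.json, and record a SHA-256 checksum in data.csv.sha256.
    Returns the hex digest of data.csv."""
    # Refuse to publish columns that the codebook does not document.
    missing = [name for name in fieldnames if name not in codebook]
    if missing:
        raise ValueError(f"codebook lacks entries for: {missing}")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    data_path = out / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    (out / "codebook.json").write_text(json.dumps(codebook, indent=2), encoding="utf-8")

    # Checksum of the exact bytes on disk, in "<digest>  <filename>" form.
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    (out / "data.csv.sha256").write_text(f"{digest}  data.csv\n", encoding="utf-8")
    return digest
```

The checksum line follows the "<digest>  <filename>" layout that GNU sha256sum writes, so `sha256sum -c data.csv.sha256` run in the output directory should verify it.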
>> I think there was also a push to create a database for failed drug
>> trials, which may contain data of some value later. R, combined with
>> easily available data for a large cross-section of users, could help
>> moderate problems like the one cited here.
>> I almost slammed a poster here earlier, who wanted a simple rule for
>> "when do I use this test", with something like "when your mom tells you
>> to", since post hoc you do just about everything on the assumption that
>> you messed up and missed something, but a priori you hope you have
>> designed a good hypothesis. And at the end of the day, a given p-value
>> is one piece of evidence toward the overall objective of learning about
>> some system, not appeasing a sponsor. Personally I'm a big fan of post
>> hoc analysis of biotech data in some cases, especially as more pathway
>> or other theory is published, but it is easy to become deluded if you
>> have a conclusion that you know JUST HAS TO BE RIGHT.
>> Also FWIW, in the few cases of FDA-versus-sponsor rhetoric I've
>> examined, the data I've been able to get tends to make me side with the
>> FDA. I still hate the idea of any regulation or access restrictions, but
>> it seems to be the only way to keep sponsors honest to any extent. Your
>> mileage may vary, however: take a look at the rather loud disagreement
>> with the FDA over earlier DNDN panel results, possibly involving threats
>> against critics. LOL.
>>> That being said, it would seem that biomedical sciences do make
>>> progress, so some of the published results are presumably correct :)
>>> Peter
>>> On Thu, Jan 6, 2011 at 9:13 PM, Spencer Graves wrote:
>>>>        Part of the phenomenon can be explained by the natural
>>>> censorship in what is accepted for publication:  Stronger results
>>>> tend to have less difficulty getting published.  Therefore, given
>>>> that a result is published, it follows that the estimated magnitude
>>>> of the effect is on average larger than it is in reality, simply
>>>> because weaker results are less likely to be published.  A study of
>>>> the literature on this subject might yield an interesting and
>>>> valuable estimate of the magnitude of this selection bias.
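Spencer's argument that selective publication inflates published effect sizes can be checked with a small simulation. The settings are my assumptions, not anything from the thread:

- every study estimates the same true effect of 0.2, with normally distributed error (standard error 0.2);
- only studies whose estimate is significantly positive at roughly the two-sided 5% level (z > 1.96) get published.

Each published study is then replicated once with fresh noise and no filter. This also shows why such effects shrink on replication.

```python
import random
import statistics


def simulate_publication_filter(true_effect=0.2, se=0.2, n_studies=20_000,
                                z_crit=1.96, seed=2011):
    """Simulate studies of one true effect, 'publish' only those with
    estimate / se > z_crit, then replicate each published study once."""
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    published = [e for e in estimates if e / se > z_crit]
    # A replication draws new noise around the same true effect; no filter applies.
    replications = [rng.gauss(true_effect, se) for _ in published]
    return {
        "mean_all": statistics.fmean(estimates),
        "mean_published": statistics.fmean(published),
        "mean_replication": statistics.fmean(replications),
        "share_published": len(published) / n_studies,
    }
```

With these settings, the results come out roughly as follows:

- about a sixth of the studies are published;
- their mean estimate is around 0.5, about 2.5 times the true effect;
- the unfiltered replications fall back to about 0.2.

Nothing in the simulation involves misconduct; the publication filter alone produces the inflation.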
>>>>        A more insidious problem, which may not affect the work of
>>>> Jonah Lehrer, is political corruption in the way research is funded,
>>>> with less public and more private funding of research
>>>> (http://portal.unesco.org/education/en/ev.php-URL_ID=21052&URL_DO=DO_TOPIC&URL_SECTION=201.html).
>>>>        For example, I've heard claims (which I cannot substantiate
>>>> right now) that cell phone companies lobbied successfully to block
>>>> funding for researchers they thought were likely to document health
>>>> problems with their products.  Scientists at the US Food and Drug
>>>> Administration have made related claims that certain therapies were
>>>> approved on political grounds in spite of substantive questions about
>>>> the validity of the research backing the request for approval (e.g.,
>>>> www.naturalnews.com/025298_the_FDA_scientists.html).
>>>> Some of these accusations of political corruption may be groundless.
>>>> However, as private funding replaces tax money for basic science, we
>>>> must expect an increase in research results that match the needs of
>>>> the funders, degrading the quality of published research.  This
>>>> produces more research that cannot be replicated: effects that get
>>>> smaller upon replication.  (My wife and I routinely avoid certain
>>>> therapies recommended by physicians, because the physicians get much
>>>> of their information on recent drugs from pharmaceutical companies,
>>>> which have a vested interest in presenting their products in the most
>>>> positive light.)
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
