[R] predictive accuracy

El-Tahtawy, Ahmed Ahmed.El-Tahtawy at pfizer.com
Tue May 31 16:44:17 CEST 2011


1-	I used R packages (design, lasso) to develop and validate the prognostic models. I could have enclosed the optimism estimates for the model with and without the strong but irrelevant predictor, but that would make the message very long (against the guidelines for the site). 

2-	This issue is challenging and interesting, and we can all learn something from each other. No one answer is right, but we are seeking the most reliable and accurate method for handling a situation that many of us may encounter.
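
For readers who want to see the kind of comparison mentioned in point 1, here is a minimal sketch in base R of a bootstrap optimism correction for a logistic model fitted with and without a study-design predictor. Everything in it is an assumption made for illustration: the data are simulated, the variable names (region, x1, x2) and the helper functions (auc, optimism_corrected_auc) are hypothetical, and the helper expects the outcome column to be called y. It is not the analysis from this thread.

```r
set.seed(42)
n <- 600
region <- rbinom(n, 1, 0.4)            # hypothetical study-design predictor
x1 <- rnorm(n) + 0.8 * region          # clinical predictor correlated with region
x2 <- rnorm(n)                         # independent clinical predictor
y  <- rbinom(n, 1, plogis(-1 + 1.5 * region + 0.3 * x1 + 0.5 * x2))
dat <- data.frame(y, region, x1, x2)

# Rank-based (Mann-Whitney) estimate of the area under the ROC curve
auc <- function(p, y) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Optimism bootstrap: refit on each resample, then compare performance on the
# resample itself with performance of that same fit on the original data.
# Assumes the outcome column in `data` is named y.
optimism_corrected_auc <- function(formula, data, B = 200) {
  fit <- glm(formula, family = binomial, data = data)
  apparent <- auc(fitted(fit), data$y)
  optimism <- replicate(B, {
    boot <- data[sample(nrow(data), replace = TRUE), ]
    bfit <- glm(formula, family = binomial, data = boot)
    auc(fitted(bfit), boot$y) -
      auc(predict(bfit, newdata = data, type = "response"), data$y)
  })
  c(apparent  = apparent,
    optimism  = mean(optimism),
    corrected = apparent - mean(optimism))
}

full    <- optimism_corrected_auc(y ~ region + x1 + x2, dat)
reduced <- optimism_corrected_auc(y ~ x1 + x2, dat)
rbind(full, reduced)
```

In a real analysis, any variable selection or penalization step would have to be repeated inside each bootstrap resample, otherwise the optimism is underestimated. The rms package (the successor to Design) provides validate() for this kind of resampling validation on its own model fits.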

I came to this group after exhausting many consults. This group has some of the best minds in the area of using R to solve challenging issues.

Best Regards
Ahmed 


-----Original Message-----
From: Mike Marchywka [mailto:marchywka at hotmail.com] 
Sent: Thursday, May 26, 2011 8:55 PM
To: gunter.berton at gene.com; El-Tahtawy, Ahmed
Cc: r-help at r-project.org
Subject: RE: [R] predictive accuracy

----------------------------------------
> Date: Thu, 26 May 2011 13:50:15 -0700
> From: gunter.berton at gene.com
> To: Ahmed.El-Tahtawy at pfizer.com
> CC: r-help at r-project.org
> Subject: Re: [R] predictive accuracy
>
> 1. This is not about R, and should be taken off list.

Well, depending on what the mods think, a little bit of
generic "how do I REALLY use this tool" discussion may be of 
benefit to all here. A mailing list for a certain brand of hammer
may discuss various uses and types of nails, etc.

Personally I have an interest in this. If the OP will 
post the data it may be possible to explore some analysis
options. 

> 2. You are wading in an alligator infested swamp. Get help from
> (other) statisticians at Pfizer (there are many good ones there).

I thought that is what statisticians do? LOL. 
We don't know the situation: intern, looking for outside ideas after
exhausting internal ones, specific issues with internal peers,
summer student not wishing to bother everyone there for details, etc. 

>
> Best,
> Bert
>
> P.S. The answer to all your questions is "no" (imho).



>
>
>
> On Thu, May 26, 2011 at 1:35 PM, El-Tahtawy, Ahmed
>  wrote:
> > The strong predictor is the country/region where the study was
> > conducted, so it is not important/useful for a clinician to use it (as
> > long as he/she is in the USA or Europe).
> > Excluding that predictor makes another 2 insignificant predictors
> > become significant!!  Can the new model have reliable predictive
> > accuracy? I thought of excluding all patients from other countries and
> > developing the model accordingly. Is excluding a lot of patients and
> > compromising the power more acceptable??

LOL, quite the contrary, post hoc selection increases power to find
whatever you or the sponsor desire... 

Presuming your general interest is in finding out attributes of a given
drug under various conditions, you would probably want to combine 
the observations with tentative thoughts on causality and see
what makes the best story. 

Statistical significance in isolation is a function of the data and analysis method,
and doesn't really have anything specific to do with the underlying system.
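
The effect described in the quoted question, where dropping a strong predictor makes other predictors look significant, is easy to reproduce. The following is a small sketch on simulated data, with made-up names (region, x1) and made-up effect sizes: x1 has no effect of its own on the outcome, but its distribution differs by region, so it stands in for region once region is removed.

```r
set.seed(7)
n <- 1000
region <- rbinom(n, 1, 0.5)                  # hypothetical strong predictor
x1 <- rnorm(n, mean = region)                # shifted by region, no direct effect on y
y  <- rbinom(n, 1, plogis(-1 + 2 * region))  # outcome depends on region only

with_region    <- glm(y ~ region + x1, family = binomial)
without_region <- glm(y ~ x1, family = binomial)

# Coefficient, standard error, z value and p-value for x1 in each model
summary(with_region)$coefficients["x1", ]
summary(without_region)$coefficients["x1", ]
```

The x1 coefficient in the second model reflects the region effect it is now a proxy for, which is why the significance says more about the model specification than about x1 itself.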

In this case, if you have other continuous prognostic factors (age,
LDH, and hemoglobin come to mind), you may find that you
have non-monotonic relations between prognostic factor and outcome.
But, further, say you have enough patients that you could in fact map
dose-response curves. It may turn out that this curve is in fact non-monotonic,
with parameters that are themselves non-monotonic in the prognostic factor. Consider 

avg_survival= a+b*d-c*d^2

where d is the dose. For small d the drug seems to help, but at larger doses it 
makes things worse. Now suppose that "c" is a complicated function
of hematocrit; it is not hard to imagine that anemics and siderositics (is 
that a word LOL?) have some underlying problems dealing with your drug. 
These may be distributed geographically, etc.

This is all stuff you can simulate in R or even on paper. 
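
As one possible starting point, here is a rough sketch of such a simulation. All numbers are invented, and the shape chosen for c as a function of hematocrit (smallest near 40, growing on both sides) is purely an assumption for illustration; c_of_hct and best_dose are hypothetical helper names.

```r
set.seed(1)
n    <- 2000
dose <- runif(n, 0, 10)               # d in the formula above
hct  <- rnorm(n, mean = 40, sd = 6)   # hematocrit (%), made-up distribution

a <- 12
b <- 2
# Assumed form: the penalty c is smallest near hct = 40 and grows on both sides
c_of_hct <- function(h) 0.1 + 0.002 * (h - 40)^2

avg_survival <- a + b * dose - c_of_hct(hct) * dose^2
survival     <- avg_survival + rnorm(n, sd = 2)   # add individual noise

# The dose maximizing a + b*d - c*d^2 is d = b / (2*c), so it moves with hct
best_dose <- function(h) b / (2 * c_of_hct(h))
best_dose(c(28, 40, 52))

# Fit the quadratic separately for patients near and far from hct = 40
near <- abs(hct - 40) <= 5
fit_near <- lm(survival ~ dose + I(dose^2), subset = near)
fit_far  <- lm(survival ~ dose + I(dose^2), subset = !near)
coef(fit_near)
coef(fit_far)
```

If the hematocrit distribution differed between regions, a region indicator could pick up exactly this difference in the fitted curvature.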


It sounds like you are already trying to write a label, which may
be a bit premature (although I defer to the guy from DNA for that LOL).
"indicated for use in patients in Western Hemisphere with .... " 

You may have decent luck looking at FDA panel discussion transcripts; search for
related general stats terms confined to "site:fda.gov".


> > Thanks for your help...
> > Al
> >
> > -----Original Message-----
> > From: Marc Schwartz [mailto:marc_schwartz at me.com]
> > Sent: Thursday, May 26, 2011 10:54 AM
> > To: El-Tahtawy, Ahmed
> > Cc: r-help at r-project.org
> > Subject: Re: [R] predictive accuracy
> >
> >
> > On May 26, 2011, at 7:42 AM, El-Tahtawy, Ahmed wrote:
> >
> >> I am trying to develop a prognostic model using logistic regression. I
> >> built full and approximate models with the use of penalization (design
> >> package). Also, I tried Chi-square criteria and step-down techniques. Used
> >> BS for model validation.
> >>
> >> The main purpose is to develop a predictive model for a future patient
> >> population. One of the strong predictors pertains to the study design
> >> and would not mean much to a clinician/investigator in a real clinical
> >> situation, and I have been asked to remove it.
> >> Can I propose a model and nomogram without that strong, irrelevant
> >> predictor?? If yes, do I need to redo model calibration, discrimination,
> >> validation, etc...?? Or just have 5 predictors instead of 6 in the
> >> prognostic model??
> >>
> >>
> >>
> >> Thanks for your help
> >>
> >> Al
> >
> >
> > Is it that the study design characteristic would not make sense to a
> > clinician but is relevant to future samples, or that the study design
> > characteristic is unique to the sample upon which the model was
> > developed and is not relevant to future samples because they will not be
> > in the same or a similar study?
> >
> > Is the study design characteristic a surrogate for other factors that
> > would be relevant to future samples? If so, you might engage in a
> > conversation with the clinicians to gain some insights into other
> > variables to consider for inclusion in the model, that might in turn,
> > help to explain the effect of the study design variable.
> >
> > Either way, if the covariate is removed, you of course need to engage in
> > fully re-evaluating the model. You cannot just drop the covariate and
> > continue to use model fit assessments made on the full model.
> >
> > HTH,
> >
> > Marc Schwartz
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> "Men by nature long to get on to the ultimate truths, and will often
> be impatient with elementary studies or fight shy of them. If it were
> possible to reach the ultimate truths without the elementary studies
> usually prefixed to them, these would not be preparatory studies but
> superfluous diversions."
>
> -- Maimonides (1135-1204)
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
 		 	   		  


