[R] Model Comparison for case control studies in R

Hana Tezera hanatezera using gmail.com
Thu Jun 16 16:58:54 CEST 2022


Dear Jin, Thanks a lot!

On 6/16/22, Jin Li <jinli68 using gmail.com> wrote:
> Hi Hana,
>
> ROC (or AUC) is misleading and should not be used to assess model
> performance. For details, please see the references in "Spatial Predictive
> Modeling with R", which also provides some methods (e.g., gbm, rf, svm and
> glmnet) for 1/0 data along with accuracy-based variable selection and
> parameter optimisation.
>
> Hope this helps,
> Jin
>
> On Thu, Jun 16, 2022 at 6:53 AM Hana Tezera <hanatezera using gmail.com> wrote:
>
>> Dear Tim, Thanks a lot. I am looking at different methods. For each
>> method, I want to select the best predictors and report some measures
>> of accuracy. I will then compare the performance of the models by
>> plotting their ROC curves.
>> Best,
>> Hana
>>
>> On 6/15/22, Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
>> > The uncorrelated nature of smoking and hypertension is a major medical
>> > breakthrough and in contrast to reports like this:
>> > https://pubmed.ncbi.nlm.nih.gov/20550499/
>> > The literature also indicates the possibility of a relationship between
>> > age and hypertension:
>> > https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4768730/
>> > Depending on the country, there might be a relationship between smoking
>> > and age as government programs against smoking are developed.
>> >
>> > Are you looking at different models or different methods? I could have
>> > y = x + w + z as one model and y = x + z as another model. Alternatively,
>> > I could be comparing ordinary least squares regression versus maximum
>> > likelihood versus Bayesian linear regression versus nonlinear regression.
>> > The former might use something like the Akaike information criterion. I
>> > am not sure the latter is useful (or possible). For example, I could
>> > approximate an exponential function using a polynomial, but in this
>> > context I see no benefit in doing so even if I could compare the models.
>> >
>> > I do not quite understand why this is being done. It feels like fishing
>> > through statistical methods to get the answer that I know is correct.
>> > Generally, one should understand the system well enough to select an
>> > appropriate model rather than try every possible model in the hope
>> > something fits. Of course, one sometimes collects extra data in the hope
>> > of not missing an important feature. Then forwards/backwards/stepwise
>> > methods are used to identify the "best" model, but this is looking at
>> > similar models that differ only in the list of independent variables.
>> >
>> > However the problem is solved, I would start by trying to determine if
>> > any one model was appropriate. Are the model assumptions satisfied? If
>> > the answer is no, then try another model until you find one that does
>> > satisfy the model assumptions. Alternatively, start with an
>> > understanding of the biology and use the best model. Choosing between a
>> > biologically meaningless statistical model and a biologically meaningful
>> > one is easy.
>> >
>> > Tim
>> >
>> > -----Original Message-----
>> > From: anteneh asmare <hanatezera using gmail.com>
>> > Sent: Wednesday, June 15, 2022 1:10 PM
>> > To: Ebert,Timothy Aaron <tebert using ufl.edu>
>> > Cc: r-help using r-project.org
>> > Subject: Re: [R] Model Comparison for case control studies in R
>> >
>> > [External Email]
>> >
>> > Dear Tim, Thanks. The first vector,
>> > y<-c(0,1,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,0,0,1), is the disease status
>> > (1=Case, 0=Control). The covariates age, smoking status and hypertension
>> > are independent (uncorrelated). Unconditional logistic regression will be
>> > used, but rather than fitting only a logistic regression I need to
>> > compare other models against it.
>> > The data are not matched, so conditional logistic regression does not
>> > apply.
>> > Best,
>> > Hana
>> > On 6/15/22, Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
>> >> Disease status is missing from the sample data.
>> >> Are age, disease, smoking, and/or hypertension correlated in any way
>> >> or are they independent (correlation=0)?
>> >> Are the correlations large enough to adversely influence your model?
>> >> Tim
>> >>
>> >> -----Original Message-----
>> >> From: R-help <r-help-bounces using r-project.org> On Behalf Of anteneh asmare
>> >> Sent: Wednesday, June 15, 2022 7:29 AM
>> >> To: r-help using r-project.org
>> >> Subject: [R] Model Comparison for case control studies in R
>> >>
>> >> [External Email]
>> >>
>> >> y<-c(0,1,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,0,0,1)
>> >> age<-c(45,23,56,67,23,23,28,56,45,47,36,37,33,35,38,39,43,28,39,41)
>> >> smoking<-c(0,1,1,1,0,0,0,0,0,1,1,0,0,1,0,1,1,1,0,1)
>> >> hypertension<-c(1,1,0,1,0,1,0,1,1,0,1,1,1,1,1,1,0,0,1,0)
>> >> data<-data.frame(y,age,smoking,hypertension)
>> >> data
>> >> model<-glm(y~age+factor(smoking)+factor(hypertension), data, family =
>> >> binomial(link = "logit"),na.action = na.omit)
>> >> summary(model)
>> >> Using the sample data above, I want to analyse a case-control study of
>> >> male individuals. The response variable y is disease status (1=Case,
>> >> 0=Control), and the covariates are age, smoking status (1=Yes, 0=No) and
>> >> hypertension (1=Yes, 0=No). I think logistic regression will be the
>> >> best fit for this case-control study. Do we have other options in
>> >> addition to logistic regression?
>> >> My objective is to fit models that predict disease status using at
>> >> least two different methods, and then to compare those models.
>> >> Kind regards,
>> >> Hana
>> >>
>> >> ______________________________________________
>> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >
>>
>
>
> --
> Jin
> ------------------------------------------
> Jin Li, PhD
> Founder, Data2action, Australia
> https://www.researchgate.net/profile/Jin_Li32
> https://scholar.google.com/citations?user=Jeot53EAAAAJ&hl=en
>
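
On Tim's point about comparing models that differ only in their list of
covariates: a minimal sketch, assuming the sample data from the original
post, that fits two nested logistic regressions and compares them with AIC
and a likelihood-ratio test. The object names dat, full, reduced, aic_table
and lrt are made up for illustration, and dropping hypertension is an
arbitrary choice for the example, not a recommendation.

```r
# Sample data copied from the original post
y <- c(0,1,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,0,0,1)
age <- c(45,23,56,67,23,23,28,56,45,47,36,37,33,35,38,39,43,28,39,41)
smoking <- c(0,1,1,1,0,0,0,0,0,1,1,0,0,1,0,1,1,1,0,1)
hypertension <- c(1,1,0,1,0,1,0,1,1,0,1,1,1,1,1,1,0,0,1,0)
dat <- data.frame(y, age,
                  smoking = factor(smoking),
                  hypertension = factor(hypertension))

# Two nested logistic regressions (the choice of which term to drop is
# an assumption made only for illustration)
full    <- glm(y ~ age + smoking + hypertension, data = dat, family = binomial)
reduced <- glm(y ~ age + smoking, data = dat, family = binomial)

# Akaike information criterion for both fits
aic_table <- AIC(full, reduced)
aic_table

# Likelihood-ratio test of the reduced model against the full model
lrt <- anova(reduced, full, test = "Chisq")
lrt
```

Among the models compared, the one with the lower AIC is preferred. With
only 20 observations, small differences should be read cautiously.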

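On fitting at least two methods and reporting an accuracy measure: a rough
sketch that compares logistic regression with linear discriminant analysis
using leave-one-out cross-validated accuracy. lda() from the MASS package is
used here only as a convenient second method; nobody in the thread proposed
it. The 0.5 cut-off and the object names (dat, pred_glm, pred_lda, acc) are
likewise assumptions.

```r
library(MASS)  # provides lda(); MASS is a recommended package in standard R installs

# Sample data copied from the original post
y <- c(0,1,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,0,0,1)
age <- c(45,23,56,67,23,23,28,56,45,47,36,37,33,35,38,39,43,28,39,41)
smoking <- c(0,1,1,1,0,0,0,0,0,1,1,0,0,1,0,1,1,1,0,1)
hypertension <- c(1,1,0,1,0,1,0,1,1,0,1,1,1,1,1,1,0,0,1,0)
dat <- data.frame(y = factor(y), age,
                  smoking = factor(smoking),
                  hypertension = factor(hypertension))

n <- nrow(dat)
pred_glm <- character(n)
pred_lda <- character(n)

# Leave-one-out cross-validation: fit on n - 1 rows, predict the held-out row
for (i in seq_len(n)) {
  train <- dat[-i, ]
  test  <- dat[i, , drop = FALSE]

  # Method 1: logistic regression, classified at an assumed 0.5 cut-off
  fit_glm <- glm(y ~ age + smoking + hypertension, data = train,
                 family = binomial)
  p <- predict(fit_glm, newdata = test, type = "response")
  pred_glm[i] <- if (p > 0.5) "1" else "0"

  # Method 2: linear discriminant analysis
  fit_lda <- lda(y ~ age + smoking + hypertension, data = train)
  pred_lda[i] <- as.character(predict(fit_lda, newdata = test)$class)
}

# Proportion of held-out observations classified correctly by each method
acc <- c(logistic = mean(pred_glm == as.character(dat$y)),
         lda      = mean(pred_lda == as.character(dat$y)))
acc
```

Both methods are scored on observations they were not fitted to, so the two
accuracies are directly comparable. With 20 observations, each held-out row
moves the accuracy by 0.05, so the estimates are coarse.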

