[R] Model Comparision for case control studies in R

Hana Tezera <hanatezera using gmail.com>
Wed Jun 15 22:52:43 CEST 2022

Dear Tim, thanks a lot. I am looking for different methods. For each
method, I want to select the best predictors and report some
measures of accuracy. I will then compare the performance of the
models by plotting their ROC curves.
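
As a rough illustration of that plan, here is a minimal sketch in base R. It assumes the sample data from the original post and, purely as an example of a second method, a probit model alongside the logistic one. The names dat, fit_logit, fit_probit, roc_points and auc are hypothetical helpers made up for this sketch, not functions from any package.

```r
# Sample data from the original post
y <- c(0,1,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,0,0,1)
age <- c(45,23,56,67,23,23,28,56,45,47,36,37,33,35,38,39,43,28,39,41)
smoking <- c(0,1,1,1,0,0,0,0,0,1,1,0,0,1,0,1,1,1,0,1)
hypertension <- c(1,1,0,1,0,1,0,1,1,0,1,1,1,1,1,1,0,0,1,0)
dat <- data.frame(y, age,
                  smoking = factor(smoking),
                  hypertension = factor(hypertension))

# Two candidate methods (assumption: probit chosen only as an example)
fit_logit  <- glm(y ~ age + smoking + hypertension, data = dat,
                  family = binomial(link = "logit"))
fit_probit <- glm(y ~ age + smoking + hypertension, data = dat,
                  family = binomial(link = "probit"))

# Hypothetical helper: ROC points obtained by sweeping a cutoff over the scores
roc_points <- function(score, truth) {
  cuts <- c(Inf, sort(unique(score), decreasing = TRUE))
  tpr <- sapply(cuts, function(k) mean(score[truth == 1] >= k))
  fpr <- sapply(cuts, function(k) mean(score[truth == 0] >= k))
  data.frame(fpr = fpr, tpr = tpr)
}

# Hypothetical helper: AUC from the rank-sum (Mann-Whitney) statistic
auc <- function(score, truth) {
  r  <- rank(score)            # ties receive average ranks
  n1 <- sum(truth == 1)
  n0 <- sum(truth == 0)
  (sum(r[truth == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# In-sample predicted probabilities for each model
p_logit  <- fitted(fit_logit)
p_probit <- fitted(fit_probit)

roc_logit  <- roc_points(p_logit, dat$y)
roc_probit <- roc_points(p_probit, dat$y)

# Overlay the two ROC curves
plot(roc_logit$fpr, roc_logit$tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate",
     main = "In-sample ROC curves")
lines(roc_probit$fpr, roc_probit$tpr, lty = 2, col = "blue")
abline(0, 1, lty = 3)          # reference line for a non-informative classifier
legend("bottomright", legend = c("logit", "probit"),
       lty = c(1, 2), col = c("black", "blue"))

# Accuracy summaries: AUC per model and AIC for both fits
aucs <- c(logit = auc(p_logit, dat$y), probit = auc(p_probit, dat$y))
print(aucs)
print(AIC(fit_logit, fit_probit))
```

With only 20 observations these ROC curves are computed on the same data used for fitting, so they are probably optimistic. Some form of cross-validation would likely give a fairer comparison.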

On 6/15/22, Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
> The uncorrelated nature of smoking and hypertension is a major medical
> breakthrough, in contrast to reports like this:
> https://pubmed.ncbi.nlm.nih.gov/20550499/. The literature also indicates the
> possibility of a relationship between age and hypertension:
> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4768730/. Depending on the
> country, there might be a relationship between smoking and age as government
> programs against smoking are developed.
> Are you looking at different models or different methods? I could have
> y = w + x + z as one model and y = x + z as another model. Alternatively, I could be
> comparing ordinary least squares regression versus maximum likelihood versus
> Bayesian linear regression versus nonlinear regression. The former might use
> something like the Akaike information criterion. I am not sure the latter is
> useful (or possible). For example I could approximate an exponential
> function using a polynomial, but in this context I see no benefit in doing
> so even if I could compare the models.
> I do not quite understand why this is being done. It feels like fishing through
> statistical methods to get the answer that I know is correct. Generally, one
> should understand the system well enough to select an appropriate model
> rather than try every possible model in the hope something fits. Of course
> one sometimes collects extra data in the hope that we do not miss an
> important feature. Then forwards/backwards/stepwise methods are used to
> identify the "best" model but this is looking at similar models that differ
> only in the list of independent variables.
> However the problem is solved, I would start by trying to determine if any
> one model was appropriate. Are the model assumptions satisfied? If the
> answer is no, then try another model until you find one that does satisfy
> the model assumptions. Alternatively, start with an understanding of the
> biology and use the best model. Comparing a biologically meaningless
> statistical model to a biologically meaningful one is an easy choice.
> Tim
> -----Original Message-----
> From: anteneh asmare <hanatezera using gmail.com>
> Sent: Wednesday, June 15, 2022 1:10 PM
> To: Ebert,Timothy Aaron <tebert using ufl.edu>
> Cc: r-help using r-project.org
> Subject: Re: [R] Model Comparision for case control studies in R
> Dear Tim, Thanks. The first vector
> y<-c(0,1,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,0,0,1) is the disease status y
> (1=Case, 0=Control). The covariates age, smoking status and hypertension are
> independent (uncorrelated). Unconditional logistic regression will be
> used. But I need to compare other models with logistic regression instead of
> fitting only a logistic regression.
> There is no matching in the data, so conditional logistic regression does not apply.
> Best,
> Hana
> On 6/15/22, Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
>> Disease status is missing from the sample data.
>> Are age, disease, smoking, and/or hypertension correlated in any way
>> or are they independent (correlation=0)?
>> Are the correlations large enough to adversely influence your model?
>> Tim
>> -----Original Message-----
>> From: R-help <r-help-bounces using r-project.org> On Behalf Of anteneh
>> asmare
>> Sent: Wednesday, June 15, 2022 7:29 AM
>> To: r-help using r-project.org
>> Subject: [R] Model Comparision for case control studies in R
>> y<-c(0,1,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,0,0,1)
>> age<-c(45,23,56,67,23,23,28,56,45,47,36,37,33,35,38,39,43,28,39,41)
>> smoking<-c(0,1,1,1,0,0,0,0,0,1,1,0,0,1,0,1,1,1,0,1)
>> hypertension<-c(1,1,0,1,0,1,0,1,1,0,1,1,1,1,1,1,0,0,1,0)
>> data<-data.frame(y,age,smoking,hypertension)
>> data
>> model<-glm(y~age+factor(smoking)+factor(hypertension), data, family =
>> binomial(link = "logit"),na.action = na.omit)
>> summary(model)
>> From the sample data above, I want to analyze a case-control study on male
>> individuals with response variable y, disease status (1=Case,
>> 0=Control), and covariates age, smoking status (1=Yes, 0=No) and
>> hypertension (1=Yes, 0=No). I want to fit models to
>> predict the disease status using at least two different methods, and
>> to make model comparisons. I think logistic regression will be the
>> best fit for this case-control study. Do we have other options in addition
>> to logistic regression?
>> My objective is to fit the model to predict the disease status using
>> at least two different methods.
>> Kind regards,
>> Hana
