[R] Logistic regression problem

Frank E Harrell Jr f.harrell at vanderbilt.edu
Tue Sep 30 22:54:02 CEST 2008

Milicic B. Marko wrote:
> The only solution I can see is fitting all possible 2-factor models enabling
> interactions and then assessing if interaction term is significant...
> any more ideas?

Please don't suggest such a thing unless you do simulations to back up 
its predictive performance, type I error properties, and the impact of 
collinearities.  You'll find this approach works as well as the U.S. 

Frank Harrell
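
A rough sense of the type I error problem raised above can be had from a small simulation. The sketch below is not the simulation asked for in the reply; it is a simplified illustration under stated assumptions: a hypothetical 100 candidate variables, a global null (no variable or interaction is related to the outcome), one test per pair, and independent tests whose p-values are uniform on [0, 1]. Real interaction tests on collinear predictors are correlated, so a proper study would fit the actual models to simulated data.

```python
import random
from math import comb


def count_pairwise_models(n_vars):
    """Number of distinct 2-variable models (one interaction test each)."""
    return comb(n_vars, 2)


def simulate_false_positives(n_vars, alpha=0.05, n_reps=200, seed=1):
    """Count 'significant' interactions per replicate under a global null.

    Assumption: each test's p-value is an independent Uniform(0, 1) draw,
    which holds only if no effect exists and the tests do not depend on
    one another.
    """
    rng = random.Random(seed)
    n_tests = count_pairwise_models(n_vars)
    counts = []
    for _ in range(n_reps):
        hits = sum(1 for _ in range(n_tests) if rng.random() < alpha)
        counts.append(hits)
    return counts


n_vars = 100  # hypothetical; the original question mentions thousands
n_tests = count_pairwise_models(n_vars)
counts = simulate_false_positives(n_vars)

mean_hits = sum(counts) / len(counts)
share_with_any = sum(c > 0 for c in counts) / len(counts)

print("pairwise models to fit:", n_tests)
print("mean false positives per replicate:", round(mean_hits, 1))
print("share of replicates with at least one false positive:", share_with_any)
```

With 100 variables there are already 4950 pairwise models, so at alpha = 0.05 roughly 5% of them are expected to look significant by chance alone. With thousands of variables the count of models, and of spurious findings, grows quadratically.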

> Milicic B. Marko wrote:
>> I have a huge data set with thousands of variables and one binary
>> variable. I know that most of the variables are correlated and are not
>> good predictors... but...
>> It is very hard to start modeling with such a huge dataset. What would
>> be your suggestion? How to make a first cut... how to eliminate most
>> of the variables but not ignore potential interactions... for
>> example, maybe variable A is not a good predictor and variable B is not
>> a good predictor either, but maybe A and B together are a good
>> predictor...
>> Any suggestion is welcome.
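
The scenario described in the quoted question, where A and B are each uninformative alone but informative together, can be made concrete with a toy case. The sketch below assumes a hypothetical population in which A and B are independent fair binary variables and the outcome Y equals 1 exactly when A and B differ (an XOR pattern). It only shows that such a pattern can exist; it says nothing about whether screening all pairs is a sound way to find one.

```python
from itertools import product

# Hypothetical population: the four (A, B) combinations are equally likely,
# and Y is 1 exactly when A and B differ (XOR).
cells = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]

INDEX = {"a": 0, "b": 1}


def p_y_given(cells, **fixed):
    """P(Y = 1) among the equally weighted cells matching the fixed values."""
    rows = [c for c in cells if all(c[INDEX[k]] == v for k, v in fixed.items())]
    return sum(c[2] for c in rows) / len(rows)


# Marginally, neither variable shifts the probability of Y = 1.
marginal_a = [p_y_given(cells, a=v) for v in (0, 1)]
marginal_b = [p_y_given(cells, b=v) for v in (0, 1)]

# Jointly, the pair determines Y completely.
joint = {(a, b): p_y_given(cells, a=a, b=b) for a, b in product((0, 1), repeat=2)}

print("P(Y=1 | A=0), P(Y=1 | A=1):", marginal_a)
print("P(Y=1 | B=0), P(Y=1 | B=1):", marginal_b)
print("P(Y=1 | A, B):", joint)
```

Here the marginal probabilities are 0.5 for every value of A and of B, while the joint probabilities are 0 or 1, so any screen based on one variable at a time would discard both.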

Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
