[R] Random forest regression: feedback on general approach and possible issues

Johannes Klene jklene000 at gmail.com
Fri Dec 4 10:15:58 CET 2015


Hi all,
I'd like to use random forest regression to assess the importance of a
set of genes (binary predictors) for schizophrenia-related behavior (a
continuous measure). I am still reading up on this technique, but would
already appreciate any feedback on whether my approach is valid.
Specifically: using the randomForest package, is it reasonable to enter a
few dozen binary predictors and assess their importance (as a set, and
individually) for a continuous outcome, with a sample size of ~1000 people?
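
A minimal sketch of the setup described above, using simulated stand-in
data rather than the real study data. The number of predictors (30), the
gene names g1..g30, and the simulated effects are all assumptions made for
illustration only. Set-level importance is read here from the out-of-bag
pseudo R-squared, and individual importance from the permutation measure
(%IncMSE):

```r
library(randomForest)

set.seed(1)
n <- 1000   # sample size mentioned in the post
p <- 30     # "a few dozen" binary predictors (assumed count)

# Simulated stand-in data; gene names g1..g30 are hypothetical
genes <- as.data.frame(matrix(rbinom(n * p, 1, 0.3), nrow = n))
names(genes) <- paste0("g", seq_len(p))

# Assumed outcome: g1 and g2 have main effects, g3 matters only with g1
behavior <- 1.0 * genes$g1 + 0.5 * genes$g2 +
  1.0 * genes$g1 * genes$g3 + rnorm(n)

fit <- randomForest(x = genes, y = behavior,
                    ntree = 500, importance = TRUE)

# Set-level: out-of-bag pseudo R-squared after the final tree
oob_rsq <- tail(fit$rsq, 1)

# Individual: permutation importance (%IncMSE), type = 1
imp <- importance(fit, type = 1)
head(imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE])
```

Covariates such as age and gender could be added as extra columns of the
predictor data frame in the same way.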
More specific questions:
- I have an additional interest in interactions (though perhaps not the
best word in this context). Does it make sense to infer the influence one
predictor has on the others from the change in their estimated importance
when that predictor is removed from the model?
- There are a few siblings in the data, i.e. the observations are not all
independent. Is this a problem, and if so, is there anything I can do
about it?
- The few papers I have seen so far that use this technique in a similar
situation do not include any 'standard' covariates such as age and gender.
Should I include them?
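
The drop-one-predictor comparison from the first question could be sketched
roughly as follows. This is only an illustration on simulated data: the
gene names, the helper perm_importance(), and the assumed g1-by-g3 effect
are hypothetical. Because each forest is itself random, differences of this
kind also contain Monte Carlo noise, so in practice the comparison would
need repeating over several seeds before reading anything into it:

```r
library(randomForest)

set.seed(2)
n <- 1000
genes <- as.data.frame(matrix(rbinom(n * 10, 1, 0.3), nrow = n))
names(genes) <- paste0("g", 1:10)   # hypothetical gene names

# Assumed outcome: g3 only matters when g1 is present
behavior <- genes$g1 + genes$g1 * genes$g3 + rnorm(n)

# Hypothetical helper: fit a forest, return permutation importance (%IncMSE)
perm_importance <- function(x, y) {
  fit <- randomForest(x = x, y = y, ntree = 500, importance = TRUE)
  importance(fit, type = 1)[, 1]
}

imp_full <- perm_importance(genes, behavior)
imp_drop <- perm_importance(genes[, setdiff(names(genes), "g1")], behavior)

# Change in importance of the remaining predictors once g1 is removed
delta <- imp_drop - imp_full[names(imp_drop)]
round(sort(delta), 2)
```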
Any and all feedback is greatly appreciated!! Kind regards, Johannes

P.S. I hope I've come to the right place, despite this being a more
general question. If not, please let me know of a forum better suited to
it.



