[R-sig-eco] How to identify redundant predictors

Chris Howden chris at trickysolutions.com.au
Mon Apr 23 02:11:19 CEST 2012


You should be able to get pretty close to what you want using the methods
you've mentioned.

It seems to me you're after an analysis that answers the question 'find me
the 5 variables that explain the most response variation and also have
the least correlation with each other'.

I believe standard model-fitting methods that use R-squared as the
criterion answer the first half of that question, but they don't
explicitly try to find uncorrelated predictors.

HOWEVER, if you fit the predictors one at a time using the right sums of
squares (sequential, or Type I, SS), the fit may be doing this
implicitly: the part of each subsequently added predictor that is
correlated with the earlier ones should already have been fitted. I'm
thinking out loud here, so contradictory comments are welcome.
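A minimal sketch of the sequential-SS idea, written in Python with NumPy rather than R and using ordinary least squares on simulated data (the helpers `rss` and `sequential_ss` and the variables `x1`, `x2`, `x3` are hypothetical, for illustration only). Each predictor is credited only with the drop in residual sum of squares it produces once the earlier predictors are already in the model, so a near-copy of an earlier predictor gets very little.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def sequential_ss(predictors, y):
    """Sequential (Type I) sums of squares: the drop in RSS as each predictor
    is added, in the order given, to a model already holding the earlier ones."""
    X = np.ones((len(y), 1))  # start from the intercept-only model
    prev = rss(X, y)
    ss = []
    for x in predictors:
        X = np.column_stack([X, x])
        cur = rss(X, y)
        ss.append(prev - cur)  # extra variation explained by x given earlier terms
        prev = cur
    return ss


# Simulated data (hypothetical): x2 is almost a copy of x1, x3 is independent.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
y = 2 * x1 + x3 + rng.normal(size=n)

ss_x2_alone = sequential_ss([x2], y)[0]         # x2 entered first
ss_x2_after_x1 = sequential_ss([x1, x2], y)[1]  # x2 entered after x1
ss_x1_after_x2 = sequential_ss([x2, x1], y)[1]  # order reversed
print(f"x2 alone: {ss_x2_alone:.1f}, x2 after x1: {ss_x2_after_x1:.1f}, "
      f"x1 after x2: {ss_x1_after_x2:.1f}")
```

Entered after x1, x2 gets only a tiny share, but the split is order dependent: entered first, x2 takes nearly all of the shared variation and x1 is then the one that looks redundant. In R, `anova()` on an `lm()` fit reports sums of squares in this sequential form.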

The trick would be using the right fitting algorithm. I'm not sure
whether likelihood-based fitting would work; it might need to be
something that fits each predictor one at a time.
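Fitting predictors one at a time in this way is essentially greedy forward selection. Below is a minimal sketch in Python with NumPy, assuming ordinary least squares rather than the `lme()` mixed model from the original question; `forward_select` and the simulated data are hypothetical. At each step it adds whichever remaining predictor raises R-squared the most given those already chosen, so a predictor that mostly duplicates an earlier pick tends to be passed over.

```python
import numpy as np


def forward_select(X, y, k):
    """Greedy forward selection by R^2 using OLS with an intercept.

    Returns the chosen column indices (in order of entry) and the R^2
    of the model after each step."""
    n, p = X.shape
    ones = np.ones((n, 1))
    tss = float(((y - y.mean()) ** 2).sum())  # total sum of squares
    chosen, remaining, r2_path = [], list(range(p)), []
    for _ in range(min(k, p)):
        best_j, best_r2 = None, -np.inf
        for j in remaining:
            # Candidate model: intercept + already chosen columns + column j
            design = np.column_stack([ones, X[:, chosen + [j]]])
            beta, *_ = np.linalg.lstsq(design, y, rcond=None)
            resid = y - design @ beta
            r2 = 1.0 - float(resid @ resid) / tss
            if r2 > best_r2:
                best_j, best_r2 = j, r2
        chosen.append(best_j)
        remaining.remove(best_j)
        r2_path.append(best_r2)
    return chosen, r2_path


# Simulated data (hypothetical): column 1 nearly duplicates column 0,
# column 3 is pure noise.
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
X = np.column_stack([
    x1,
    x1 + 0.1 * rng.normal(size=n),
    rng.normal(size=n),
    rng.normal(size=n),
])
y = 2 * X[:, 0] + X[:, 2] + rng.normal(size=n)

chosen, r2_path = forward_select(X, y, k=2)
print(chosen, [round(r, 3) for r in r2_path])
```

Being greedy, this does not guarantee the best subset of size k, and among strongly correlated candidates the one that enters first can depend on noise in the sample.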

If you do find a 'black box' method, I'd be interested to hear about it.



Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP Commercialisation and
Innovation, Data Analysis, Modelling and Training

(mobile) 0410 689 945
(fax / office)
chris at trickysolutions.com.au


On 21/04/2012, at 19:05, C Hess <13184 at stud.leuphana.de> wrote:

> Dear list,
>
> My actual task in fitting an lme() model is to identify and remove redundant predictors before using them as fixed effects.
>
> To get an overview and pick a final group of predictors, I use the correlation coefficients (cor()) and a PCA (prcomp()).
>
> Trial and error seems an essential part of model fitting, but maybe there is another method that gives a list of predictors in a more structured way, e.g. 'these are the top 5 predictors with the least correlation', or something similar.
>
> Thanks
> CH
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



More information about the R-sig-ecology mailing list