[R] Principal Component Analysis - Selecting components? + right choice?

Corrado ct529 at york.ac.uk
Thu Dec 11 12:46:37 CET 2008


Dear R gurus,

I have some climatic data for a region of the world. They are monthly averages 
for 1950-2000 of precipitation (12 months), minimum temperature (12 months), and 
maximum temperature (12 months). I have scaled them to 2 km x 2 km cells, and 
I have around 75,000 cells.

I need to feed them into a statistical model as co-variates, to use them to 
predict a response variable.

The climatic data are obviously correlated: precipitation for January is 
correlated with precipitation for February, and so on. Even precipitation 
and temperature are heavily correlated. I did some correlation analysis and 
they are all strongly correlated.
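
A minimal sketch of this kind of correlation check, run on simulated data 
standing in for the real table. The object names (clim, cm, off_diag), the 
column names, and the simulated values are assumptions for illustration only, 
not the actual climate data:

```r
# Simulated stand-in: 1000 cells x 36 monthly variables (hypothetical data).
set.seed(1)
n <- 1000
shared <- rnorm(n)                        # common signal that induces correlation
clim <- sapply(1:36, function(j) shared + rnorm(n, sd = 0.5))
colnames(clim) <- c(paste0("prec", 1:12),
                    paste0("tmin", 1:12),
                    paste0("tmax", 1:12))

cm <- cor(clim)                           # 36 x 36 Pearson correlation matrix
off_diag <- cm[upper.tri(cm)]             # each variable pair counted once
summary(abs(off_diag))                    # spread of pairwise |correlation|
```

With 36 variables there are 630 distinct pairs, so a summary of the 
off-diagonal values is easier to read than the full matrix.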

I thought of running PCA on them, in order to reduce the number of co-variates 
I feed into the model.

I ran the PCA using prcomp, quite successfully. Now I need a criterion 
to select the right number of PCs (that is: is it 1, 2, 3, 4?).
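
For reference, a sketch of the prcomp step and of the quantities that a 
selection criterion would normally look at. It assumes the variables are 
centred and scaled to unit variance (precipitation and temperature are on 
different scales), and it again uses simulated data; clim, pca, var_explained 
and cum_var are illustrative names:

```r
# Simulated stand-in for the 36 climate variables (hypothetical data).
set.seed(1)
n <- 1000
shared <- rnorm(n)
clim <- sapply(1:36, function(j) shared + rnorm(n, sd = 0.5))

# PCA on centred, unit-variance variables (i.e. on the correlation matrix).
pca <- prcomp(clim, center = TRUE, scale. = TRUE)

var_explained <- pca$sdev^2 / sum(pca$sdev^2)   # proportion of variance per PC
cum_var <- cumsum(var_explained)                # cumulative proportion

round(head(cbind(var_explained, cum_var), 4), 3)
```

The same figures are reported by summary(pca), and the component scores to 
use as covariates are in pca$x.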

What criteria would you suggest?

At the moment, I am using a criterion based on a threshold, but that is highly 
subjective, even if there are some rules of thumb (Jolliffe, Principal 
Component Analysis, 2nd edition, Springer-Verlag, 2002).
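
To make the subjectivity concrete, here is a sketch of two common threshold 
rules applied to the same simulated data. It assumes the threshold is on the 
cumulative proportion of variance; the 0.90 cut-off, the two-factor simulated 
data, and the names k_cumvar and k_kaiser are all assumptions for 
illustration:

```r
# Simulated data driven by two independent latent signals (hypothetical).
set.seed(1)
n <- 1000
f1 <- rnorm(n)
f2 <- rnorm(n)
clim <- cbind(
  sapply(1:18, function(j) f1 + rnorm(n, sd = 0.5)),
  sapply(1:18, function(j) f2 + rnorm(n, sd = 0.5))
)

pca <- prcomp(clim, center = TRUE, scale. = TRUE)
eig <- pca$sdev^2                         # eigenvalues of the correlation matrix
cum_var <- cumsum(eig) / sum(eig)

# Rule 1: smallest number of PCs reaching a chosen cumulative proportion.
threshold <- 0.90                         # assumed cut-off, not a recommendation
k_cumvar <- which(cum_var >= threshold)[1]

# Rule 2 (Kaiser): keep PCs whose eigenvalue exceeds 1, the average
# eigenvalue when the variables are scaled to unit variance.
k_kaiser <- sum(eig > 1)

c(cumulative = k_cumvar, kaiser = k_kaiser)
```

The two rules need not agree, and the first one changes with the cut-off 
chosen, which is the subjectivity I would like to avoid.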

Could you suggest something more rigorous?

By the way, do you think I would have been better off using something 
other than PCA?

Best,
-- 
Corrado Topi

Global Climate Change & Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct529 at york.ac.uk


