[R] Measure of Redundancy In Variables
rg117 at yahoo.co.uk
Thu Mar 20 18:46:04 CET 2003
I have a question which I guess is more of a general stats question than a specific R question.
I have a data set that contains a large number of numerical variables (in the hundreds). What I
would like to do is quantify the redundancy in those variables. Let me explain what I mean by that.
If I use Principal Component Analysis (PCA) to reduce the number of variables, the process
measures the relationships between the different variables and reorganises them so that each new
variable (component) provides unique information, removing any redundancy between the original
variables. What I would like is a measure that compares the data before PCA and after PCA. For
example, if there is no redundancy, i.e. all of the pre-PCA variables provide unique information,
the redundancy rate would be 100%. On the other hand, if all the pre-PCA variables provide the same
information, then the redundancy rate would be 1%.
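To make the idea concrete, here is a rough sketch in R of the kind of number I have in mind. It is
only an illustration under my own assumptions: the function name redundancy_rate, the 99% variance
threshold and the simulated data are all made up for this example, and I do not know whether this
is an accepted measure. It reports the percentage of principal components needed to retain most of
the total variance.

```r
# Hypothetical helper (not an established statistic): the percentage of
# principal components needed to retain `threshold` of the total variance.
redundancy_rate <- function(x, threshold = 0.99) {
  # PCA on centred, unit-variance columns (i.e. on the correlation matrix)
  pca <- prcomp(x, center = TRUE, scale. = TRUE)
  # Share of total variance carried by each component
  var_share <- pca$sdev^2 / sum(pca$sdev^2)
  # Smallest number of components whose cumulative share reaches the threshold
  k <- which(cumsum(var_share) >= threshold)[1]
  100 * k / ncol(x)
}

set.seed(1)
n <- 500

# Ten independent variables: every one carries unique information
independent <- matrix(rnorm(n * 10), ncol = 10)

# Ten variables that are exact multiples of one another: all the same information
base <- rnorm(n)
duplicated <- sapply(1:10, function(i) base * i)

redundancy_rate(independent)  # all 10 components needed
redundancy_rate(duplicated)   # a single component suffices
```

With ten variables the lowest possible value is 10% (one component out of ten); with a hundred
fully redundant variables it would be 1%, which matches the scale described above. The result
depends on the arbitrary threshold, so it should be read as a rough indicator only.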
Could anyone tell me if there is a method of measuring this redundancy rate or something similar?
If somebody could help me with this issue it would be greatly appreciated. Many thanks.