[R] how to tell if its better to standardize your data matrix first when you do principal
Uwe Ligges
ligges at statistik.tu-dortmund.de
Mon Nov 23 10:12:50 CET 2009
masterinex wrote:
> Hi Hadley ,
>
> I really appreciate the suggestions you gave. They were helpful, but I still
> didn't quite get it all, and I really want to do a good job, so any
> comments would be helpful. Please understand me.
Well, we are trying to understand you, but we do not quite understand you
either. I think you really need to consult a statistics textbook on PCA if
my answer was not sufficient. Given your questions, I doubt you understand
what PCA does and how it works. It does not predict anything.
Uwe Ligges
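
To make the standardize-or-not comparison from this thread concrete, here is a
minimal sketch. It assumes the built-in USArrests data set as a stand-in (the
poster's matrix m was never shown) and uses prcomp() from base R; the object
names are illustrative only.

```r
# Illustrative only: USArrests is a built-in R data set, not the poster's data.
m <- as.matrix(USArrests)

# PCA on the covariance matrix: variables are centered but not rescaled.
pca_raw <- prcomp(m, center = TRUE, scale. = FALSE)

# PCA on the correlation matrix: each variable is scaled to unit variance.
pca_std <- prcomp(m, center = TRUE, scale. = TRUE)

# Same thing as standardizing by hand with scale(), as in the original question.
pca_manual <- prcomp(scale(m))

# Proportion of variance explained by each component.
prop_raw <- pca_raw$sdev^2 / sum(pca_raw$sdev^2)
prop_std <- pca_std$sdev^2 / sum(pca_std$sdev^2)
print(round(prop_raw, 3))
print(round(prop_std, 3))

# The column variances and the first unscaled loading vector show which
# variable drives the first component when nothing is rescaled.
print(apply(m, 2, var))
print(round(pca_raw$rotation[, 1], 3))
```

In this data set one variable has a far larger variance than the others, so
the unscaled first component is dominated by it and its share of the variance
looks large. That is a consequence of the measurement scales, not evidence
that the unscaled analysis is `better'; which one is appropriate still depends
on the interpretation.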
> hadley wrote:
>> You've asked the same question on stackoverflow.com and received the
>> same answer. This is rude because it duplicates effort. If you
>> urgently need a response to a question, perhaps you should consider
>> paying for it.
>>
>> Hadley
>>
>> On Sun, Nov 22, 2009 at 12:04 PM, masterinex <xevilgang79 at hotmail.com>
>> wrote:
>>> So in which cases is it better to standardize the data matrix first?
>>> Also, is PCA generally used to predict the response variable, and should I
>>> keep that variable in my data matrix?
>>>
>>>
>>> Uwe Ligges-3 wrote:
>>>> masterinex wrote:
>>>>>
>>>>> Hi guys,
>>>>>
>>>>> I'm trying to do principal component analysis in R. There are two ways of
>>>>> doing it, I believe. One is doing principal component analysis right away;
>>>>> the other is standardizing the matrix first using s = scale(m) and then
>>>>> applying principal component analysis.
>>>>> How do I tell which result is better? What values in particular should I
>>>>> look at? I have already managed to find the eigenvalues and eigenvectors,
>>>>> and the proportion of variance for each eigenvector, using both methods.
>>>>>
>>>> Generally, it is better to standardize. But in some cases, e.g. when all
>>>> variables are measured in the same units and their variances also reflect
>>>> their importance, it might make sense not to do so.
>>>> You should think about the analysis: you cannot know which result is
>>>> `better' unless you have an interpretation.
>>>>
>>>>
>>>>
>>>>> I noticed that the proportion of variance for the first principal
>>>>> component was larger without standardizing. Is there a meaning to it?
>>>>> Isn't this always the case?
>>>>> Lastly, if I am supposed to predict a variable, e.g. weight, should I drop
>>>>> that variable from my data matrix when I do principal component analysis?
>>>>
>>>> This sounds a bit like homework. If that is the case, please ask your
>>>> teacher rather than this list.
>>>> Anyway, it does not make sense to predict weight using a linear
>>>> combination (principal component) that contains weight, does it?
>>>>
>>>> Uwe Ligges
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>
>>
>> --
>> http://had.co.nz/
>>
>
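
On the quoted question about predicting weight: PCA itself does not predict
anything, but component scores can be fed into a separate regression. The
sketch below is only an illustration under stated assumptions: it uses the
built-in mtcars data with its wt column standing in for "weight", keeps k = 2
components arbitrarily, and the names predictors, dat and fit are made up for
the example. The point is that the response is left out of the matrix passed
to prcomp().

```r
# Illustrative only: mtcars is a built-in data set; wt stands in for "weight".
predictors <- mtcars[, setdiff(names(mtcars), "wt")]

# PCA on the standardized predictors only; the response is not included.
pca <- prcomp(predictors, center = TRUE, scale. = TRUE)

# Keep the first k component scores (k = 2 is an arbitrary choice here).
k <- 2
dat <- data.frame(weight = mtcars$wt, pca$x[, 1:k])

# The prediction step is an ordinary linear regression on the scores.
fit <- lm(weight ~ ., data = dat)
print(summary(fit)$r.squared)
```

How many components to keep, and whether this is a sensible model at all, are
questions for the analysis at hand; the code only shows the mechanics.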