[R] question about reproducibility/consistency of principal component and lda directions in R

Uwe Ligges ligges at statistik.tu-dortmund.de
Sat Feb 9 20:43:04 CET 2013



On 08.02.2013 20:14, David Romano wrote:
> Hi everyone,
>
> I'm not exactly sure how to ask this question most clearly, but I hope that
> giving the context in which it occurs for me will help: I'm trying to
> compare the brain images of two patient populations; each image is composed
> of voxels (the 3D analogue of pixels), and I have two images per patient,
> one reflecting grey matter concentration at each voxel, and the other
> reflecting white matter concentration at each voxel.
>
> I determined the groups by means of an analysis that involved information
> from both types of images, and what I set out to do was to get a rough idea
> of where in the brain the two groups showed the most striking differences.
>
> My first attempt was to replace -- on a voxel by voxel basis -- the
> bivariate grey/white data by a combined univariate measure, namely the
> first principal component score. From these principal component scores I
> calculated Cohen's d to obtain a rough estimate of the effect size at each
> voxel, and the resulting brain images show very nice separation into
> meaningful brain regions, some corresponding to negative effect sizes and
> some to positive ones.
>
> What puzzles me about this clean separation into brain regions is that
> the meaning of positive and negative is determined by the choice of the
> first principal component direction at each voxel, but this choice is in
> principle (no pun intended, sorry!) arbitrary: whether an eigenvector or
> its negative is chosen as the direction is in principle arbitrary.
>
> So here are my questions: Does the algorithm used in R produce the same
> principal component directions if applied to the same data repeatedly?

Yes, but the sign may change if you run it on another machine: it depends
on the compiler, and hence also on 32-bit vs. 64-bit builds and on the OS.
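
A small sketch of what this means, on made-up toy data (the matrix X and
the seed are assumptions for illustration only): repeated calls on the
same machine agree, while both a direction and its negative satisfy the
eigenvector equation, so either one is a valid answer.

```r
set.seed(42)
X <- matrix(rnorm(200), ncol = 2)   # toy data: 100 observations, 2 variables

# Two runs on the same data in the same R session
p1 <- prcomp(X)
p2 <- prcomp(X)
same_run <- isTRUE(all.equal(p1$rotation, p2$rotation))

# The first direction v and its negative are both eigenvectors of cov(X)
S      <- cov(X)
v      <- unname(p1$rotation[, 1])
lambda <- p1$sdev[1]^2
ok_pos <- isTRUE(all.equal(as.vector(S %*% v),  lambda * v))
ok_neg <- isTRUE(all.equal(as.vector(S %*% -v), lambda * -v))
```

Which of the two signs you get is decided by the underlying numerical
routines, not by the data.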


> And if so, should the directions chosen by the algorithm change
> continuously with the data?  For example, if one data set were obtained by
> applying a small amount of noise to another, should the resulting
> directions be close to each other (as opposed to close negative of each
> other)?  (Assuming the data is far from being "singular" in some vague
> sense I'm not sure how to make precise.)

Not necessarily: with noise added, the sign can change again, so the new
direction may be close to the negative of the old one.

Of course, you can fix a convention yourself, e.g. require the very first
value of each direction to be positive and flip the sign otherwise.
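
A minimal sketch of such a convention for prcomp(), under stated
assumptions: fix_pc_sign() is a hypothetical helper (not part of R), and
the grey/white data are simulated only to make the example runnable. It
flips each component so that its first loading is non-negative and flips
the scores along with it.

```r
# Hypothetical helper: make the first loading of every component non-negative
fix_pc_sign <- function(pc) {
  s <- ifelse(pc$rotation[1, ] < 0, -1, 1)       # one sign per component
  pc$rotation <- sweep(pc$rotation, 2, s, `*`)   # flip the directions
  pc$x        <- sweep(pc$x, 2, s, `*`)          # flip the scores to match
  pc
}

set.seed(1)
grey  <- rnorm(50)
white <- 0.6 * grey + rnorm(50, sd = 0.5)        # simulated, correlated with grey
X     <- cbind(grey = grey, white = white)

pc       <- fix_pc_sign(prcomp(X))
noise    <- matrix(rnorm(100, sd = 0.01), ncol = 2)
pc_noisy <- fix_pc_sign(prcomp(X + noise))

# Inner product of the two first directions: positive means same orientation
agreement <- sum(pc$rotation[, 1] * pc_noisy$rotation[, 1])
```

Note that this particular convention is itself unstable when the first
loading is close to zero, so for real data you may prefer to orient on
the loading with the largest absolute value or on some reference
direction.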


> My second attempt was to do the same, but with the first lda scores, so I
> have the same questions about lda directions, too.


Same for lda.
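
The same kind of convention can be sketched for lda() from the MASS
package. The data below are simulated and the group labels "A" and "B"
are assumptions for illustration; the sign s is chosen so that the first
coefficient of LD1 is non-negative, and the scores are flipped with it.

```r
library(MASS)

set.seed(2)
n     <- 40
group <- factor(rep(c("A", "B"), each = n))      # two simulated patient groups
grey  <- rnorm(2 * n, mean = ifelse(group == "A", 0, 1))
white <- rnorm(2 * n, mean = ifelse(group == "A", 0, -0.5))
dat   <- data.frame(group, grey, white)

fit <- lda(group ~ grey + white, data = dat)

# Orient LD1 so that its first coefficient (here: grey) is non-negative
s      <- if (fit$scaling[1, 1] < 0) -1 else 1
ld1    <- s * fit$scaling[, 1]
scores <- s * predict(fit)$x[, 1]                # first lda scores, same orientation
```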

Best,
Uwe Ligges

> Any light you could shed on these questions would be very welcome!
>
> Thanks in advance,
> David Romano


