[R-sig-phylo] pre-transformation for phylo-pca
Ben Bolker
bolker at ufl.edu
Mon Dec 21 23:29:03 CET 2009
Hmm. The zeros are going to stay piled up at a single value no matter how
you transform them ... remember, it isn't necessary for PCA that the
variables actually be normally distributed -- normality matters only for
inference on the significance of the PCA components. (An analogy is that means and
variances of observations are meaningful summary statistics no matter
how weird the distribution is [OK, not counting distributions with
non-finite moments], but the simplest inferential methods depend on the
data being approximately normal.) I would probably just go ahead and
ignore the weird distributions, provided the reduced variables seem to
make some sense. Under other circumstances I would consider dividing
the weird variables into a binary component (zero vs non-zero) and a
conditional distribution (the value given that it is non-zero), but that
won't work in this case because the conditional part would have NAs for
every observation with a zero, and PCA needs complete observations ...
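
To make the two points above concrete, here is a rough sketch in plain Python (not R, and not anything from the original thread). The variable name `cover` and its values are made up for illustration; `None` stands in for NA. It only shows that a transformation leaves the zeros as a spike, and that the two-part split leaves missing values in exactly the zero rows.

```python
import math

# Hypothetical habitat variable: half the observations are exactly zero,
# the rest are made-up positive values (illustrative numbers only).
cover = [0.0, 0.0, 0.0, 0.0, 2.1, 3.4, 2.8, 3.9]

# A transformation sends every zero to the same value, so the spike
# survives; with log(1 + x) the zeros simply stay at zero.
logged = [math.log1p(x) for x in cover]
zeros_after = sum(1 for v in logged if v == 0.0)

# Two-part split: a binary presence indicator plus the value given presence.
present = [1 if x > 0 else 0 for x in cover]
conditional = [x if x > 0 else None for x in cover]  # None plays the role of NA

# Indices of observations with no missing conditional value, i.e. the
# only rows a complete-case analysis could keep.
complete_rows = [i for i, v in enumerate(conditional) if v is not None]

print("zeros after transform:", zeros_after)
print("rows usable after the split:", complete_rows)
```

In this toy case the split throws away every zero observation from the conditional variable, which is the half of the data that caused the problem in the first place.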

Dan Rabosky wrote:
> Howdy-
>
> This isn't really an R question, but will involve an R solution....
>
> I have some ecological data (habitat) that I'm analyzing in a
> phylogenetic framework. Lots of variables. Some data reduction is
> obviously necessary. However, some variables have severe zero
> inflation problems - even if the remainder of the distribution is
> very nicely normally distributed (e.g., 50% of observations are zero,
> the other 50% have a nice tractable distribution). Can anyone think
> of any options for dealing with this so it is amenable to PCA?
>
> Thanks,
> ~Dan Rabosky
