[R] PCA with not non-negative definite covariance

Quin Wills quin.wills at googlemail.com
Wed Jul 26 23:20:15 CEST 2006

My apologies (in response to the last two replies). I should write more sensibly,
including subject lines that make grammatical sense.

(1) By "analogous" I mean that classical MDS with Euclidean distance is
equivalent to plotting the first "k" principal components.
(2) Agreed re. distribution assumptions.
(3) Agreed re. the need for some kind of imputation when calculating
distances. I'm thinking of pairwise exclusion for correlation.
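To make point (1) concrete, here is a small plain-Python sketch (not code from this thread) of the classical-MDS/PCA equivalence, assuming a complete numeric matrix with no missing values. Classical (Torgerson) MDS double-centres the squared Euclidean distances, which yields the same matrix as Xc Xc' for the centred data Xc, so its leading eigenvector scaled by the square root of its eigenvalue should reproduce the first PCA scores up to sign. The helper names and the power-iteration eigen-solver are illustrative choices, not what R's `cmdscale()` or `prcomp()` do internally.

```python
import math


def power_iteration(m, iters=200):
    """Leading eigenvalue and unit eigenvector of a symmetric PSD matrix (list of lists)."""
    n = len(m)
    # Non-constant start vector: the double-centred MDS matrix maps the all-ones vector to zero.
    v = [1.0 + 0.1 * i for i in range(n)]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, mv)), v  # Rayleigh quotient, eigenvector


def center(x):
    """Subtract each column's mean."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in x]


def pca_scores(x):
    """First principal-component scores: centred rows projected on the top eigenvector of Xc'Xc."""
    xc = center(x)
    p = len(xc[0])
    cross = [[sum(row[a] * row[b] for row in xc) for b in range(p)] for a in range(p)]
    _, loading = power_iteration(cross)
    return [sum(row[j] * loading[j] for j in range(p)) for row in xc]


def classical_mds(x):
    """First coordinate of classical (Torgerson) MDS on Euclidean distances between rows."""
    n = len(x)
    d2 = [[sum((a - b) ** 2 for a, b in zip(x[i], x[k])) for k in range(n)] for i in range(n)]
    mean_d2 = [sum(r) / n for r in d2]  # row means (= column means, d2 is symmetric)
    grand = sum(mean_d2) / n
    # Double centring B = -1/2 J D2 J; for Euclidean distances this equals Xc Xc'.
    bmat = [[-0.5 * (d2[i][k] - mean_d2[i] - mean_d2[k] + grand) for k in range(n)]
            for i in range(n)]
    lam, v = power_iteration(bmat)
    return [math.sqrt(lam) * vi for vi in v]


X = [[2.0, 1.0], [0.0, 0.5], [-1.0, -0.5], [3.0, 1.5], [-4.0, -2.5]]
t = pca_scores(X)
c = classical_mds(X)
sign = 1.0 if sum(a * b for a, b in zip(t, c)) >= 0 else -1.0  # eigenvectors have arbitrary sign
print([round(a, 4) for a in t])
print([round(sign * b, 4) for b in c])
```

For k components one would take the top k eigenpairs with a proper eigen or SVD routine (as `cmdscale(d, k)` and `prcomp()` do in R); power iteration is used here only to keep the sketch dependency-free.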

As for why I want to do this: it is simply to represent my data graphically.


-----Original Message-----
From: Berton Gunter [mailto:gunter.berton at gene.com] 
Sent: 26 July 2006 05:10 PM
To: 'Quin Wills'; bady at univ-lyon1.fr
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] PCA with not non-negative definite covariance

Not sure what "completely analogous" means; MDS is nonlinear, PCA is linear.

In any case, the bottom line is that if you have high dimensional data with
"many" missing values, you cannot know what the multivariate distribution
looks like -- and you need a **lot** of data with many variables to usefully
characterize it anyway. So you must either make some assumptions about what
the distribution could be (including imputation methodology) or use any of
the many exploratory techniques available to learn what you can.
Thermodynamics holds -- you can't get something for nothing (you can't fool
Mother Nature).

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Quin Wills
> Sent: Wednesday, July 26, 2006 8:44 AM
> To: bady at univ-lyon1.fr
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] PCA with not non-negative definite covariance
> Thanks.
> I suppose that another option could be just to use classical
> multidimensional scaling. By my understanding this is (if based on a
> Euclidean measure) completely analogous to PCA, and because it's based
> explicitly on distances, I could easily exclude the variables with NAs
> on a pairwise basis when calculating the distances.
> Quin
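A plain-Python sketch of the pairwise-exclusion idea in the message above, using a hypothetical helper `pairwise_euclidean` with `None` standing in for R's `NA`. For each pair of rows only the columns observed in both are used, and the partial sum of squares is scaled up by p / (columns used). As far as I know this is the same rescaling R's `dist()` documents for Euclidean distances with missing values, but check `?dist` before relying on that.

```python
import math


def pairwise_euclidean(x):
    """Euclidean distances between rows, ignoring columns missing (None) in either row.

    The partial sum of squares is scaled by p / (columns used); a pair with no
    shared observed columns gets NaN.
    """
    n, p = len(x), len(x[0])
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            pairs = [(a, b) for a, b in zip(x[i], x[k]) if a is not None and b is not None]
            if pairs:
                ss = sum((a - b) ** 2 for a, b in pairs)
                dist = math.sqrt(ss * p / len(pairs))
            else:
                dist = math.nan
            d[i][k] = d[k][i] = dist
    return d


N = None  # stands in for R's NA
X = [
    [1.0, 2.0, N],
    [2.0, N, 4.0],
    [0.0, 1.0, 3.0],
]
D = pairwise_euclidean(X)
for row in D:
    print([round(val, 3) for val in row])
```

Because each distance is built from a different subset of columns, the resulting matrix need not be Euclidean, so classical MDS on it (for example with R's `cmdscale()`) can still yield negative eigenvalues, the same symptom as the pairwise covariance in the subject line.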
> -----Original Message-----
> From: bady at univ-lyon1.fr [mailto:bady at univ-lyon1.fr] 
> Sent: 25 July 2006 09:24 AM
> To: Quin Wills
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] PCA with not non-negative definite covariance
> Hi, hi all,
> > Am I correct to understand from the previous discussions on this topic (a
> > few years back) that if I have a matrix with missing values, my PCA options
> > seem dismal if:
> > (1) I don't want to impute the missing values.
> > (2) I don't want to completely remove cases with missing values.
> > (3) I do cov() with use = "pairwise.complete.obs", as this produces
> > negative eigenvalues (which it has in my case!).
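Why option (3) can fail: a minimal plain-Python sketch (hypothetical helper `pairwise_cov`, `None` standing in for `NA`) that computes each covariance entry only from the rows where both variables are observed, which I believe mirrors what `cov(..., use = "pairwise.complete.obs")` does. The toy data are constructed so that x1 and x2 rise together, x2 and x3 rise together, but x1 and x3 move in opposite directions; no single complete data set can show all three, so the assembled matrix is not positive semi-definite. Any vector v with v'Cv < 0 certifies a negative eigenvalue, because the smallest eigenvalue is the minimum of v'Cv over unit vectors.

```python
def pairwise_cov(data):
    """Covariance matrix whose (a, b) entry uses only rows where columns a and b are both observed.

    Missing values are None; each entry uses its own pairwise means and an m - 1 divisor.
    """
    p = len(data[0])
    cov = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(p):
            pairs = [(row[a], row[b]) for row in data if row[a] is not None and row[b] is not None]
            m = len(pairs)
            mean_a = sum(x for x, _ in pairs) / m
            mean_b = sum(y for _, y in pairs) / m
            cov[a][b] = sum((x - mean_a) * (y - mean_b) for x, y in pairs) / (m - 1)
    return cov


N = None  # stands in for R's NA
data = [
    [1.0, 1.0, N], [2.0, 2.0, N], [3.0, 3.0, N],  # only x1, x2 observed: rise together
    [N, 1.0, 1.0], [N, 2.0, 2.0], [N, 3.0, 3.0],  # only x2, x3 observed: rise together
    [1.0, N, 3.0], [2.0, N, 2.0], [3.0, N, 1.0],  # only x1, x3 observed: move oppositely
]
C = pairwise_cov(data)
v = [1.0, -1.0, 1.0]
quad = sum(v[i] * C[i][j] * v[j] for i in range(3) for j in range(3))
print(quad)  # negative, so C has at least one negative eigenvalue
```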
> (4) Maybe you can use the Non-linear Iterative Partial Least Squares
> (NIPALS) algorithm (used intensively in chemometrics). S. Dray proposes a
> version of this procedure at http://pbil.univ-lyon1.fr/R/additifs.html.
> Hope this helps :)
> Pierre
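For option (4), a rough plain-Python sketch of the NIPALS idea for PCA with missing cells (`None`), assuming centring on the observed column means and no column scaling. Each step regresses columns on the current scores and rows on the current loadings using observed cells only, so nothing is imputed beforehand. This is not S. Dray's code from the page above; the function name `nipals`, the starting column, and the convergence rule are illustrative choices and may differ from his.

```python
import math


def nipals(x, ncomp=1, tol=1e-12, max_iter=500):
    """NIPALS PCA for a matrix whose missing cells are None.

    Returns a list of (scores, loadings) pairs, one per extracted component.
    """
    n, p = len(x), len(x[0])
    # Centre each column on the mean of its observed values only.
    means = []
    for j in range(p):
        obs = [row[j] for row in x if row[j] is not None]
        means.append(sum(obs) / len(obs))
    r = [[None if row[j] is None else row[j] - means[j] for j in range(p)] for row in x]

    comps = []
    for _ in range(ncomp):
        # Observed sum of squares per column; stop if nothing is left to explain.
        ss = [sum(row[j] ** 2 for row in r if row[j] is not None) for j in range(p)]
        if max(ss) < tol:
            break
        # Start the scores from the column with the largest observed sum of squares.
        start = max(range(p), key=lambda j: ss[j])
        t = [0.0 if row[start] is None else row[start] for row in r]
        for _ in range(max_iter):
            # Loadings: regress each column on t over that column's observed rows.
            load = []
            for j in range(p):
                rows = [i for i in range(n) if r[i][j] is not None]
                den = sum(t[i] ** 2 for i in rows)
                load.append(sum(r[i][j] * t[i] for i in rows) / den if den > 0 else 0.0)
            norm = math.sqrt(sum(v * v for v in load)) or 1.0  # guard an all-zero vector
            load = [v / norm for v in load]
            # Scores: regress each row on the loadings over that row's observed columns.
            new_t = []
            for i in range(n):
                cols = [j for j in range(p) if r[i][j] is not None]
                den = sum(load[j] ** 2 for j in cols)
                new_t.append(sum(r[i][j] * load[j] for j in cols) / den if den > 0 else 0.0)
            change = sum((a - b) ** 2 for a, b in zip(new_t, t))
            t = new_t
            if change < tol:
                break
        comps.append((t, load))
        # Deflate: subtract this component from the observed cells only.
        r = [[None if r[i][j] is None else r[i][j] - t[i] * load[j] for j in range(p)]
             for i in range(n)]
    return comps


N = None  # stands in for R's NA
X = [[1.0, N], [-1.0, N], [2.0, 4.0], [-2.0, -4.0], [3.0, 6.0], [-3.0, -6.0]]
scores, loadings = nipals(X)[0]
print(round(scores[0] * loadings[1], 6), round(scores[1] * loadings[1], 6))
```

In this toy matrix the observed cells are exactly rank one (row i is u_i times (1, 2) with u = (1, -1, 2, -2, 3, -3), and the observed column means are zero), so the two hidden cells should be reconstructed as t_i p_j, that is 2 and -2, without any prior imputation.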
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
