[R] PCA sensitive to outliers?
Martin Maechler
maechler at stat.math.ethz.ch
Mon Apr 23 10:08:33 CEST 2012
>>>>> "SL" == Steve Lianoglou <mailinglist.honeypot at gmail.com>
>>>>> on Mon, 23 Apr 2012 01:10:31 -0400 writes:
SL> On Mon, Apr 23, 2012 at 12:01 AM, Michael
SL> <comtech.usa at gmail.com> wrote:
>> Yes, but that is not a good review or survey... thx
SL> But the packages listed there do have their own
SL> documentation and vignettes. For instance the rrcov
SL> package seems to have a nice vignette about its design
SL> as well as methods it implements, and references to
SL> these methods for further reading:
SL> http://cran.r-project.org/web/packages/rrcov/vignettes/rrcov.pdf
SL> You'll see at least a few mentions of PCA, which will
SL> lead you to other packages/papers/etc.
Yes, indeed, thanks Steve!
Unfortunately, the topic of robust PCA
is not quite trivial, and has been approached (too) many times...
As maintainer of the 'Robust' CRAN task view, I'd indeed strongly
recommend working with 'rrcov' or 'robustbase', which already
contains an important subset of rrcov's robust covariance matrix
estimators.
Note that the historically earliest robust covariance estimator
available in an R package is cov.rob() from MASS (a 'Recommended'
package shipped with every R installation).
*And* you can use standard R's
princomp(x, ... , covmat = <robust.cov>(x))
to get robust PCA.
I'll add a note with that to the 'Robust' CRAN task view.
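A minimal sketch of the princomp(x, covmat = <robust.cov>(x)) idea, assuming MASS's cov.rob() with method = "mcd" as the robust estimator. The data, seed and outlier value are made up for illustration; they only show how a handful of gross outliers can swing the classical first component while the MCD-based one stays with the bulk of the data.

```r
library(MASS)  # 'Recommended' package shipped with R; provides cov.rob()

# Simulated data (illustrative only): a and b share a strong common
# factor, c is independent noise.
set.seed(42)
n <- 100
z <- rnorm(n, sd = 3)
x <- cbind(a = z + rnorm(n, sd = 0.3),
           b = z + rnorm(n, sd = 0.3),
           c = rnorm(n))

# Contaminate 5 of the 100 rows with gross outliers in c.
x[1:5, "c"] <- 100

# Classical PCA: based on the ordinary covariance matrix of x.
pc.classic <- princomp(x)

# Robust PCA: the same princomp() call, but the covariance is an MCD
# estimate. cov.rob() returns a list with 'cov', 'center' and 'n.obs',
# which is the form princomp() accepts for 'covmat'.
pc.robust <- princomp(x, covmat = cov.rob(x, method = "mcd"))

# First loading vector of each fit (the sign of a loading is arbitrary).
load.classic <- unclass(loadings(pc.classic))[, 1]
load.robust  <- unclass(loadings(pc.robust))[, 1]
print(round(load.classic, 2))
print(round(load.robust, 2))
```

With this setup one would expect the classical first loading to point almost entirely along c (the contaminated variable) and the robust one to load mainly on a and b. The princomp() help page documents covmat as accepting a covariance list such as those returned by cov.wt() or by MASS's cov.mve()/cov.mcd(), which is why the cov.rob() result can be passed directly.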
Martin Maechler,
ETH Zurich
SL> Enjoy,
SL> -steve
>>
>> On Sun, Apr 22, 2012 at 9:47 PM, Bert Gunter
>> <gunter.berton at gene.com> wrote:
>>
>>> As I believe I already told you, look at the CRAN Robust
>>> task view.
>>>
>>> -- Bert
>>>
>>> On Sun, Apr 22, 2012 at 6:29 PM, Michael
>>> <comtech.usa at gmail.com> wrote:
>>> > Even in R, there are so many "robust PCA" variants... is there any
>>> > survey or review of all these different methods?
>>> >
>>> > On Sun, Apr 22, 2012 at 6:58 PM, Joshua Wiley
>>> > <jwiley.psych at gmail.com> wrote:
>>> >
>>> >> On Sun, Apr 22, 2012 at 4:43 PM, Michael
>>> >> <comtech.usa at gmail.com> wrote:
>>> >> > I actually tried "robustPca" in "pcaMethods" on Bioconductor.
>>> >> >
>>> >> > It keeps giving me the warning "Input data is not complete"...
>>> >> >
>>> >> > Reading into the function:
>>> >> >
>>> >> > When there are no NAs, it gives this warning...
>>> >> >
>>> >> > It seems that there is a bug in this code...
>>> >> >
>>> >> > Is it reliable at all?
>>> >> >
>>> >> > ---------------------
>>> >> >
>>> >> >
>>> >> > > robustPca
>>> >> > function (Matrix, nPcs = 2, verbose = interactive(), ...)
>>> >> > {
>>> >> >     nas <- is.na(Matrix)
>>> >> >     if (!any(nas) & verbose) {
>>> >> >         cat("Input data is not complete.\n")
>>> >> >         cat("Scores, R2 and R2cum may be inaccurate, handle with care\n")
>>> >> >     }
>>> >>
>>> >> That seems to issue the notes when there are *not any missing*
>>> >> values and verbose is TRUE. I would submit a bug report to the
>>> >> author.
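The inverted test Joshua describes can be checked in isolation. warn_if_incomplete() below is a hypothetical helper written only for illustration; it is not the pcaMethods source or its actual fix. It keeps the two cat() notes from the quoted robustPca() but tests any(nas) instead of !any(nas), and uses the scalar && operator, which is the idiomatic choice inside an if() condition.

```r
# Hypothetical helper (illustration only, not pcaMethods code): the same
# completeness check as in the quoted robustPca(), with the condition
# written the way the message text intends.
warn_if_incomplete <- function(Matrix, verbose = interactive()) {
  nas <- is.na(Matrix)
  # Warn only when at least one value IS missing; && is the scalar
  # operator that if() conditions expect.
  if (any(nas) && verbose) {
    cat("Input data is not complete.\n")
    cat("Scores, R2 and R2cum may be inaccurate, handle with care\n")
  }
  invisible(any(nas))  # TRUE if the input had missing values
}

m <- matrix(1:6, nrow = 2)
warn_if_incomplete(m, verbose = TRUE)  # complete data: prints nothing
m[1, 2] <- NA
warn_if_incomplete(m, verbose = TRUE)  # one NA: prints both notes
```

With the original !any(nas), the two calls would behave the other way round, which matches the symptom Michael reports on complete data.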
>>> >>
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Fri, Apr 20, 2012 at 9:58 AM, Kevin Wright
>>> >> > <kw.stat at gmail.com> wrote:
>>> >> >
>>> >> >> You can also have a look at the pcaMethods package
>>> >> >> on Bioconductor.
>>> >> >>
>>> >> >> Kevin
>>> >> >>
>>> >> >>
>>> >> >> On Thu, Apr 19, 2012 at 11:20 PM, Michael
>>> >> >> <comtech.usa at gmail.com> wrote:
>>> >> >>
>>> >> >>> Hi all,
>>> >> >>>
>>> >> >>> I found that PCA gave chaotic results when there were big
>>> >> >>> changes in a few data points.
>>> >> >>>
>>> >> >>> Are there "improved" versions of PCA in R that can help with
>>> >> >>> this problem?
>>> >> >>>
>>> >> >>> Please give me some pointers...
>>> >> >>>
>>> >> >>> Thank you!
>>> >> >>>
>>> >> >>>
>>> >> >>> ______________________________________________
>>> >> >>> R-help at r-project.org mailing list
>>> >> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> >> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> >> >>> and provide commented, minimal, self-contained, reproducible code.
>>> >> >>>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> Kevin Wright
>>> >> >>
>>> >> >>
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Joshua Wiley
>>> >> Ph.D. Student, Health Psychology
>>> >> Programmer Analyst II, Statistical Consulting Group
>>> >> University of California, Los Angeles
>>> >> https://joshuawiley.com/
>>> >>
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>>
>>> Bert Gunter
>>> Genentech Nonclinical Biostatistics
>>>
>>> Internal Contact Info:
>>> Phone: 467-7374
>>> Website:
>>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>>>
>>
>>
SL> --
SL> Steve Lianoglou
SL> Graduate Student: Computational Systems Biology
SL>  | Memorial Sloan-Kettering Cancer Center
SL>  | Weill Medical College of Cornell University
SL> Contact Info: http://cbio.mskcc.org/~lianos/contact