[R] Outlier removal by Principal Component Analysis : error message

Claudia Beleites claudia.beleites at ipht-jena.de
Thu May 5 15:01:28 CEST 2011


Dear Boule,

thank you for your interest in hyperSpec.
In order to look into your *problem* I need some more information.

I suggest that we solve the error off-list. Please note also that 
hyperSpec has its own help mailing list:
hyperspec-help at lists.r-forge.r-project.org
(because of the amount of spam I had to moderate, you need to subscribe 
first here: 
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/hyperspec-help)

- Which version of hyperSpec do you use? If it is the version from CRAN, 
could you please update to the development version at r-forge with
install.packages("hyperSpec",repos="http://R-Forge.R-project.org")
?

- Next, if the problem persists with the latest build, could you send me 
the raw data file so that I can exactly reproduce your problem?

- Also, to track down the exact source of the error, please run
traceback ()
right after you get the error and email me its output.

It is basically impossible to give general recommendations about 
*Outlier detection*: a few spectra that are very different from all 
other spectra may be outliers or they may be the target of a study...
This is also why the example in the vignette uses a two-step procedure: 
PCA only identifies suspects, i.e. spectra whose scores differ strongly 
from all others for some principal components. The second step is 
a manually supervised decision whether the spectrum is really an outlier.
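
As a rough illustration of the two-step idea, here is a minimal sketch in base R on simulated data. It is not the vignette's code: the simulated matrix `spc`, the number of components `k`, and the robust z-score cutoff of 6 are all assumptions made for this example, and a sensible cutoff depends on your data.

```r
set.seed(1)

# Simulated data: 50 "spectra" (rows) on 100 wavelength points (columns).
wl   <- seq(500, 1800, length.out = 100)
band <- exp(-((wl - 1000) / 40)^2)                  # one band shared by all spectra
spc  <- outer(runif(50, 0.95, 1.05), band) +
        matrix(rnorm(50 * 100, sd = 0.01), nrow = 50)

# Spectrum 7 gets an additional band that no other spectrum has.
spc[7, ] <- spc[7, ] + 0.8 * exp(-((wl - 1500) / 30)^2)

pca <- prcomp(spc, center = TRUE)

# Step 1: suspects are spectra whose score on any of the first k PCs is far
# from the bulk (robust z-score: distance from the median in units of the MAD).
k <- 3
z <- apply(pca$x[, 1:k, drop = FALSE], 2,
           function(s) abs(s - median(s)) / mad(s))
suspects <- which(apply(z, 1, max) > 6)

# Step 2: the decision stays with the analyst, e.g. by plotting the suspects
# on top of all spectra.
if (interactive()) {
  matplot(wl, t(spc), type = "l", lty = 1, col = "grey",
          xlab = "wavelength axis", ylab = "intensity")
  matlines(wl, t(spc[suspects, , drop = FALSE]), lty = 1, col = "red")
}
```

Here `suspects` only lists candidates. Whether spectrum 7 is an artefact or the interesting part of the data set is still a judgement call.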

The first step could be replaced by other measures which, however, depend 
on your data. E.g. if you expect/know your data to consist of different 
clusters, suspects could be spectra that are too far away from any 
cluster. If your data comes from a mixture of a few components, spectra 
that cannot be modeled decently by a few PLS components could be 
suspicious. Or spectra that require a component of their own, ...
Some kinds of outliers are actually well-defined in a spectroscopic 
sense, e.g. contamination by fluorescent lamp light.
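
The cluster-based variant could look like the following base R sketch. It assumes that reference spectra with known cluster membership are available. The cluster shapes, the Euclidean distance, and the cutoff (median plus 6 MADs of the distances) are hypothetical choices for illustration only.

```r
set.seed(2)

n_wl     <- 50
centre_a <- sin(seq(0, pi,     length.out = n_wl))   # hypothetical cluster A shape
centre_b <- cos(seq(0, pi / 2, length.out = n_wl))   # hypothetical cluster B shape
noise    <- function(n) matrix(rnorm(n * n_wl, sd = 0.02), nrow = n)

spc <- rbind(
  matrix(centre_a, 20, n_wl, byrow = TRUE) + noise(20),   # spectra 1-20: cluster A
  matrix(centre_b, 20, n_wl, byrow = TRUE) + noise(20),   # spectra 21-40: cluster B
  (centre_a + centre_b) / 2 + noise(1)                    # spectrum 41: fits neither
)

# Cluster centres estimated from the spectra with known membership.
centres <- rbind(A = colMeans(spc[1:20, ]),
                 B = colMeans(spc[21:40, ]))

# Euclidean distance of every spectrum to a given centre.
dist_to <- function(centre) sqrt(rowSums(sweep(spc, 2, centre)^2))

# Distance to the nearest cluster centre; suspects are too far from any cluster.
d        <- pmin(dist_to(centres["A", ]), dist_to(centres["B", ]))
cutoff   <- median(d) + 6 * mad(d)
suspects <- which(d > cutoff)
```

With real data the distance measure matters a lot, so this is only meant to show the mechanics.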

The second step could be replaced by an automatic decision, e.g. with a 
distance threshold.
Personally, I prefer the term filtering for such automatic rules. 
And there you can think about any number of rules your spectra must 
comply with in order to be acceptable: signal to noise ratio, minimal 
and maximal intensity, original offset (baseline) less than some limit, ...
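
Such filter rules can be written as one logical column per rule, as in this small base R sketch. The four toy spectra, the choice of signal-free columns, and all thresholds (signal to noise above 10, maximum below 65000, offset below 100) are made-up values for illustration, not recommendations.

```r
# Four hypothetical spectra (rows) on five wavelength points (columns).
spc <- rbind(
  good      = c( 10,  12,    50,  12,  10),
  saturated = c( 10,  12, 70000,  12,  10),
  offset    = c(510, 512,   550, 512, 510),
  noisy     = c( 10,  30,    50,   5,  40)
)
baseline_cols <- c(1, 2, 4, 5)   # assumed to be signal-free

# Crude estimates: signal = maximum above the median, noise = sd of the
# signal-free region, offset = minimum intensity.
signal <- apply(spc, 1, max) - apply(spc, 1, median)
noise  <- apply(spc[, baseline_cols], 1, sd)

rules <- cbind(
  snr_ok    = signal / noise > 10,
  max_ok    = apply(spc, 1, max) < 65000,
  offset_ok = apply(spc, 1, min) < 100
)

# A spectrum is accepted only if it complies with every rule.
keep         <- apply(rules, 1, all)
spc_filtered <- spc[keep, , drop = FALSE]
```

Keeping the `rules` matrix around also documents why a given spectrum was rejected.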

Hope that helps,

Claudia


> I am currently analysing Raman spectroscopic data with the hyperSpec package.
> I consulted the documentation of this package and found an example
> work-flow dedicated to Raman spectroscopy (see
> http://hyperspec.r-forge.r-project.org/chondro.pdf)
>
> I am trying to remove outliers by means of PCA just as is done in
> the documentation, but I get an error message I can't explain. Here is my
> code :
>
> "#import the data :
> T=read.table('bladder bis concatenation colonne.txt',header=TRUE)
> spec=new("hyperSpec",wavelength=T[,1],spc=t(T[,-1]),
>          data=data.frame(sample=colnames(T[,-1])),
>          label=list(.wavelength="Raman shift (cm-1)",spc="Intensity (a.u.)"))
>
> #baseline correction of the spectra
> spec=spec[,,500~1800]
> bl=spc.fit.poly.below(spec)
> spec=spec-bl
>
> #normalization of the spectra
> spec=sweep(spec,1,apply(spec,1,mean),'/')
>
> #PCA
> pca=prcomp(~ spc,data=spec$.,center=TRUE)
> scores=decomposition(spec,pca$x,label.wavelength="PC",label.spc="score/a.u.")
> loadings=decomposition(spec,t(pca$rotation),scores=FALSE,
>                        label.spc="loading I/a.u.")
>
> #plot the scores of the first 20 PCs against each other to get an idea of
> #where to find the outliers
> pairs(scores[[,,1:20]],pch=19,cex=0.5)
>
> #identify the outliers thanks to "map.identify"
> out=map.identify(scores[,,5])
> Error in `[.data.frame`(x@data, , j, drop = FALSE) :
>    undefined columns selected
>
> Does anybody understand where the problem comes from?
> And does anybody know another way to find outlier spectra?
>
> Thank you in advance.
>
> Boule


-- 
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.beleites at ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399


