<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
Hello R people,
<p>i'm trying to implement the Partial Least Squares algorithm called SAMPLS
from "J.Comp-Aided Molecular Design", 7 (1993), 587-619. It's faster than
the classical PLS algorithm for fat matrix (m>>n).
<p>Here's the algorithm from the article of Bush B. L. and Nachbar R.B.:
<br> X is the matrix of explanatories proprieties (m*n)
, y the matrix of responses, h the number of latent variables extracted
<br> XT is for X matrix transposed
<br> x* is for the quantities for one sample (y* is the
response predicted from the model derived; i used one to test my R traduction
compared to the R pls module )
<br>
<p> Calculate the covariance matrix C=XX<sup>T
</sup>and <sup> </sup>c*=Xx* for prediction
<br><sup> </sup>y is centered and become y<sub>1</sub>
<br> y*<sub>1</sub>=0
<br>
<p><sup> </sup> For h =1,2,3...hmax
<br> s=Cy<sub>h</sub>
<br> center s
<br> working scalar for prediction
sample s*=c*<sup>T</sup>y<sub>h</sub>
<br> orthogonalize s to previous
t: for g=1,...(h-1), s=s-(t<sub>g</sub><sup>T</sup>s/t<sub>g</sub>Tt<sub>g</sub>)t<sub>g</sub>
<br> orthogonalize s* to previous
t*: for g=1,...(h-1), s*=s*-(t<sub>g</sub><sup>T</sup>s/t<sub>g</sub>Tt<sub>g</sub>)t*<sub>g</sub>
<br><sub> </sub>
<br><sub>
</sub>t*<sub>h</sub>=s*<sub></sub>
<br> t<sub>h</sub>=s
<br> t<sub>h</sub><sup>2</sup>=t<sup>T</sup>t
<br> beta<sub>h</sub>=(t<sup>T</sup>y<sub>h</sub>)/t<sub>h</sub><sup>2</sup>
<p> update y<sub>h+1</sub>=y<sub>h</sub>-beta<sub>h</sub>t<sub>h</sub>
<br><sub> </sub>buid
up prediction y*<sub>h+1</sub>=y*<sub>h</sub>+beta<sub>h</sub>t*<sub>h</sub>
<p> end of cycle
<br>----------------------------------- R-code
<br>##xe and ye are the explanatories and responses matrices,
xtest and ytestsampls the variables for 1 sample
<p>x2<-scale(xe,scale=FALSE)
<br>y2<-scale(ye,scale=FALSE)
<p>lv<-1
<br>xtest<-as.matrix(x2[1,])
<br>t<-matrix(0,nrow(ye),1)
<br>c<-xe%*%t(xe)
<br>yh<-y2
<br>ytestsampls<-0
<br>ctest<-xe%*%xtest
<br>
<p>for (h in 1:lv) {
<br> s<-c%*%yh
<br> s<-scale(s,scale=FALSE)
<br>stest<-t(ctest)%*%yh
<p>##what follows works only for h=1 and 2, i know
<p> if (h>1) { s<-s- ( as.numeric( (t(t)%*%s) / (t(t)%*%t)
) *t )
<br> stest<-stest-( as.numeric( (t(t)%*%s) / (t(t)%*%t)
) *ttest )
<br> }
<br>ttest<-stest
<br> t<-s
<br> t2<-t(t)%*%t
<br> beta<-t(t)%*%yh
<br> beta<-as.numeric(beta/t2)
<br>
<br>ytestsampls<-ytestsampls + as.numeric(beta)*(ttest)
<br> yh<-yh-(beta*t)
<br>}
<p>ytestsampls2<-ytestsampls+mean(ye)
<br>
<br>
<p>-------------------
<p>When lv (number of variables extracted ) is 1 , no problem the y predicted
(ytestsampls2) is the same as when using the R module pls (library(pls)).
But when using lv=2, there is a difference , thus an error in my code that
must come from the update steps.
<p>Does it come from the original algorithm or from my traduction.
<p>Merci d'avance,
<p>sorry for the size of this e-mail and thanks for reading it till all,
<p>--
<br>Nicolas Baurin
<p>Doctorant
<br>Institut de Chimie Organique et Analytique, UPRES-A 6005
<br>Université d'Orléans, BP 6759
<br>45067 ORLEANS Cedex 2, France
<br>Tel: (33+) 2 38 49 45 77
<br> </html>