[R] Problems with data structure when using plsr() from package pls

CG Pettersson cg.pettersson at lantmannen.com
Thu Jan 14 11:33:51 CET 2016


Dear Jeff, 
thanks for the effort, but the use of I() when preparing the dataset is suggested by the authors (Mevik & Wehrens, section 3.2):

+If Z is a matrix, it has to be protected by the ‘protect function’ I() in calls
+to data.frame: mydata <- data.frame(..., Z = I(Z)). Otherwise, it will be split into
+separate variables for each column, and there will be no variable called Z in the data frame,
+so we cannot use Z in the formula. One can also add the matrix to an existing data frame:
+R> mydata <- data.frame(...)
+R> mydata$Z <- Z
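A minimal sketch of the two recipes quoted above, using simulated data. The names y and Z and the 15 x 96 size are illustrative assumptions, not taken from the article. It checks that both approaches store Z as one matrix column, and that leaving out I() splits it into 96 columns:

```r
## Simulated stand-ins (hypothetical): a response y and a 15 x 96 matrix Z
set.seed(1)
y <- rnorm(15)
Z <- matrix(rnorm(15 * 96), nrow = 15, ncol = 96)

## Recipe 1: protect the matrix with I() inside data.frame()
mydata1 <- data.frame(y = y, Z = I(Z))

## Recipe 2: add the matrix to an existing data frame
mydata2 <- data.frame(y = y)
mydata2$Z <- Z

## Both frames have two columns; Z stays a single 15 x 96 matrix column
ncol(mydata1)     # 2
dim(mydata1$Z)    # 15 96
ncol(mydata2)     # 2
dim(mydata2$Z)    # 15 96

## Without I(), data.frame() splits the matrix into Z.1, ..., Z.96
ncol(data.frame(y = y, Z = Z))   # 97
```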

In the dataset "gasoline" that is supplied with the pls package, there are two variables, octane and NIR, where NIR is a matrix with 401 columns that can be used directly, like:
 plsr(octane ~ NIR, data = gasoline)
I thought "gasoline" was made like the example above, but I must be missing something else.
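One way to see how NIR is actually stored in gasoline is to inspect it directly. This sketch assumes the pls package is installed and is skipped otherwise. It should show that NIR is a plain numeric matrix column, not a data frame:

```r
## Inspect the structure of the gasoline data shipped with pls
## (guarded so the snippet also runs where pls is not installed)
if (requireNamespace("pls", quietly = TRUE)) {
  data(gasoline, package = "pls")
  print(ncol(gasoline))               # two columns: octane and NIR
  print(is.matrix(gasoline$NIR))      # NIR is a matrix ...
  print(is.data.frame(gasoline$NIR))  # ... and not a data frame
  print(dim(gasoline$NIR))            # 60 samples x 401 wavelengths
}
```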

Whatever I do ends with "invalid type (list) for variable 'n96'".

So I am still stuck
/CG

From: Jeff Newmiller [mailto:jdnewmil at dcn.davis.ca.us]
Sent: 14 January 2016 05:16
To: CG Pettersson; r-help at r-project.org
Subject: Re: [R] Problems with data structure when using plsr() from package pls

Using I() in the data.frame seems ill-advised to me. You complain about 96 variables but from reading your explanation that seems to be what your data are. I have no idea whether it makes sense to NOT have 96 variables if that is what your data are. Note that a reproducible example supplied by you might help us guess better, but it might just be that your expectations are wrong. 
-- 
Sent from my phone. Please excuse my brevity.
On January 13, 2016 11:02:25 AM PST, CG Pettersson <cg.pettersson at lantmannen.com> wrote:
R version 3.2.3, Windows 7 64-bit.

Dear all!

I am trying to do PLS regression using plsr() from the pls package, with Mevik & Wehrens (2007) and the datasets from the package as a tutorial.
Everything works nicely as long as I use the supplied datasets, but I don't understand how to prepare my own data.
This is what I have done:
 frame1 <- data.frame(gushVM, I(n96))

Here gushVM is a vector of fifteen reference analysis values for a quality problem in grain, and n96 is a matrix with fifteen rows and 96 columns from an electronic nose. I am trying to copy the method in section 3.2 of Mevik & Wehrens, and I want to keep n96 as one variable to avoid addressing 96 different variables in the plsr() call. If I don't use I() in the call, I get 96 variables instead.
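A likely explanation, though only an assumption since the real data are not shown: if n96 was read from a file (for example with read.csv()), it is a data frame rather than a matrix, and I() then protects a data frame, which is a list underneath. This sketch simulates that situation with random data and inspects the result:

```r
## Hypothetical reconstruction: n96 arrives as a data frame, not a matrix
set.seed(1)
gushVM <- rnorm(15)
n96 <- as.data.frame(matrix(rnorm(15 * 96), nrow = 15))

is.matrix(n96)       # FALSE
is.data.frame(n96)   # TRUE

## I() keeps the data frame in one piece, but the stored column is
## still a data frame, i.e. a list, which model formulas cannot use
frame1 <- data.frame(gushVM, I(n96))
is.data.frame(frame1$n96)   # TRUE
typeof(frame1$n96)          # "list"
```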
Looking at the data frame with summary(frame1) gives output quite like summary(gasoline) from the package (not shown here).
But when I try to use plsr() with my own data, it doesn't work due to an error in the data structure:
 pls1 <- plsr(gushVM ~ n96, data = frame1)
Error in model.frame.default(formula = gushVM ~ n96, data = frame1) :
  invalid type (list) for variable 'n96'
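Because the error is raised inside model.frame.default(), it can be reproduced without the pls package. This sketch uses simulated data and assumes (hypothetically) that n96 is a data frame; converting it with as.matrix() before wrapping it in I() should let model.frame() accept it:

```r
## Hypothetical data: n96 is a data frame of 96 numeric columns
set.seed(1)
gushVM <- rnorm(15)
n96 <- as.data.frame(matrix(rnorm(15 * 96), nrow = 15))

## Same formula and data layout as the failing plsr() call
frame1 <- data.frame(gushVM, I(n96))
err <- tryCatch(model.frame(gushVM ~ n96, data = frame1),
                error = function(e) conditionMessage(e))
print(err)   # expected: an "invalid type" error like the one reported

## Fix: turn the data frame into a numeric matrix before protecting it
frame2 <- data.frame(gushVM, n96 = I(as.matrix(n96)))
mf <- model.frame(gushVM ~ n96, data = frame2)
dim(mf$n96)   # 15 96
```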

So n96 has turned into a list, and that is a problem. Whether gushVM is a vector (one variable) or a matrix (five variables) does not seem to change anything; managing n96 is the problem.
I have tried all the alternative ways of creating a proper data frame suggested in the article, with exactly the same result.
I have read the documentation for data.frame(), but I probably don't understand what it says.

What should I do to change "n96" into something better than "list"?
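A sketch of the whole workflow under the same hypothesis, namely that the sensor readings arrive as a data frame (here a simulated one called n96_raw, a hypothetical name). The key step is as.matrix() before I(); ncomp = 3 with leave-one-out validation is only an illustrative choice, and the plsr() part runs only if pls is installed:

```r
## Simulated stand-ins (hypothetical) for the real data
set.seed(42)
n96_raw <- as.data.frame(matrix(rnorm(15 * 96), nrow = 15))
gushVM  <- rnorm(15)

n96 <- as.matrix(n96_raw)     # data frame -> matrix
stopifnot(is.numeric(n96))    # fails early if any column was read as text

## One response column plus one 15 x 96 matrix column
frame1 <- data.frame(gushVM = gushVM, n96 = I(n96))

if (requireNamespace("pls", quietly = TRUE)) {
  pls1 <- pls::plsr(gushVM ~ n96, ncomp = 3, data = frame1,
                    validation = "LOO")
  summary(pls1)
}
```

If as.matrix() returns a character matrix, some column was read as text; the is.numeric() check catches that before plsr() sees it.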

Thanks
/CG

Best regards
CG Pettersson
Scientific Project Manager, PhD
______________________
Lantmännen Corporate R&D
Phone:  +46 10 556 19 85
Mobile: + 46 70 330 66 85
Email: cg.pettersson at lantmannen.com
Visiting Address: S:t Göransgatan 160 A
Address: Box 30192, SE-104 25 Stockholm
Web: http://www.lantmannen.com
Registered Office: Stockholm



R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

