[R] Problems with data structure when using plsr() from package pls

CG Pettersson cg.pettersson at lantmannen.com
Thu Jan 14 11:33:51 CET 2016

Dear Jeff, 
thanks for the effort, but the use of I() when preparing the dataset is suggested by the authors (Mevik & Wehrens, section 3.2):

+If Z is a matrix, it has to be protected by the ‘protect function’ I() in calls
+to data.frame: mydata <- data.frame(..., Z = I(Z)). Otherwise, it will be split into
+separate variables for each column, and there will be no variable called Z in the data frame,
+so we cannot use Z in the formula. One can also add the matrix to an existing data frame:
+R> mydata <- data.frame(...)
+R> mydata$Z <- Z
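
The behaviour described in the quoted passage can be checked in base R. The sketch below uses a made-up response y and a made-up 15 x 96 matrix Z (both hypothetical stand-ins, not the poster's data) and compares the three ways of building the data frame.

```r
# Hypothetical stand-in data: 15 observations, 96 predictor columns
set.seed(1)
y <- rnorm(15)
Z <- matrix(rnorm(15 * 96), nrow = 15)

# Without I(): the matrix is split into one variable per column
split_df <- data.frame(y = y, Z = Z)

# With I(): the matrix is kept as a single variable called Z
protected_df <- data.frame(y = y, Z = I(Z))

# Adding the matrix to an existing data frame also keeps it whole
added_df <- data.frame(y = y)
added_df$Z <- Z

ncol(split_df)             # 97: y plus Z.1 ... Z.96
ncol(protected_df)         # 2: y and Z
ncol(added_df)             # 2: y and Z
is.matrix(protected_df$Z)  # TRUE
```

Note that this only holds when Z really is a matrix; is.matrix(Z) is a quick way to confirm that before building the data frame.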

In the dataset "gasoline" that is supplied with the pls package, there are two variables, octane and NIR, where NIR is a frame with 401 columns that can be used like this:
 plsr(octane ~ NIR, data = gasoline)
I thought "gasoline" was made like the example above, but I must be missing something else.

Whatever I do ends with "invalid type (list) for variable 'n96'".

So I am still stuck.
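
One way this error can arise, offered as a guess rather than a diagnosis of the data in this thread: if the object stored in the column is itself a data frame (which R treats as a list) instead of a matrix, model.frame() rejects it. The sketch below reuses the names gushVM, n96 and frame1 from the post, but the values are random and the assumption that n96 is a data frame is mine.

```r
# Hypothetical stand-in data; ASSUMPTION: n96 is a data frame, not a matrix
set.seed(1)
gushVM <- rnorm(15)
n96 <- as.data.frame(matrix(rnorm(15 * 96), nrow = 15))

# Store the data frame as a single column of frame1
frame1 <- data.frame(gushVM = gushVM)
frame1$n96 <- n96

# Inspect what the column actually is
class(frame1$n96)      # "data.frame"
is.list(frame1$n96)    # TRUE
is.matrix(frame1$n96)  # FALSE

# model.frame() is what plsr() relies on to evaluate the formula;
# capture the error message instead of stopping
err <- tryCatch(
  model.frame(gushVM ~ n96, data = frame1),
  error = function(e) conditionMessage(e)
)
err
```

Running the same class() and is.matrix() checks on the real n96 would show whether this is what is happening.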

From: Jeff Newmiller [mailto:jdnewmil at dcn.davis.ca.us] 
Sent: 14 January 2016 05:16
To: CG Pettersson; r-help at r-project.org
Subject: Re: [R] Problems with data structure when using plsr() from package pls

Using I() in the data.frame seems ill-advised to me. You complain about 96 variables but from reading your explanation that seems to be what your data are. I have no idea whether it makes sense to NOT have 96 variables if that is what your data are. Note that a reproducible example supplied by you might help us guess better, but it might just be that your expectations are wrong. 
Sent from my phone. Please excuse my brevity.
On January 13, 2016 11:02:25 AM PST, CG Pettersson <cg.pettersson at lantmannen.com> wrote:
R version 3.2.3, W7 64bit.

Dear all!

I am trying to do PLS regression using plsr() from package pls, with Mevik & Wehrens (2007) as tutorial and the datasets from the package.
Everything works nicely as long as I use the supplied datasets, but I don't understand how to prepare my own data.
This is what I have done:
 frame1 <- data.frame(gushVM, I(n96))

Here gushVM is a vector with fifteen reference analysis values of a quality problem in grain, and n96 is a matrix with fifteen rows and 96 columns from an electronic nose. I am trying to copy the method in section 3.2 of Mevik & Wehrens, and want to keep n96 as one variable to avoid addressing 96 different variables in the plsr call. If I don't use I() in the call I get 96 variables instead.
Looking at the data frame with summary(frame1) gives output much like summary(gasoline) from the package (not shown here).
But when I try to use plsr() with my own data it doesn't work, due to an error in the data structure:
 pls1 <- plsr(gushVM ~ n96, data = frame1)
Error in model.frame.default(formula = gushVM ~ n96, data = frame1) :
  invalid type (list) for variable 'n96'

So, n96 has turned into a list, and that is a problem. Whether gushVM is a vector (one variable) or a matrix (five variables) does not seem to change anything; managing n96 is the problem.
I have tried all the alternative ways of creating a proper data frame suggested in the article, with exactly the same result.
I have tried the documentation for data.frame() but I probably don't understand what it says.

What should I do to change "n96" into something better than "list"?
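
If n96 turns out to be a data frame rather than a matrix (an assumption; the post does not show how n96 was created), one possible fix is to convert it with as.matrix() before building the data frame. The sketch below uses random stand-in values under that assumption and checks the result with base R's model.frame().

```r
# Hypothetical stand-in data; ASSUMPTION: n96 starts out as a data frame
set.seed(1)
gushVM <- rnorm(15)
n96 <- as.data.frame(matrix(rnorm(15 * 96), nrow = 15))

# Convert to a numeric matrix, then protect it with I()
n96_mat <- as.matrix(n96)
frame1 <- data.frame(gushVM = gushVM, n96 = I(n96_mat))

is.matrix(frame1$n96)  # TRUE

# The formula can now be evaluated without the "invalid type (list)" error
mf <- model.frame(gushVM ~ n96, data = frame1)
dim(mf$n96)            # 15 96
```

With a frame1 built this way, the call from the post, plsr(gushVM ~ n96, data = frame1), would be the next thing to try.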


Med vänlig hälsning/Best regards
CG Pettersson
Scientific Project Manager, PhD
Lantmännen Corporate R&D
Phone:  +46 10 556 19 85
Mobile: +46 70 330 66 85
Email: cg.pettersson at lantmannen.com
Visiting Address: S:t Göransgatan 160 A
Address: Box 30192, SE-104 25 Stockholm
Web: http://www.lantmannen.com
Registered Office: Stockholm