[R] Help in using PCR

Gavin Simpson gavin.simpson at ucl.ac.uk
Tue Jul 1 17:33:07 CEST 2008


On Wed, 2008-07-02 at 00:58 +1000, Jason Lee wrote:
> Hi,
> 
> Thanks for the reply.
> 
> Basically I don't have any labels for my data except column 1, which is labeled
> with Sample1, Sample2, etc. (vertically).

Well, that is going to cause you problems. Not the one you report below,
but it will bite you in the ass once you sort your R usage problems out.
How did you read in your data? If you used read.table or one of its
stablemates (read.csv etc.), then read ?read.table and look at the
row.names argument to see how you can get those sample labels as the
row names of the data frame.
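
As a rough sketch of what I mean, here is a made-up three-sample file
supplied inline via the text argument. The values and the object name toy
are invented, and I am assuming your file has no header line:

```r
## Toy stand-in for the real file: the first field is the sample label
txt <- "Sample1 1.2 3.4 0
Sample2 2.3 4.5 1
Sample3 3.1 5.6 0"

## row.names = 1 uses the first column as row names instead of as data
toy <- read.table(text = txt, header = FALSE, row.names = 1)

rownames(toy)  # the sample labels
ncol(toy)      # only the data columns remain
```

With a real file you would pass the file name in place of text = txt.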

What does names(cancerv1) tell you? Unless you specifically deleted the
column names, there will be some, as R will have generated them when
reading your data in.
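
For illustration only (invented values, and again assuming a file with no
header line), this is the sort of thing read.table does when it has to
make up column names itself:

```r
## Two made-up rows with no header line
txt <- "Sample1 1.2 3.4 0
Sample2 2.3 4.5 1"

toy <- read.table(text = txt, header = FALSE)

## R generates names of the form V1, V2, ... for the columns
names(toy)
```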

> My cancerv1 data is data.frame.
> 
> So, I used df <- data.frame( x=I(coef(cancerv1(,2:407))),
> y=cancerv1[,408])before feeding to PCR.

Why? What are you trying to achieve? Why apply I() and coef() to these?

> 
> However, I get the below error.
> 
> Error in coef(cancerv1(, 2:407)) : could not find function "cancerv1".

cancerv1(, 2:407) - you have the wrong "brackets" there. Those are
parentheses, which denote a function call in R, so R goes looking for a
function named cancerv1. You want square brackets "[" "]" instead.
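
A small sketch of the difference, using a made-up data frame called toy
in place of your cancerv1:

```r
## Small made-up data frame standing in for cancerv1
toy <- data.frame(id = c("Sample1", "Sample2"), a = c(1, 2), b = c(3, 4))

## Square brackets index the object: all rows, columns 2 to 3
sub <- toy[, 2:3]
dim(sub)

## Parentheses call a function, and 'toy' is data, not a function,
## so this fails; try() just stops the error from halting the script
res <- try(toy(, 2:3), silent = TRUE)
inherits(res, "try-error")
```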

> 
> I wonder what mistake I made in this case. My response variable is in
> column 408 and my predictors are in columns 2 to 407.

Ok, let's sort this out. [Not tested as I don't have your data]

library(pls)
## response in one column, all predictors kept together as one matrix column
df <- data.frame(resp = cancerv1[, 408])
df$VARS <- as.matrix(cancerv1[, 2:407])
mod <- pcr(resp ~ VARS, ncomp = 6, data = df, validation = "CV")

Which will get it in a format similar to the yarns example that you
mentioned in your original post.
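
To see what that format looks like, here is a small base R sketch on
simulated data. The numbers and the object names X and toy are made up
for illustration; the point is that the whole predictor block ends up
stored as a single matrix column:

```r
set.seed(1)
X <- matrix(rnorm(5 * 4), nrow = 5)   # 5 samples, 4 predictors
toy <- data.frame(resp = rnorm(5))
toy$VARS <- X                         # stored as one matrix column

ncol(toy)       # two columns: resp and VARS
dim(toy$VARS)   # the matrix keeps its 5 x 4 shape
```

A formula such as resp ~ VARS can then refer to all predictors by the
one name.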

Now, if you sort out your data import issue (see above), you'll need to
change the numbers in the square brackets above - they'll be 1 less than
I have them up there.
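
One way to avoid hard-coding those numbers, sketched on a made-up data
frame (toy, toy2 and the column names are invented, and I am assuming the
response is always the last column):

```r
## Labels stored as an ordinary column: predictors are 2:3, response is 4
toy <- data.frame(id = c("Sample1", "Sample2"),
                  a = c(1, 2), b = c(3, 4), y = c(0, 1))

## Labels moved to the row names: every data column shifts down by one
toy2 <- data.frame(toy[, -1], row.names = as.character(toy$id))

resp_col  <- ncol(toy2)                # response is the last column
pred_cols <- seq_len(ncol(toy2) - 1)   # everything before it
```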

An alternative, along the lines of my earlier response:

df <- cancerv1[, -1]
## add some (col)names; the response is the last column once the labels are dropped
names(df) <- c(paste("Var", 1:(ncol(df)-1), sep = ""), "resp")
names(df)
mod2 <- pcr(resp ~ . , ncomp = 6, data = df, validation = "CV")

Note Bjorn-Helge's comment about this latter approach taking a while to
process the formula if you start using this on data sets with many more
than 1000 predictor variables.
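
To give a feel for why, the "." in the formula is expanded into one term
per remaining column of the data frame. A tiny base R sketch with made-up
data (toy and tt are invented names) shows the expansion; with thousands
of columns that term list gets correspondingly long:

```r
toy <- data.frame(resp = c(1, 2, 3),
                  Var1 = c(4, 5, 6),
                  Var2 = c(7, 8, 10))

## terms() resolves "." against the columns of the data frame
tt <- terms(resp ~ ., data = toy)
attr(tt, "term.labels")   # one term per predictor column
```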

Does this help you any?

> 
> Please advise. Thanks.

You do seem to be blundering about with R a bit ;-) Randomly trying
functions and other R code is just going to frustrate you. Do help
yourself and read some of the introductory documentation.

HTH

G

> 
> On Tue, Jul 1, 2008 at 6:41 PM, Bjørn-Helge Mevik <b.h.mevik at usit.uio.no>
> wrote:
> 
> > Gavin Simpson <gavin.simpson at ucl.ac.uk> writes:
> >
> > > You can do this another way though, that I feel is more natural. So let's
> > > assume that your data frame contains columns that are named, and that
> > > one of these is the response variable, the remaining columns are the
> > > predictors. Further assume that this response is called 'myresp', then
> > > you can proceed by the following:
> > >
> > > cancerv1.pcr <- pcr(myresp ~ . , ncomp = 6, data = cancerv1,
> > >                     validation = "CV")
> >
> > This works fine as long as the number of (predictor) variables is not
> > too large.  With many variables (>> 1000), R will spend a very long time
> > dealing with the formula.
> >
> > --
> > Bjørn-Helge Mevik
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


