[BioC] Inherits(x,"data.frame") error in SamR

Fri Feb 17 01:51:15 CET 2006

Dear Jim,

Thanks very much.  With a slight modification, your code worked.  The modification was this:

Instead of:
Data <- list(x = as.matrix(exprs(gs.rma)), y = rep(1:2, each = 3), 
genenames = geneNames(gs.rma), geneid = 1:14010)
mysamr <- samr(Data, resp.type = "Two class unpaired", nperms = 20)

I needed to use
Data <- list(x = as.matrix(exprs(gs.rma)), y = rep(1:2, each = 3), 
genenames = geneNames(gs.rma), geneid = 1:14010, 
logged2=TRUE)
mysamr <- samr(Data, resp.type = "Two class unpaired", nperms = 20)

Without the "logged2" component in the Data list, samr returns the following error:
perm= 1
Error in if (logged2) { : argument is of length zero
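
The message above is what base R prints whenever if() is handed something of length zero, and a list component that was never set comes back as NULL. A minimal sketch of that behaviour follows, assuming a hypothetical stand-in function check_logged2() (this is not samr's actual source) and a small simulated matrix in place of the real expression data:

```r
# Hypothetical stand-in for the kind of test that fails inside samr;
# this is NOT samr's actual source code.
check_logged2 <- function(data) {
  if (data$logged2) "log2 scale" else "raw scale"
}

# Small simulated list: 2 features by 6 samples, no logged2 component
Data <- list(x = matrix(rnorm(12), nrow = 2), y = rep(1:2, each = 3))

is.null(Data$logged2)   # a missing list component is NULL (length zero)

# Capture the error message instead of stopping the script
res <- tryCatch(check_logged2(Data), error = function(e) conditionMessage(e))
res

# Supplying the component gives if() a length-one logical to test
Data$logged2 <- TRUE
check_logged2(Data)
```

Checking names(Data) before calling a function is a cheap way to spot a component that was left out.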

I thought I was following the example given in the man pages perfectly, but, evidently, I wasn't.  The part about y being an n-vector got away from me.  And I got the variable name "data" from the example, by the way.  Thanks for the tip on masking R functions.

Best & Thanks again,
Monnie

Monnie McGee, Ph.D.
Assistant Professor
Department of Statistical Science
Southern Methodist University
Ph: 214-768-2462
Fax: 214-768-4035

-----Original Message-----
From: James W. MacDonald [mailto:jmacdon at med.umich.edu]
Sent: Thu 2/16/2006 1:13 PM
To: McGee, Monnie
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Inherits(x,"data.frame") error in SamR

Hi Monnie,

McGee, Monnie wrote:
> Dear Group,
> 
> I am trying to use samr.  I have read a previous post about the ease of
> use of siggenes vs. samr.  It is so true!   I used siggenes
> originally, but that doesn't help me with the problem I am having.  I
> still need to use samr because I want to assess sample size using
> sam.assess.samplesize.  To assess sample size using sam, I need to
> supply sam.assess.samplesize with a "data" vector.  I can't
> understand how to form this vector - perhaps a manual would help, but
> the manual is not on the SAM website as the R-help files claim.    I
> am using a PowerMac G5 with R Version 2.2.1  (2005-12-20 r36812)
> installed.
> 
> I would like to use samr to assess the sample size requirements for
> an experiment I am planning.   I have some training data, which is
> the drosophila spike-in experiment data given in Choe, S. E.,
> Boutros, M., Michelson, A. M., Church, G. M., & Halfon, M. S. (2005).
> Preferred analysis methods for Affymetrix GeneChips revealed by a
> wholly defined control dataset. Genome Biology, 6, R16.
> 
> Here is what I have done:
>
>> gsbatch = ReadAffy()
>> # the experiment consists of 3 technical replicates from "control" chips
>> # and 3 technical replicates from Spike-in chips on the DrosGenome1 chip
> 
>> gs.rma = rma(gsbatch) # get expression values
> 
> ## get the exprSet into a format that samr can manage:
> 
>> gs.rma.fr = as.data.frame.exprSet(gs.rma)
>> gs.mat = matrix(gs.rma.fr$exprs,nrow=14010,ncol=6)
>> gs.mat.con = gs.mat[,1:3]
>> gs.mat.si = gs.mat[,4:6]
>> gs.mat.sam = rbind(gs.mat.con,gs.mat.si)
>> ## this is a matrix with dim 28020 by 3, control arrays on top,
>> ## spike-ins on bottom
> 
> ## grouping vector
> 
>> y = c(rep(1,14010),rep(2,14010))
>> geneid = as.character(1:nrow(gs.mat.sam))
>> genenames = gs.rma.fr$genenames[1:14010]
>> data = list(x=gs.mat.sam, y =y , geneid = geneid,
>>             genenames = rep(genenames,2),logged2=TRUE)
>> samr(data,resp.type="Two class unpaired",nperms=20)
> 
> Error in inherits(x, "data.frame") : (subscript) logical subscript too long
>
> I also tried deleting the geneid & genenames vectors from the "data"
> list, but still received the same error.
> 
> I can't figure this out.  I am sure the problem is in the way that I
> defined the "data" list, but, without a manual, I really don't
> understand what I did wrong.

Ah, but there is a manual, or at least there are man pages! Terseness is 
the norm, so you have to be very careful when you read what is written, 
as the devil is often in the details.

You have length(y) == 28020, whereas the man page for samr says:

data: Data object with components x- p by n matrix of features, one
           observation per column (missing values allowed); y- n-vector
           of outcome measurements; censoring.status- n-vector of
           censoring censoring.status (1= died or event occurred,
           0=survived, or event was censored), needed for a censored
           survival outcome

Note that the y vector is supposed to be an n-vector of outcome 
measurements whereas you have a p-vector of outcome measurements.
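
The shape requirement can be checked before calling samr. The sketch below uses a simulated matrix as a stand-in for exprs(gs.rma) (the values are random and purely illustrative; only the dimensions matter) and compares the two candidate y vectors against the number of columns of x:

```r
# Simulated stand-in for exprs(gs.rma):
# p = 14010 genes (rows), n = 6 arrays (columns)
p <- 14010
n <- 6
x <- matrix(rnorm(p * n), nrow = p, ncol = n)

y_stacked <- c(rep(1, p), rep(2, p))  # one label per stacked row: length 2 * p
y_arrays  <- rep(1:2, each = 3)       # one label per array: length n

length(y_stacked) == ncol(x)  # FALSE
length(y_arrays)  == ncol(x)  # TRUE
```

Only the second vector has one entry per column of x, which is what "n-vector" means in the man page.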

Also note that 'x' is supposed to be a matrix, whereas you likely have a 
data.frame. Sometimes the term matrix is used to mean 'any rectangular 
arrangement of data', but a data.frame is in fact a list, so if the 
author really means matrix and doesn't have any error checking to coerce 
a data.frame to a matrix, an error may occur as well.
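
The distinction is easy to verify at the prompt. A minimal base-R sketch with a toy data.frame (the column names a and b are made up for illustration):

```r
# Toy data.frame with two numeric columns
df <- data.frame(a = c(1.5, 2.5), b = c(3.5, 4.5))

is.list(df)     # TRUE: a data.frame is a list of columns
is.matrix(df)   # FALSE

# Explicit coercion gives a true numeric matrix
m <- as.matrix(df)
is.matrix(m)    # TRUE
is.numeric(m)   # TRUE
```

Coercing with as.matrix() before building the list removes any doubt about what the function receives.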

I find the examples are often more enlightening than the Arguments section, 
so they are always worth reading.

I am confused why you are 'stacking' the data like this. Instead, I 
would think something like this would give you what you want (note that 
'data' is a really bad variable name to use, as you are masking the 
data() function - it is useful to type a possible variable name at an R 
prompt first to see if a function pops up):

Data <- list(x = as.matrix(exprs(gs.rma)), y = rep(1:2, each = 3), 
genenames = geneNames(gs.rma), geneid = 1:14010)
mysamr <- samr(Data, resp.type = "Two class unpaired", nperms = 20)
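
The name check mentioned above can also be done programmatically. A small sketch, assuming a default R session with the usual packages attached; name_is_taken() is a hypothetical helper, not part of any package:

```r
# Hypothetical helper: is there already a function with this name
# visible on the search path?
name_is_taken <- function(name) exists(name, mode = "function")

name_is_taken("data")   # TRUE: data() from the utils package
name_is_taken("Data")   # FALSE in a default session, so "Data" is a safer name
```

R can usually still find the function when it is called as data(), but sharing a name between an object and a function makes code harder to read and debug.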

HTH,

Jim

> 
> Thank you for your help, Monnie
> 
> Monnie McGee, Ph.D. Assistant Professor Department of Statistical
> Science Southern Methodist University Ph: 214-768-2462 Fax:
> 214-768-4035
> 

-- 
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623