[BioC] topTable threshold on p-value and logFC [Re: was design matrix]

Mon Oct 8 01:05:11 CEST 2007

Additionally to decideTests(), I made a function which is useful for making
*any* filter you like. The example provided filters the same as
decideTests().
You must correctly specify the columns of interest in the ''filter''
expression argument so some knowledge of limma's data structures is
required.

 ttFilter <- 
function (filter = "abs(logFC) > 0.69 & abs(t) > 2", fit, sort.by = "t",
            number = nrow(fit), ...)
{
  tt <- topTable(fit, sort.by = sort.by, number = number, ...)
  tCols <- colnames(tt)
  e <- new.env()
  for (i in tCols) {
    e[[i]] <- tt[[i]]
  }
  toSub <- eval(parse(text = filter), envir = e)
  if( any(is.na(toSub)) ){
    toSub <- toSub[!is.na(toSub)]
  }
  return(tt[toSub, ])
}

Some Examples;
     library(limma)
     set.seed(1)
     MA <- matrix(rnorm(100, 0,3), nc=4)
     fit <- lmFit(MA)
     fit <- eBayes(fit)
     topTable(fit)
     # Post filter on |M|>2
     ttFilter(filter = "abs(logFC)>2", fit)
     # |M|>1.4 & abs(t) > 1.8
     ttFilter(filter = "abs(logFC)>1.4 & abs(t)>1.8", fit)

Marcus

On 5/10/07 1:58 PM, "Gordon Smyth" <smyth at wehi.edu.au> wrote:

> I have changed the subject line to something more appropriate.
> 
> In R 2.5.1 and Bioconductor 2.0, the recommended way to do what you
> want (select DE genes on the basis of a combination of p-value and
> log fold change) was to use decideTests(). In R 2.6.0 and
> Bioconductor 2.1, you will find that topTable() now has p-value and
> logFC arguments which allow you to do the same thing using topTable().
> 
> Best wishes
> Gordon
> 
>> Date: Wed, 3 Oct 2007 17:31:34 +0100 (BST)
>> From: Lev Soinov <lev_embl1 at yahoo.co.uk>
>> Subject: Re: [BioC] design matrix
>> To: "James W. MacDonald" <jmacdon at med.umich.edu>
>> Cc: bioconductor at stat.math.ethz.ch
>> Message-ID: <412385.24484.qm at web27908.mail.ukl.yahoo.com>
>> Content-Type: text/plain
>> 
>> Dear List,
>> 
>>   Could you help me with another small issue?
>>   I usually write out the results of my analysis using the
>> write.table function as follows:
>> 
>>   Assuming 30000 probes in the dataset:
>>   data <- ReadAffy()
>>   eset <- rma(data)
>> 
>>   design <- model.matrix(~ -1+factor(c(1,1,1,2,2,3,3,3)))
>>   colnames(design) <- c("group1", "group2", "group3")
>>   contrast.matrix <- makeContrasts(group2-group1, group3-group2,
>> group3-group1, levels=design)
>> 
>>   fit <- lmFit(temp, design)
>>   fit2 <- contrasts.fit(fit, contrast.matrix)
>>   fit2 <- eBayes(fit2)
>> 
>>   C1<-topTable(fit2, coef=1, number=30000, adjust="BH")
>> 
>> write.table(C1,file="comparison1.txt",append=TRUE,quote=FALSE,sep="\t",row.na
>> mes=TRUE,col.names=FALSE)
>> 
>>   C2<-topTable(fit2, coef=2, number=30000, adjust="BH")
>> 
>> write.table(C2,file="comparison2.txt",append=TRUE,quote=FALSE,sep="\t",row.na
>> mes=TRUE,col.names=FALSE)
>> 
>>   C3<-topTable(fit2, coef=3, number=30000, adjust="BH")
>> 
>> write.table(C3,file="comparison3.txt",append=TRUE,quote=FALSE,sep="\t",row.na
>> mes=TRUE,col.names=FALSE)
>> 
>>   I then use the written out txt files (comparison1.txt,
>> comparison2.txt and comparison3.txt) to select significant probes
>> on the basis of log2fold change and adjusted p values thresholds, using
>> Excel.
>>   Would you say that this is a correct way to do this and could you
>> please recommend me some other, may be more efficient way of
>> writing the results of topTable for all 30000 probes out?
>> 
>>   With kind regards,
>>   Lev.
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

___________________________________________________________________
The contents of this e-mail are privileged and/or confid...{{dropped:6}}