[BioC] topTable threshold on p-value and logFC [Re: was design matrix]

Martin Morgan mtmorgan at fhcrc.org
Mon Oct 8 03:31:51 CEST 2007


Hi Marcus -- A few comments below, for what it's worth...

Marcus Davy <mdavy at hortresearch.co.nz> writes:

> Additionally to decideTests(), I made a function which is useful for making
> *any* filter you like. The example provided filters the same as
> decideTests().
> You must correctly specify the columns of interest in the ''filter''
> expression argument so some knowledge of limma's data structures is
> required.
>
>  ttFilter <- 
> function (filter = "abs(logFC) > 0.69 & abs(t) > 2", fit, sort.by = "t",
>             number = nrow(fit), ...)
> {
>   tt <- topTable(fit, sort.by = sort.by, number = number, ...)

from here...

>   tCols <- colnames(tt)
>   e <- new.env()
>   for (i in tCols) {
>     e[[i]] <- tt[[i]]
>   }
>   toSub <- eval(parse(text = filter), envir = e)

... to here copies the data frame returned by topTable into an
environment, to be used in eval. However, the 'envir' argument to eval
can be a data.frame (!, see the help page for eval), so you could have
just

  toSub <- eval(parse(text=filter), tt)

'with' provides a kind of user-friendly access to this for interactive use

  toSub <- with(tt, abs(logFC) > 0.69 & abs(t) > 2)

>   if( any(is.na(toSub)) ){
>     toSub <- toSub[!is.na(toSub)]
>   }
>   return(tt[toSub, ])

reducing the length of toSub (by deleting the NA's) will likely lead
to unexpected recycling of the subscript index, e.g.,

> df <- data.frame(x=1:3)
> df[c(TRUE,FALSE),, drop=FALSE]
  x
1 1
3 3

Martin

> }
>
> Some Examples;
>      library(limma)
>      set.seed(1)
>      MA <- matrix(rnorm(100, 0,3), nc=4)
>      fit <- lmFit(MA)
>      fit <- eBayes(fit)
>      topTable(fit)
>      # Post filter on |M|>2
>      ttFilter(filter = "abs(logFC)>2", fit)
>      # |M|>1.4 & abs(t) > 1.8
>      ttFilter(filter = "abs(logFC)>1.4 & abs(t)>1.8", fit)
>
>
> Marcus
>
> On 5/10/07 1:58 PM, "Gordon Smyth" <smyth at wehi.edu.au> wrote:
>
>> I have changed the subject line to something more appropriate.
>> 
>> In R 2.5.1 and Bioconductor 2.0, the recommended way to do what you
>> want (select DE genes on the basis of a combination of p-value and
>> log fold change) was to use decideTests(). In R 2.6.0 and
>> Bioconductor 2.1, you will find that topTable() now has p-value and
>> logFC arguments which allow you to do the same thing using topTable().
>> 
>> Best wishes
>> Gordon
>> 
>>> Date: Wed, 3 Oct 2007 17:31:34 +0100 (BST)
>>> From: Lev Soinov <lev_embl1 at yahoo.co.uk>
>>> Subject: Re: [BioC] design matrix
>>> To: "James W. MacDonald" <jmacdon at med.umich.edu>
>>> Cc: bioconductor at stat.math.ethz.ch
>>> Message-ID: <412385.24484.qm at web27908.mail.ukl.yahoo.com>
>>> Content-Type: text/plain
>>> 
>>> Dear List,
>>> 
>>>   Could you help me with another small issue?
>>>   I usually write out the results of my analysis using the
>>> write.table function as follows:
>>> 
>>>   Assuming 30000 probes in the dataset:
>>>   data <- ReadAffy()
>>>   eset <- rma(data)
>>> 
>>>   design <- model.matrix(~ -1+factor(c(1,1,1,2,2,3,3,3)))
>>>   colnames(design) <- c("group1", "group2", "group3")
>>>   contrast.matrix <- makeContrasts(group2-group1, group3-group2,
>>> group3-group1, levels=design)
>>> 
>>>   fit <- lmFit(temp, design)
>>>   fit2 <- contrasts.fit(fit, contrast.matrix)
>>>   fit2 <- eBayes(fit2)
>>> 
>>>   C1<-topTable(fit2, coef=1, number=30000, adjust="BH")
>>> 
>>> write.table(C1,file="comparison1.txt",append=TRUE,quote=FALSE,sep="\t",row.na
>>> mes=TRUE,col.names=FALSE)
>>> 
>>>   C2<-topTable(fit2, coef=2, number=30000, adjust="BH")
>>> 
>>> write.table(C2,file="comparison2.txt",append=TRUE,quote=FALSE,sep="\t",row.na
>>> mes=TRUE,col.names=FALSE)
>>> 
>>>   C3<-topTable(fit2, coef=3, number=30000, adjust="BH")
>>> 
>>> write.table(C3,file="comparison3.txt",append=TRUE,quote=FALSE,sep="\t",row.na
>>> mes=TRUE,col.names=FALSE)
>>> 
>>>   I then use the written out txt files (comparison1.txt,
>>> comparison2.txt and comparison3.txt) to select significant probes
>>> on the basis of log2fold change and adjusted p values thresholds, using
>>> Excel.
>>>   Would you say that this is a correct way to do this and could you
>>> please recommend me some other, may be more efficient way of
>>> writing the results of topTable for all 30000 probes out?
>>> 
>>>   With kind regards,
>>>   Lev.
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> ___________________________________________________________________
> The contents of this e-mail are privileged and/or confid...{{dropped:6}}
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Martin Morgan
Computational Biology Shared Resource Director
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (208) 667-2793



More information about the Bioconductor mailing list