[BioC] topTable threshold on p-value and logFC [Re: was design matrix]
Marcus Davy
mdavy at hortresearch.co.nz
Mon Oct 8 04:14:32 CEST 2007
Condensed function for filtering based on Martins suggestions;
ttFilter <-
function( filter = "abs(logFC) > 0.69 & abs(t) > 2", fit, sort.by="t",
number = nrow(fit), ...) {
tt <- topTable(fit, sort.by = sort.by, number = number, ...)
# Obtain logical from filter
toSub <- eval(parse(text=filter), tt)
return( tt[toSub, ] )
}
Marcus
On 8/10/07 2:51 PM, "Marcus Davy" <mdavy at hortresearch.co.nz> wrote:
> Thanks for the information.
>
> Yes you are correct, the code;
>
>>> if( any(is.na(toSub)) ){
>>> toSub <- toSub[!is.na(toSub)]
>>> }
>>> return(tt[toSub, ])
>
> is a bug and needs to be removed because of recycling it will stuff the
> returned index up. That was a quick hack I added recently without thinking
> about enough which is incorrect as your have pointed out.
>
>
> Marcus
>
>
>
> On 8/10/07 2:31 PM, "Martin Morgan" <mtmorgan at fhcrc.org> wrote:
>
>> Hi Marcus -- A few comments below, for what it's worth...
>>
>> Marcus Davy <mdavy at hortresearch.co.nz> writes:
>>
>>> Additionally to decideTests(), I made a function which is useful for making
>>> *any* filter you like. The example provided filters the same as
>>> decideTests().
>>> You must correctly specify the columns of interest in the ''filter''
>>> expression argument so some knowledge of limma's data structures is
>>> required.
>>>
>>> ttFilter <-
>>> function (filter = "abs(logFC) > 0.69 & abs(t) > 2", fit, sort.by = "t",
>>> number = nrow(fit), ...)
>>> {
>>> tt <- topTable(fit, sort.by = sort.by, number = number, ...)
>>
>> from here...
>>
>>> tCols <- colnames(tt)
>>> e <- new.env()
>>> for (i in tCols) {
>>> e[[i]] <- tt[[i]]
>>> }
>>> toSub <- eval(parse(text = filter), envir = e)
>>
>> ... to here copies the data frame returned by topTable into an
>> environment, to be used in eval. However, the 'envir' argument to eval
>> can be a data.frame (!, see the help page for eval), so you could have
>> just
>>
>> toSub <- eval(parse(text=filter), tt)
>>
>> 'with' provides a kind of user-friendly access to this for interactive use
>>
>> toSub <- with(tt, abs(logFC) > 0.69 & abs(t) > 2)
>>
>>> if( any(is.na(toSub)) ){
>>> toSub <- toSub[!is.na(toSub)]
>>> }
>>> return(tt[toSub, ])
>>
>> reducing the length of toSub (by deleting the NA's) will likely lead
>> to unexpected recycling of the subscript index, e.g.,
>>
>>> df <- data.frame(x=1:3)
>>> df[c(TRUE,FALSE),, drop=FALSE]
>> x
>> 1 1
>> 3 3
>>
>> Martin
>>
>>> }
>>>
>>> Some Examples;
>>> library(limma)
>>> set.seed(1)
>>> MA <- matrix(rnorm(100, 0,3), nc=4)
>>> fit <- lmFit(MA)
>>> fit <- eBayes(fit)
>>> topTable(fit)
>>> # Post filter on |M|>2
>>> ttFilter(filter = "abs(logFC)>2", fit)
>>> # |M|>1.4 & abs(t) > 1.8
>>> ttFilter(filter = "abs(logFC)>1.4 & abs(t)>1.8", fit)
>>>
>>>
>>> Marcus
>>>
>>> On 5/10/07 1:58 PM, "Gordon Smyth" <smyth at wehi.edu.au> wrote:
>>>
>>>> I have changed the subject line to something more appropriate.
>>>>
>>>> In R 2.5.1 and Bioconductor 2.0, the recommended way to do what you
>>>> want (select DE genes on the basis of a combination of p-value and
>>>> log fold change) was to use decideTests(). In R 2.6.0 and
>>>> Bioconductor 2.1, you will find that topTable() now has p-value and
>>>> logFC arguments which allow you to do the same thing using topTable().
>>>>
>>>> Best wishes
>>>> Gordon
>>>>
>>>>> Date: Wed, 3 Oct 2007 17:31:34 +0100 (BST)
>>>>> From: Lev Soinov <lev_embl1 at yahoo.co.uk>
>>>>> Subject: Re: [BioC] design matrix
>>>>> To: "James W. MacDonald" <jmacdon at med.umich.edu>
>>>>> Cc: bioconductor at stat.math.ethz.ch
>>>>> Message-ID: <412385.24484.qm at web27908.mail.ukl.yahoo.com>
>>>>> Content-Type: text/plain
>>>>>
>>>>> Dear List,
>>>>>
>>>>> Could you help me with another small issue?
>>>>> I usually write out the results of my analysis using the
>>>>> write.table function as follows:
>>>>>
>>>>> Assuming 30000 probes in the dataset:
>>>>> data <- ReadAffy()
>>>>> eset <- rma(data)
>>>>>
>>>>> design <- model.matrix(~ -1+factor(c(1,1,1,2,2,3,3,3)))
>>>>> colnames(design) <- c("group1", "group2", "group3")
>>>>> contrast.matrix <- makeContrasts(group2-group1, group3-group2,
>>>>> group3-group1, levels=design)
>>>>>
>>>>> fit <- lmFit(temp, design)
>>>>> fit2 <- contrasts.fit(fit, contrast.matrix)
>>>>> fit2 <- eBayes(fit2)
>>>>>
>>>>> C1<-topTable(fit2, coef=1, number=30000, adjust="BH")
>>>>>
>>>>>
write.table(C1,file="comparison1.txt",append=TRUE,quote=FALSE,sep="\t",row>>>>>
.
>>>>> na
>>>>> mes=TRUE,col.names=FALSE)
>>>>>
>>>>> C2<-topTable(fit2, coef=2, number=30000, adjust="BH")
>>>>>
>>>>>
write.table(C2,file="comparison2.txt",append=TRUE,quote=FALSE,sep="\t",row>>>>>
.
>>>>> na
>>>>> mes=TRUE,col.names=FALSE)
>>>>>
>>>>> C3<-topTable(fit2, coef=3, number=30000, adjust="BH")
>>>>>
>>>>>
write.table(C3,file="comparison3.txt",append=TRUE,quote=FALSE,sep="\t",row>>>>>
.
>>>>> na
>>>>> mes=TRUE,col.names=FALSE)
>>>>>
>>>>> I then use the written out txt files (comparison1.txt,
>>>>> comparison2.txt and comparison3.txt) to select significant probes
>>>>> on the basis of log2fold change and adjusted p values thresholds, using
>>>>> Excel.
>>>>> Would you say that this is a correct way to do this and could you
>>>>> please recommend me some other, may be more efficient way of
>>>>> writing the results of topTable for all 30000 probes out?
>>>>>
>>>>> With kind regards,
>>>>> Lev.
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>> ___________________________________________________________________
>>> The contents of this e-mail are privileged and/or confid...{{dropped:6}}
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> ___________________________________________________________________
> The contents of this e-mail are privileged and/or confid...{{dropped:6}}
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
___________________________________________________________________
The contents of this e-mail are privileged and/or confid...{{dropped:6}}
More information about the Bioconductor
mailing list