[BioC] LogFC query in Limma

James W. MacDonald jmacdon at uw.edu
Thu Jan 31 22:48:00 CET 2013


If you just use the expression values from the original authors, I get 
just under 9K probesets for this comparison at an FDR of 0.05 and no 
fold change criterion. It drops to just under 900 with a 2-fold 
difference added in.

So yeah, seems like a lot to me as well.

Best,

Jim

On 1/31/2013 4:02 PM, Steve Lianoglou wrote:
> ... what Jim said.
>
> But also, this 20k differentially expressed (likely probe sets, not
> genes) is raising a red flag for me, no? Am I alone here?
>
> That's .. what's the word I'm looking for ... "a lot".
>
> -steve
>
> On Thu, Jan 31, 2013 at 3:56 PM, James W. MacDonald<jmacdon at uw.edu>  wrote:
>> Hi Roopa,
>>
>>
>> On 1/31/2013 3:45 PM, Roopa Subbaiaih wrote:
>>> Hi Steve,
>>>
>>> This was the script I used-
>>> getwd()
>>> setwd(dir="/CRSP 406-11/DEMOS/GSE14905-a")
>>> ls()
>>> #-----------------------------------------------#
>>> library(affy)
>>> eset = justRMA()
>>> f<- factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
>>>
>>> 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
>>> labels=c("Healthy", "unaffected"))
>>> design<- model.matrix(~ 0 + f)
>>> design
>>> colnames(design)<-c("Healthy","unaffected")
>>> design
>>> library(limma)
>>> fit<- lmFit(eset, design)
>>> library(hgu133plus2.db)
>>> fit$genes$Symbol<- getSYMBOL(fit$genes$ID,"hgu133plus2.db")
>>> contrast.matrix<-makeContrasts(affected-Healthy,levels = design)
>>> fit2<- contrasts.fit(fit, contrast.matrix)
>>> fit2<- eBayes(fit2)
>>> topTable(fit2,coef=1,p=0.05, adjust="fdr")
>>> results<- decideTests(fit2, adjust="fdr", p=0.05)
>>> summary(results)
>>> write.table(results,file="myresults.txt")
>>> write.fit().
>>>
>>> I had identified ~54,000 genes of which ~ 20K were differentially
>>> expressed.
>>>
>>> But when I use these genes for pathway analysis the software asks for fold
>>> change values but not p value so it is easier to analyze the data.
>>>
>>> What I did was - I used the differentially expressed gene table for
>>> further
>>> analysis. That is I converted logFC values to FC(test/control) assuming
>>> that
>>>
>>> FC= FCmean(test)-FCmean(blank) and LogFC is log2 of FC values.
>>>
>>> Once I got test/control values I converted them to fold changes using "IF"
>>> function in excel sheet to eliminate genes with fold changes between -2 to
>>> +2.
>>>
>>> Once I did this the number of significant genes drastically reduced to ~
>>> 2,000.
>>>
>>> Is this the right method?
>>
>> No. Note that the range of fold changes after 'unlogging' will be 0-INF, and
>> the down-regulated genes will be in the range 0-1 whereas the upregulated
>> genes will be in the range 1-INF. (e.g. two fold up will be 2, whereas 2
>> fold down will be 1/2 or 0.5).
>>
>> The easiest way to filter is to keep the logFC and filter on -1 and 1. Or
>> you can use the lfc argument to decideTests().
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>> Please advice, thanks, Roopa
>>>
>>> On Thu, Jan 31, 2013 at 3:23 PM, Steve Lianoglou<
>>> mailinglist.honeypot at gmail.com>   wrote:
>>>
>>>> Hi,
>>>>
>>>> On Thu, Jan 31, 2013 at 2:54 PM, Roopa Subbaiaih<rss115 at case.edu>   wrote:
>>>>> Hi All,
>>>>>
>>>>> Thanks for the reply, I could pull out the the whole information for
>>>>> differentially expressed genes. The criteria used was adjust="fdr",
>>>> p=0.05.
>>>>> I came up with ~ 20,000 genes to be differentially expressed.
>>>> Hmm ... surely 20k cannot be correct?
>>>>
>>>>> Since I wanted to analyze these genes for deregulated pathways I had to
>>>>> come up with fold change values for further analysis.
>>>>>
>>>>> I assume that for each gene FC= FCmean(test)-FCmean(blank). LogFC is
>>>>> log2
>>>>> of FC values.
>>>>>
>>>>> When I convert the FC values (test/blank) to foldchanges using IF
>>>> function
>>>>> I get lesser number of genes to be deregulated. The criteria was =>2
>>>>> foldchanges and =<-2 fold changes.
>>>> I'm missing previous context to this email, so -- not sure what the
>>>> "IF function" is, but if you're using limma, the log2fold changes are
>>>> reported for you in the logFC column that is returned from
>>>> `topTable(...)`
>>>>
>>>> -steve
>>>>
>>>> --
>>>> Steve Lianoglou
>>>> Graduate Student: Computational Systems Biology
>>>>    | Memorial Sloan-Kettering Cancer Center
>>>>    | Weill Medical College of Cornell University
>>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>>
>>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list