[BioC] LogFC query in Limma
Gordon K Smyth
smyth at wehi.EDU.AU
Sun Feb 3 01:04:19 CET 2013
Dear Roopa,
Perhaps you have just misread the limma output. This thread has been
mainly concerned with your statement that you found ~20,000 genes to be
differentially expressed. But the limma output shows 984 up-regulated and
1187 down-regulated genes, in other words about 2k total rather than 20k.
Best wishes
Gordon
> Date: Fri, 1 Feb 2013 11:05:36 -0500
> From: Roopa Subbaiaih <rss115 at case.edu>
> To: "James W. MacDonald" <jmacdon at uw.edu>
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] LogFC query in Limma
>
> Hi James,
>
> Am I still making any silly mistake over here. I am comparing normal (21)
> with lesionous patient samples (34). This is what I get in R console. Is
> there a problem with the script?
>
> Any advice would greatly help for further analysis.Thanks, Roopa
>
>> getwd()
> [1] "C:/Documents and Settings/rsubbaiaih/My Documents"
>> setwd(dir="/CRSP 406-11/DEMOS/GSE14905-a")
>> ls()
> character(0)
>> #-----------------------------------------------#
>> library(affy)
>> eset = justRMA()
> Loading required package: AnnotationDbi
>
>> eset
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 54675 features, 55 samples
> element names: exprs, se.exprs
> protocolData
> sampleNames: GSM372286.CEL GSM372287.CEL ...
> GSM372367.CEL (55 total)
> varLabels: ScanDate
> varMetadata: labelDescription
> phenoData
> sampleNames: GSM372286.CEL GSM372287.CEL ...
> GSM372367.CEL (55 total)
> varLabels: sample
> varMetadata: labelDescription
> featureData: none
> experimentData: use 'experimentData(object)'
> Annotation: hgu133plus2
>> f <- factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
> +
> 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
> labels=c("Healthy", "affected"))
>> design <- model.matrix(~ 0 + f)
>> design
> fHealthy faffected
> 1 1 0
> 2 1 0
> 3 1 0
> 4 1 0
> 5 1 0
> 6 1 0
> 7 1 0
> 8 1 0
> 9 1 0
> 10 1 0
> 11 1 0
> 12 1 0
> 13 1 0
> 14 1 0
> 15 1 0
> 16 1 0
> 17 1 0
> 18 1 0
> 19 1 0
> 20 1 0
> 21 1 0
> 22 0 1
> 23 0 1
> 24 0 1
> 25 0 1
> 26 0 1
> 27 0 1
> 28 0 1
> 29 0 1
> 30 0 1
> 31 0 1
> 32 0 1
> 33 0 1
> 34 0 1
> 35 0 1
> 36 0 1
> 37 0 1
> 38 0 1
> 39 0 1
> 40 0 1
> 41 0 1
> 42 0 1
> 43 0 1
> 44 0 1
> 45 0 1
> 46 0 1
> 47 0 1
> 48 0 1
> 49 0 1
> 50 0 1
> 51 0 1
> 52 0 1
> 53 0 1
> 54 0 1
> 55 0 1
> attr(,"assign")
> [1] 1 1
> attr(,"contrasts")
> attr(,"contrasts")$f
> [1] "contr.treatment"
>
>> colnames(design) <-c("Healthy","affected")
>> library(limma)
>> fit <- lmFit(eset, design)
>> fit
> An object of class "MArrayLM"
> $coefficients
> Healthy affected
> 1007_s_at 10.081309 9.548286
> 1053_at 6.807501 7.482849
> 117_at 5.969921 6.147594
> 121_at 7.403842 7.733666
> 1255_g_at 2.804475 2.827041
> 54670 more rows ...
>
> $rank
> [1] 2
>
> $assign
> [1] 1 1
>
> $qr
> $qr
> Healthy affected
> 1 -4.5825757 0.000000
> 2 0.2182179 -5.830952
> 3 0.2182179 0.000000
> 4 0.2182179 0.000000
> 5 0.2182179 0.000000
> 50 more rows ...
>
> $qraux
> [1] 1.218218 1.000000
>
> $pivot
> [1] 1 2
>
> $tol
> [1] 1e-07
>
> $rank
> [1] 2
>
>
> $df.residual
> [1] 53 53 53 53 53
> 54670 more elements ...
>
> $sigma
> 1007_s_at 1053_at 117_at 121_at 1255_g_at
> 0.2866489 0.4801618 0.3499880 0.2332555 0.1053397
> 54670 more elements ...
>
> $cov.coefficients
> Healthy affected
> Healthy 0.04761905 0.00000000
> affected 0.00000000 0.02941176
>
> $stdev.unscaled
> Healthy affected
> 1007_s_at 0.2182179 0.1714986
> 1053_at 0.2182179 0.1714986
> 117_at 0.2182179 0.1714986
> 121_at 0.2182179 0.1714986
> 1255_g_at 0.2182179 0.1714986
> 54670 more rows ...
>
> $pivot
> [1] 1 2
>
> $genes
> [1] "1007_s_at" "1053_at" "117_at" "121_at"
> [5] "1255_g_at"
> 54670 more rows ...
>
> $Amean
> 1007_s_at 1053_at 117_at 121_at 1255_g_at
> 9.751804 7.224988 6.079756 7.607733 2.818425
> 54670 more elements ...
>
> $method
> [1] "ls"
>
> $design
> Healthy affected
> 1 1 0
> 2 1 0
> 3 1 0
> 4 1 0
> 5 1 0
> 50 more rows ...
>
>> contrast.matrix <-makeContrasts(affected-Healthy,levels = design)
>> fit2 <- contrasts.fit(fit, contrast.matrix)
>> fit2 <- eBayes(fit2)
>> results <- decideTests(fit2, adjust="fdr",lfc=1)
>> summary(results)
> affected - Healthy
> -1 1187
> 0 52504
> 1 984
>> results
> TestResults matrix
> 1007_s_at 1053_at 117_at 121_at 1255_g_at
> 0 0 0 0 0
> 54670 more rows ...
>>
>
>
> On Thu, Jan 31, 2013 at 4:48 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>
>> If you just use the expression values from the original authors, I get
>> just under 9K probesets for this comparison at an FDR of 0.05 and no fold
>> change criterion. It drops to just under 900 with a 2-fold difference added
>> in.
>>
>> So yeah, seems like a lot to me as well.
>>
>> Best,
>>
>> Jim
>>
>>
>> On 1/31/2013 4:02 PM, Steve Lianoglou wrote:
>>
>>> ... what Jim said.
>>>
>>> But also, this 20k differentially expressed (likely probe sets, not
>>> genes) is raising a red flag for me, no? Am I alone here?
>>>
>>> That's .. what's the word I'm looking for ... "a lot".
>>>
>>> -steve
>>>
>>> On Thu, Jan 31, 2013 at 3:56 PM, James W. MacDonald<jmacdon at uw.edu>
>>> wrote:
>>>
>>>> Hi Roopa,
>>>>
>>>>
>>>> On 1/31/2013 3:45 PM, Roopa Subbaiaih wrote:
>>>>
>>>>> Hi Steve,
>>>>>
>>>>> This was the script I used-
>>>>> getwd()
>>>>> setwd(dir="/CRSP 406-11/DEMOS/GSE14905-a")
>>>>> ls()
>>>>> #-----------------------------**------------------#
>>>>> library(affy)
>>>>> eset = justRMA()
>>>>> f<- factor(c(1,1,1,1,1,1,1,1,1,1,**1,1,1,1,1,1,1,1,1,1,1,
>>>>>
>>>>> 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,**2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,**
>>>>> 2,2,2,2),
>>>>> labels=c("Healthy", "unaffected"))
>>>>> design<- model.matrix(~ 0 + f)
>>>>> design
>>>>> colnames(design)<-c("Healthy",**"unaffected")
>>>>> design
>>>>> library(limma)
>>>>> fit<- lmFit(eset, design)
>>>>> library(hgu133plus2.db)
>>>>> fit$genes$Symbol<- getSYMBOL(fit$genes$ID,"**hgu133plus2.db")
>>>>> contrast.matrix<-**makeContrasts(affected-**Healthy,levels = design)
>>>>> fit2<- contrasts.fit(fit, contrast.matrix)
>>>>> fit2<- eBayes(fit2)
>>>>> topTable(fit2,coef=1,p=0.05, adjust="fdr")
>>>>> results<- decideTests(fit2, adjust="fdr", p=0.05)
>>>>> summary(results)
>>>>> write.table(results,file="**myresults.txt")
>>>>> write.fit().
>>>>>
>>>>> I had identified ~54,000 genes of which ~ 20K were differentially
>>>>> expressed.
>>>>>
>>>>> But when I use these genes for pathway analysis the software asks for
>>>>> fold
>>>>> change values but not p value so it is easier to analyze the data.
>>>>>
>>>>> What I did was - I used the differentially expressed gene table for
>>>>> further
>>>>> analysis. That is I converted logFC values to FC(test/control) assuming
>>>>> that
>>>>>
>>>>> FC= FCmean(test)-FCmean(blank) and LogFC is log2 of FC values.
>>>>>
>>>>> Once I got test/control values I converted them to fold changes using
>>>>> "IF"
>>>>> function in excel sheet to eliminate genes with fold changes between -2
>>>>> to
>>>>> +2.
>>>>>
>>>>> Once I did this the number of significant genes drastically reduced to ~
>>>>> 2,000.
>>>>>
>>>>> Is this the right method?
>>>>>
>>>>
>>>> No. Note that the range of fold changes after 'unlogging' will be 0-INF,
>>>> and
>>>> the down-regulated genes will be in the range 0-1 whereas the upregulated
>>>> genes will be in the range 1-INF. (e.g. two fold up will be 2, whereas 2
>>>> fold down will be 1/2 or 0.5).
>>>>
>>>> The easiest way to filter is to keep the logFC and filter on -1 and 1. Or
>>>> you can use the lfc argument to decideTests().
>>>>
>>>> Best,
>>>>
>>>> Jim
>>>>
>>>>
>>>>
>>>> Please advice, thanks, Roopa
>>>>>
>>>>> On Thu, Jan 31, 2013 at 3:23 PM, Steve Lianoglou<
>>>>> mailinglist.honeypot at gmail.com**> wrote:
>>>>>
>>>>> Hi,
>>>>>>
>>>>>> On Thu, Jan 31, 2013 at 2:54 PM, Roopa Subbaiaih<rss115 at case.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Thanks for the reply, I could pull out the the whole information for
>>>>>>> differentially expressed genes. The criteria used was adjust="fdr",
>>>>>>>
>>>>>> p=0.05.
>>>>>>
>>>>>>> I came up with ~ 20,000 genes to be differentially expressed.
>>>>>>>
>>>>>> Hmm ... surely 20k cannot be correct?
>>>>>>
>>>>>> Since I wanted to analyze these genes for deregulated pathways I had to
>>>>>>> come up with fold change values for further analysis.
>>>>>>>
>>>>>>> I assume that for each gene FC= FCmean(test)-FCmean(blank). LogFC is
>>>>>>> log2
>>>>>>> of FC values.
>>>>>>>
>>>>>>> When I convert the FC values (test/blank) to foldchanges using IF
>>>>>>>
>>>>>> function
>>>>>>
>>>>>>> I get lesser number of genes to be deregulated. The criteria was =>2
>>>>>>> foldchanges and =<-2 fold changes.
>>>>>>>
>>>>>> I'm missing previous context to this email, so -- not sure what the
>>>>>> "IF function" is, but if you're using limma, the log2fold changes are
>>>>>> reported for you in the logFC column that is returned from
>>>>>> `topTable(...)`
>>>>>>
>>>>>> -steve
>>>>>>
>>>>>> --
>>>>>> Steve Lianoglou
>>>>>> Graduate Student: Computational Systems Biology
>>>>>> | Memorial Sloan-Kettering Cancer Center
>>>>>> | Weill Medical College of Cornell University
>>>>>> Contact Info: http://cbio.mskcc.org/~lianos/**contact<http://cbio.mskcc.org/~lianos/contact>
>>>>>>
>>>>>>
>>>>> --
>>>> James W. MacDonald, M.S.
>>>> Biostatistician
>>>> University of Washington
>>>> Environmental and Occupational Health Sciences
>>>> 4225 Roosevelt Way NE, # 100
>>>> Seattle WA 98105-6099
>>>>
>>>>
>>>
>>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>>
>
>
> --
> ---------------------------------------
> Roopa Shree Subbaiaih
> Post Doctoral Fellow
> Department of Dermatology
> School of Medicine
> Case Western Reserve University
> Cleveland, OH-44106
> Tel:+1 216 368 0211
>
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list