[BioC] LogFC query in Limma

Sun Feb 3 01:04:19 CET 2013

Dear Roopa,

Perhaps you have just misread the limma output.  This thread has been 
mainly concerned with your statement that you found ~20,000 genes to be 
differentially expressed.  But the limma output shows 984 up-regulated and 
1187 down-regulated genes, in other words about 2k total rather than 20k.

Best wishes
Gordon

> Date: Fri, 1 Feb 2013 11:05:36 -0500
> From: Roopa Subbaiaih <rss115 at case.edu>
> To: "James W. MacDonald" <jmacdon at uw.edu>
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] LogFC query in Limma
>
> Hi James,
>
> Am I still making any silly mistake over here. I am comparing normal (21)
> with lesionous patient samples (34). This is what I get in R console. Is
> there a problem with the script?
>
> Any advice would greatly help for further analysis.Thanks, Roopa
>
>> getwd()
> [1] "C:/Documents and Settings/rsubbaiaih/My Documents"
>> setwd(dir="/CRSP 406-11/DEMOS/GSE14905-a")
>> ls()
> character(0)
>> #-----------------------------------------------#
>> library(affy)
>> eset = justRMA()
> Loading required package: AnnotationDbi
>
>> eset
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 54675 features, 55 samples
>  element names: exprs, se.exprs
> protocolData
>  sampleNames: GSM372286.CEL GSM372287.CEL ...
>    GSM372367.CEL (55 total)
>  varLabels: ScanDate
>  varMetadata: labelDescription
> phenoData
>  sampleNames: GSM372286.CEL GSM372287.CEL ...
>    GSM372367.CEL (55 total)
>  varLabels: sample
>  varMetadata: labelDescription
> featureData: none
> experimentData: use 'experimentData(object)'
> Annotation: hgu133plus2
>> f <- factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
> +
> 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
> labels=c("Healthy", "affected"))
>> design <- model.matrix(~ 0 + f)
>> design
>   fHealthy faffected
> 1         1         0
> 2         1         0
> 3         1         0
> 4         1         0
> 5         1         0
> 6         1         0
> 7         1         0
> 8         1         0
> 9         1         0
> 10        1         0
> 11        1         0
> 12        1         0
> 13        1         0
> 14        1         0
> 15        1         0
> 16        1         0
> 17        1         0
> 18        1         0
> 19        1         0
> 20        1         0
> 21        1         0
> 22        0         1
> 23        0         1
> 24        0         1
> 25        0         1
> 26        0         1
> 27        0         1
> 28        0         1
> 29        0         1
> 30        0         1
> 31        0         1
> 32        0         1
> 33        0         1
> 34        0         1
> 35        0         1
> 36        0         1
> 37        0         1
> 38        0         1
> 39        0         1
> 40        0         1
> 41        0         1
> 42        0         1
> 43        0         1
> 44        0         1
> 45        0         1
> 46        0         1
> 47        0         1
> 48        0         1
> 49        0         1
> 50        0         1
> 51        0         1
> 52        0         1
> 53        0         1
> 54        0         1
> 55        0         1
> attr(,"assign")
> [1] 1 1
> attr(,"contrasts")
> attr(,"contrasts")$f
> [1] "contr.treatment"
>
>> colnames(design) <-c("Healthy","affected")
>> library(limma)
>> fit <- lmFit(eset, design)
>> fit
> An object of class "MArrayLM"
> $coefficients
>            Healthy affected
> 1007_s_at 10.081309 9.548286
> 1053_at    6.807501 7.482849
> 117_at     5.969921 6.147594
> 121_at     7.403842 7.733666
> 1255_g_at  2.804475 2.827041
> 54670 more rows ...
>
> $rank
> [1] 2
>
> $assign
> [1] 1 1
>
> $qr
> $qr
>     Healthy  affected
> 1 -4.5825757  0.000000
> 2  0.2182179 -5.830952
> 3  0.2182179  0.000000
> 4  0.2182179  0.000000
> 5  0.2182179  0.000000
> 50 more rows ...
>
> $qraux
> [1] 1.218218 1.000000
>
> $pivot
> [1] 1 2
>
> $tol
> [1] 1e-07
>
> $rank
> [1] 2
>
>
> $df.residual
> [1] 53 53 53 53 53
> 54670 more elements ...
>
> $sigma
> 1007_s_at   1053_at    117_at    121_at 1255_g_at
> 0.2866489 0.4801618 0.3499880 0.2332555 0.1053397
> 54670 more elements ...
>
> $cov.coefficients
>            Healthy   affected
> Healthy  0.04761905 0.00000000
> affected 0.00000000 0.02941176
>
> $stdev.unscaled
>            Healthy  affected
> 1007_s_at 0.2182179 0.1714986
> 1053_at   0.2182179 0.1714986
> 117_at    0.2182179 0.1714986
> 121_at    0.2182179 0.1714986
> 1255_g_at 0.2182179 0.1714986
> 54670 more rows ...
>
> $pivot
> [1] 1 2
>
> $genes
> [1] "1007_s_at" "1053_at"   "117_at"    "121_at"
> [5] "1255_g_at"
> 54670 more rows ...
>
> $Amean
> 1007_s_at   1053_at    117_at    121_at 1255_g_at
> 9.751804  7.224988  6.079756  7.607733  2.818425
> 54670 more elements ...
>
> $method
> [1] "ls"
>
> $design
>  Healthy affected
> 1       1        0
> 2       1        0
> 3       1        0
> 4       1        0
> 5       1        0
> 50 more rows ...
>
>> contrast.matrix <-makeContrasts(affected-Healthy,levels = design)
>> fit2 <- contrasts.fit(fit, contrast.matrix)
>> fit2 <- eBayes(fit2)
>> results <- decideTests(fit2, adjust="fdr",lfc=1)
>> summary(results)
>   affected - Healthy
> -1               1187
> 0               52504
> 1                 984
>> results
> TestResults matrix
> 1007_s_at   1053_at    117_at    121_at 1255_g_at
>        0         0         0         0         0
> 54670 more rows ...
>>
>
>
> On Thu, Jan 31, 2013 at 4:48 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>
>> If you just use the expression values from the original authors, I get
>> just under 9K probesets for this comparison at an FDR of 0.05 and no fold
>> change criterion. It drops to just under 900 with a 2-fold difference added
>> in.
>>
>> So yeah, seems like a lot to me as well.
>>
>> Best,
>>
>> Jim
>>
>>
>> On 1/31/2013 4:02 PM, Steve Lianoglou wrote:
>>
>>> ... what Jim said.
>>>
>>> But also, this 20k differentially expressed (likely probe sets, not
>>> genes) is raising a red flag for me, no? Am I alone here?
>>>
>>> That's .. what's the word I'm looking for ... "a lot".
>>>
>>> -steve
>>>
>>> On Thu, Jan 31, 2013 at 3:56 PM, James W. MacDonald<jmacdon at uw.edu>
>>>  wrote:
>>>
>>>> Hi Roopa,
>>>>
>>>>
>>>> On 1/31/2013 3:45 PM, Roopa Subbaiaih wrote:
>>>>
>>>>> Hi Steve,
>>>>>
>>>>> This was the script I used-
>>>>> getwd()
>>>>> setwd(dir="/CRSP 406-11/DEMOS/GSE14905-a")
>>>>> ls()
>>>>> #-----------------------------**------------------#
>>>>> library(affy)
>>>>> eset = justRMA()
>>>>> f<- factor(c(1,1,1,1,1,1,1,1,1,1,**1,1,1,1,1,1,1,1,1,1,1,
>>>>>
>>>>> 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,**2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,**
>>>>> 2,2,2,2),
>>>>> labels=c("Healthy", "unaffected"))
>>>>> design<- model.matrix(~ 0 + f)
>>>>> design
>>>>> colnames(design)<-c("Healthy",**"unaffected")
>>>>> design
>>>>> library(limma)
>>>>> fit<- lmFit(eset, design)
>>>>> library(hgu133plus2.db)
>>>>> fit$genes$Symbol<- getSYMBOL(fit$genes$ID,"**hgu133plus2.db")
>>>>> contrast.matrix<-**makeContrasts(affected-**Healthy,levels = design)
>>>>> fit2<- contrasts.fit(fit, contrast.matrix)
>>>>> fit2<- eBayes(fit2)
>>>>> topTable(fit2,coef=1,p=0.05, adjust="fdr")
>>>>> results<- decideTests(fit2, adjust="fdr", p=0.05)
>>>>> summary(results)
>>>>> write.table(results,file="**myresults.txt")
>>>>> write.fit().
>>>>>
>>>>> I had identified ~54,000 genes of which ~ 20K were differentially
>>>>> expressed.
>>>>>
>>>>> But when I use these genes for pathway analysis the software asks for
>>>>> fold
>>>>> change values but not p value so it is easier to analyze the data.
>>>>>
>>>>> What I did was - I used the differentially expressed gene table for
>>>>> further
>>>>> analysis. That is I converted logFC values to FC(test/control) assuming
>>>>> that
>>>>>
>>>>> FC= FCmean(test)-FCmean(blank) and LogFC is log2 of FC values.
>>>>>
>>>>> Once I got test/control values I converted them to fold changes using
>>>>> "IF"
>>>>> function in excel sheet to eliminate genes with fold changes between -2
>>>>> to
>>>>> +2.
>>>>>
>>>>> Once I did this the number of significant genes drastically reduced to ~
>>>>> 2,000.
>>>>>
>>>>> Is this the right method?
>>>>>
>>>>
>>>> No. Note that the range of fold changes after 'unlogging' will be 0-INF,
>>>> and
>>>> the down-regulated genes will be in the range 0-1 whereas the upregulated
>>>> genes will be in the range 1-INF. (e.g. two fold up will be 2, whereas 2
>>>> fold down will be 1/2 or 0.5).
>>>>
>>>> The easiest way to filter is to keep the logFC and filter on -1 and 1. Or
>>>> you can use the lfc argument to decideTests().
>>>>
>>>> Best,
>>>>
>>>> Jim
>>>>
>>>>
>>>>
>>>> Please advice, thanks, Roopa
>>>>>
>>>>> On Thu, Jan 31, 2013 at 3:23 PM, Steve Lianoglou<
>>>>> mailinglist.honeypot at gmail.com**>   wrote:
>>>>>
>>>>> Hi,
>>>>>>
>>>>>> On Thu, Jan 31, 2013 at 2:54 PM, Roopa Subbaiaih<rss115 at case.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Thanks for the reply, I could pull out the the whole information for
>>>>>>> differentially expressed genes. The criteria used was adjust="fdr",
>>>>>>>
>>>>>> p=0.05.
>>>>>>
>>>>>>> I came up with ~ 20,000 genes to be differentially expressed.
>>>>>>>
>>>>>> Hmm ... surely 20k cannot be correct?
>>>>>>
>>>>>> Since I wanted to analyze these genes for deregulated pathways I had to
>>>>>>> come up with fold change values for further analysis.
>>>>>>>
>>>>>>> I assume that for each gene FC= FCmean(test)-FCmean(blank). LogFC is
>>>>>>> log2
>>>>>>> of FC values.
>>>>>>>
>>>>>>> When I convert the FC values (test/blank) to foldchanges using IF
>>>>>>>
>>>>>> function
>>>>>>
>>>>>>> I get lesser number of genes to be deregulated. The criteria was =>2
>>>>>>> foldchanges and =<-2 fold changes.
>>>>>>>
>>>>>> I'm missing previous context to this email, so -- not sure what the
>>>>>> "IF function" is, but if you're using limma, the log2fold changes are
>>>>>> reported for you in the logFC column that is returned from
>>>>>> `topTable(...)`
>>>>>>
>>>>>> -steve
>>>>>>
>>>>>> --
>>>>>> Steve Lianoglou
>>>>>> Graduate Student: Computational Systems Biology
>>>>>>    | Memorial Sloan-Kettering Cancer Center
>>>>>>    | Weill Medical College of Cornell University
>>>>>> Contact Info: http://cbio.mskcc.org/~lianos/**contact<http://cbio.mskcc.org/~lianos/contact>
>>>>>>
>>>>>>
>>>>> --
>>>> James W. MacDonald, M.S.
>>>> Biostatistician
>>>> University of Washington
>>>> Environmental and Occupational Health Sciences
>>>> 4225 Roosevelt Way NE, # 100
>>>> Seattle WA 98105-6099
>>>>
>>>>
>>>
>>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>>
>
>
> -- 
> ---------------------------------------
> Roopa Shree Subbaiaih
> Post Doctoral Fellow
> Department of Dermatology
> School of Medicine
> Case Western Reserve University
> Cleveland, OH-44106
> Tel:+1 216 368 0211
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}