[BioC] Paired limma analysis

James W. MacDonald jmacdon at uw.edu
Fri Aug 3 18:29:57 CEST 2012


Please don't take things off-list.

On 8/3/2012 12:21 PM, mb3058 at columbia.edu wrote:
> Hi Jim,
>
> Thank you for replying to my questions. Your answers have somewhat 
> cleared the doubts that I had about the results. I will be obliged if 
> you could reply to some follow up questions I had.
>
> 1.Out of Pair2 and Treatlo, which one is the logFC column (my input 
> data is log2 transformed)?

I have no idea what you are asking here.

> 2.What threshold should I be taking for Treatlo coefficient to get 
> genes that are significantly differentially expressed between hi and lo.

That's up to you, isn't it? If you are doing the analysis, then you have 
to take responsibility for the choices you make. Why would you abdicate 
this responsibility to some random dude on a listserv? If I said you 
need a p-value of 0.5 and a -1.5 fold change, would you use that advice? 
I have to admit, I am somewhat boggled.

Best,

Jim


>
> Thanks again
>
>
>
> Quoting "James W. MacDonald" <jmacdon at uw.edu>:
>
>> Hi mb3058 at columbia.edu,
>>
>> On 8/2/2012 6:59 PM, mb3058 at columbia.edu wrote:
>>> I am running a paired analysis with limma.
>>> My platform is affymetrix mouse gene st.1.0 array
>>> I have a paired group (hi, lo) with duplicates per group.
>>> My script so far is this:
>>> exprs <- read.table("data.txt",as.is=T,sep="\t",header=T,row.names=1)
>>> targets<-readTargets("pairedTargetFile.txt")
>>> Pair <- factor(targets$Group)
>>> Treat <- factor(targets$Treatment, levels=c("hi","lo"))
>>> design <- model.matrix(~Pair+Treat)
>>> fit <- lmFit(exprs, design)
>>> fit <- eBayes(fit)
>>> #I want to print out the limma results for all the probes
>>> Results<- topTable(fit,number=241425,adjust.method="BH")
>>> write.table(Results,"paired_Results.txt",sep="\t")
>>>
>>> ID    X.Intercept.    Pair2    Treatlo    AveExpr    F    P.Value   
>>>   adj.P.Val
>>> 10469363    10.590048    2.186194    -0.636504    11.364893     
>>> 1566.532105    0.000000469    0.00000942
>>> 10365564    10.12985925    2.3047915    -0.2214415    11.17153425   
>>>   1528.543391    0.000000495    0.00000942
>>> 10469367    9.8800415    2.440832    -0.652667    10.774124     
>>> 1417.807515    0.000000583    0.00000942
>>> 10517519    10.1252575    2.183145    0.212485    11.3230725     
>>> 1403.521849    0.000000596    0.00000942
>>> 10469380    9.61192575    2.9290725    -0.8228465    10.66503875    
>>>  1396.225476    0.000000603    0.00000942
>>>
>>> Q1.I am trying to interpret some of the fields. Could someone help  
>>> me understand what the fields X.Intercept.    Pair2    Treatlo  
>>> stand for?
>>
>> Well, I can tell you what the coefficients are estimating, but make no
>> warranties concerning your understanding.
>>
>> The X.Intercept can be thought of as a baseline. Technically it
>> estimates the mean expression of the hi treated samples for the first
>> of the pairs. The Pair2 coefficient estimates the difference between
>> the pairs, and the Treatlo coefficient estimates the difference between
>> the hi and lo treatment.
>>
>> I would assume that you want to find those genes that are
>> differentially expressed between hi and lo, after adjusting for the
>> pairing. For that you want the Treatlo coefficient.
>>
>>>
>>>
>>>
>>> Q2. When I run the following line to get results
>>> Results <- topTable(fit, coef="Treatlo"), I get the following results:
>>> ID    logFC    AveExpr    t    P.Value    adj.P.Val    B
>>> 10354461    4.390143    5.666826    14.06184    8.62E-05     
>>> 0.5279716   -0.7055664
>>> 10583079    4.249853    6.16608    13.51989    1.02E-04     
>>> 0.5279716   -0.7296509
>>> 10587318    4.180087    4.887135    13.34857    1.08E-04     
>>> 0.5279716   -0.7378131
>>> 10370298    4.0321    5.502014    12.69299    1.33E-04    0.5279716 
>>>     -0.7717809
>>> 10392551    4.259277    4.614051    12.44214    1.45E-04     
>>> 0.5279716   -0.7860399
>>> 10368320    4.387814    5.911313    12.36954    1.49E-04     
>>> 0.5279716   -0.7903085
>>> 10587807    3.805685    4.826062    12.30147    1.52E-04     
>>> 0.5279716   -0.7943703
>>> 10504815    3.793085    5.657937    12.27763    1.54E-04     
>>> 0.5279716   -0.7958069
>>> 10357489    3.968974    4.666258    12.23582    1.56E-04     
>>> 0.5279716   -0.7983445
>>> 10481521    3.757882    7.856306    12.15789    1.60E-04     
>>> 0.5279716   -0.8031341
>>>
>>> However when I run the following command to get results for all the 
>>>  probes, I get different values for the same probes. See PValue,  
>>> Adj. P value
>>
>> Well, you don't show the command you use. I would assume you ran
>> topTable() without specifying a coefficient, in which case the
>> following portions from ?topTable might be instructive:
>>
>> coef: column number or column name specifying which coefficient or
>>           contrast of the linear model is of interest. For 'topTable',
>>           can also be a vector of column subscripts, in which case the
>>           gene ranking is by F-statistic for that set of contrasts.
>>
>> and
>>
>>  'topTableF' ranks genes on the basis of moderated F-statistics for
>>      one or more coefficients. If 'topTable' is called with 'coef' has
>>      length greater than 1, then the specified columns will be
>>      extracted from 'fit' and 'topTableF' called on the result.
>>      'topTable' with 'coef=NULL' is the same as 'topTableF', unless the
>>      fitted model 'fit' has only one column.
>>
>> The default to topTable() is coef=NULL, in which case you are calling
>> topTableF(), and computing the F-statistic over all coefficients, which
>> I highly doubt is what you want to do.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>> ID    X.Intercept.    Pair2    Treatlo    AveExpr    F    P.Value   
>>>   adj.P.Val
>>> 10354461    2.963005    1.0175    4.390143    5.6668265     
>>> 441.9193584   7.37E-06    1.48E-05
>>> 10583079    5.2467745    -2.411241    4.249853    6.1660805     
>>> 516.5649337    5.25E-06    1.24E-05
>>> 10587318    2.88690775    -0.1796325    4.1800875    4.88713525     
>>> 334.0547737    1.35E-05    2.19E-05
>>> 10370298    4.54191225    -2.1118965    4.0321005    5.50201425     
>>> 408.7688335    8.74E-06    1.64E-05
>>> 10392551    2.4626765    0.043472    4.259277    4.614051     
>>> 261.1260083    2.31E-05    3.27E-05
>>> 10368320    3.76314325    -0.0914745    4.3878135    5.91131275     
>>> 377.3505032    1.04E-05    1.83E-05
>>> 10587807    2.69620575    0.4540285    3.8056845    4.82606225     
>>> 325.4919972    1.43E-05    2.28E-05
>>> 10504815    5.27270925    -3.0226305    3.7930855    5.65793675     
>>> 458.5242305    6.81E-06    1.42E-05
>>> 10357489    2.90779925    -0.4520565    3.9689735    4.66625775     
>>> 286.4197106    1.89E-05    2.80E-05
>>> 10481521    6.637718    -1.320705    3.757882    7.8563065     
>>> 794.2002708    2.06E-06    9.55E-06
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:  
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> -- 
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list