[BioC] Using limma to identify differentially expressed genes

James W. MacDonald jmacdon at uw.edu
Wed Apr 11 14:56:12 CEST 2012


Hi Avoks,

On 4/11/2012 7:28 AM, Ovokeraye Achinike-Oduaran wrote:
> Hi Jim,
>
> I digress a bit (sorry David). But I was looking at your code
> combining both getGEO and getGEOSuppFiles. The analysis you did I'm
> guessing is based on the raw files because you had to read in an
> affybatch object. I'm having a challenge with making my covdesc.txt
> file to work for me with read.affy(), so I'm wondering if your
> combination is a way to retrieve the phenotypic data without having to
> manually create the text file. In other words, my question is: does
> this combination of getGEO() and getGEOSuppFiles() make it possible to
> boycott the use of the manually created covdesc.txt file in
> read.affy()?

I assume that a covdesc.txt file is something you want to use for the 
phenoData slot of your ExpressionSet? If so, note that you don't have to 
explicitly create or use the phenoData slot; it is there in order to 
make your ExpressionSet self-descriptive for others.

Unless you are planning to give your ExpressionSet to somebody else, I 
don't see a pressing reason to ever bother with creating and using the 
phenoData. You already know what is in there, and can easily create any 
design matrices, etc for further analyses.

But to answer your question, I only used getGEO() in order to get the 
phenoData so I could easily create a design matrix without having to 
figure out which sample is which.

Best,

Jim


>
> Thanks.
>
> Avoks
>
>
> On Tue, Apr 10, 2012 at 5:14 PM, David Westergaard<david at harsk.dk>  wrote:
>> Hello,
>>
>> I've been trying to use limma to identify genes from the following
>> data: http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-21340 -
>> It's a simple control vs. disease experiment
>>
>>
>> # SDRF downloaded from above page
>> SDRF<- read.table(file="E-GEOD-21340.sdrf.txt",header=TRUE,stringsAsFactors=FALSE,sep="\t")
>>
>> # Looking to compare Family-history negativer versus Diabetis
>> controls<- SDRF[grep("Control, Family History
>> Neg",SDRF$Comment..Sample_source_name.),]
>> disease<- SDRF[grep("^DM",SDRF$Characteristics.DiseaseState.),]
>> Batch<- rbind(controls,disease)
>>
>> # Read in CEL files
>> mixture.batch<- ReadAffy(filenames=Batch$Array.Data.File)
>>
>> # Preprocess data
>> mixture.processed<- expresso(mixture.batch, bgcorrect.method = "rma",
>> normalize.method = "quantiles", pmcorrect.method = "pmonly",
>> summary.method = "medianpolish")
>>
>> # Get data in matrix
>> signals<- exprs(mixture.prepared)
>> cl<- ifelse(colnames(signals) %in% disease$Array.Data.File,1,0)
>>
>> # Do design matrix and fit
>> design<- model.matrix(~factor(cl))
>> fit<- lmFit(signals,design)
>> fit<- eBayes(fit)
>> topTable(fit2,coef=2)
>>
>> Which yields the following:
>>                ID  logFC AveExpr     t P.Value adj.P.Val     B
>> 7513    208004_at -0.323 5.43 -4.65 0.00191     0.999 -3.10
>> 11225 211829_s_at  0.340 5.07 4.36 0.00278     0.999 -3.17
>> 5950    206424_at -0.907    6.65 -4.15 0.00363     0.999 -3.23
>> 1354  201826_s_at -0.447 8.37 -4.13 0.00374     0.999 -3.24
>> 19782   220418_at  0.392 5.43 4.02 0.00431     0.999 -3.27
>> 8889  209396_s_at  1.899    7.47  4.01 0.00437     0.999 -3.28
>> 5005    205478_at -0.931    9.22 -3.94 0.00481     0.999 -3.30
>> 9469  209983_s_at  0.412 5.72 3.92 0.00492     0.999 -3.31
>> 2936    203409_at  0.506    6.93  3.87 0.00531     0.999 -3.32
>> 5054  205527_s_at  0.331 6.80 3.84 0.00549     0.999 -3.33
>>
>> I'm abit puzzled over the adjusted P-values. Can it really be true
>> that ALL of the adjusted P-values are 0.999, or did I make a rookie
>> mistake somewhere?
>>
>> Best Regards,
>> David Westergaard
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list