[BioC] Using limma to identify differentially expressed genes

Wed Apr 11 16:27:39 CEST 2012

Thanks Jim.

-Avoks

On Wed, Apr 11, 2012 at 2:56 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
> Hi Avoks,
>
>
> On 4/11/2012 7:28 AM, Ovokeraye Achinike-Oduaran wrote:
>>
>> Hi Jim,
>>
>> I digress a bit (sorry David). But I was looking at your code
>> combining both getGEO and getGEOSuppFiles. The analysis you did I'm
>> guessing is based on the raw files because you had to read in an
>> affybatch object. I'm having a challenge with making my covdesc.txt
>> file to work for me with read.affy(), so I'm wondering if your
>> combination is a way to retrieve the phenotypic data without having to
>> manually create the text file. In other words, my question is: does
>> this combination of getGEO() and getGEOSuppFiles() make it possible to
>> boycott the use of the manually created covdesc.txt file in
>> read.affy()?
>
>
> I assume that a covdesc.txt file is something you want to use for the
> phenoData slot of your ExpressionSet? If so, note that you don't have to
> explicitly create or use the phenoData slot; it is there in order to make
> your ExpressionSet self-descriptive for others.
>
> Unless you are planning to give your ExpressionSet to somebody else, I don't
> see a pressing reason to ever bother with creating and using the phenoData.
> You already know what is in there, and can easily create any design
> matrices, etc for further analyses.
>
> But to answer your question, I only used getGEO() in order to get the
> phenoData so I could easily create a design matrix without having to figure
> out which sample is which.
>
> Best,
>
> Jim
>
>
>
>>
>> Thanks.
>>
>> Avoks
>>
>>
>> On Tue, Apr 10, 2012 at 5:14 PM, David Westergaard<david at harsk.dk>  wrote:
>>>
>>> Hello,
>>>
>>> I've been trying to use limma to identify genes from the following
>>> data: http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-21340 -
>>> It's a simple control vs. disease experiment
>>>
>>>
>>> # SDRF downloaded from above page
>>> SDRF<-
>>> read.table(file="E-GEOD-21340.sdrf.txt",header=TRUE,stringsAsFactors=FALSE,sep="\t")
>>>
>>> # Looking to compare Family-history negativer versus Diabetis
>>> controls<- SDRF[grep("Control, Family History
>>> Neg",SDRF$Comment..Sample_source_name.),]
>>> disease<- SDRF[grep("^DM",SDRF$Characteristics.DiseaseState.),]
>>> Batch<- rbind(controls,disease)
>>>
>>> # Read in CEL files
>>> mixture.batch<- ReadAffy(filenames=Batch$Array.Data.File)
>>>
>>> # Preprocess data
>>> mixture.processed<- expresso(mixture.batch, bgcorrect.method = "rma",
>>> normalize.method = "quantiles", pmcorrect.method = "pmonly",
>>> summary.method = "medianpolish")
>>>
>>> # Get data in matrix
>>> signals<- exprs(mixture.prepared)
>>> cl<- ifelse(colnames(signals) %in% disease$Array.Data.File,1,0)
>>>
>>> # Do design matrix and fit
>>> design<- model.matrix(~factor(cl))
>>> fit<- lmFit(signals,design)
>>> fit<- eBayes(fit)
>>> topTable(fit2,coef=2)
>>>
>>> Which yields the following:
>>>               ID  logFC AveExpr     t P.Value adj.P.Val     B
>>> 7513    208004_at -0.323 5.43 -4.65 0.00191     0.999 -3.10
>>> 11225 211829_s_at  0.340 5.07 4.36 0.00278     0.999 -3.17
>>> 5950    206424_at -0.907    6.65 -4.15 0.00363     0.999 -3.23
>>> 1354  201826_s_at -0.447 8.37 -4.13 0.00374     0.999 -3.24
>>> 19782   220418_at  0.392 5.43 4.02 0.00431     0.999 -3.27
>>> 8889  209396_s_at  1.899    7.47  4.01 0.00437     0.999 -3.28
>>> 5005    205478_at -0.931    9.22 -3.94 0.00481     0.999 -3.30
>>> 9469  209983_s_at  0.412 5.72 3.92 0.00492     0.999 -3.31
>>> 2936    203409_at  0.506    6.93  3.87 0.00531     0.999 -3.32
>>> 5054  205527_s_at  0.331 6.80 3.84 0.00549     0.999 -3.33
>>>
>>> I'm abit puzzled over the adjusted P-values. Can it really be true
>>> that ALL of the adjusted P-values are 0.999, or did I make a rookie
>>> mistake somewhere?
>>>
>>> Best Regards,
>>> David Westergaard
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>