[BioC] limma for homemade microarray - question on NAs and multiple probes for one gene
Gordon K Smyth
smyth at wehi.EDU.AU
Sat Jul 13 01:20:51 CEST 2013
Dear Zhengyu,
Please keep questions on the list.
Results show that pair 6 is different from the others, and that there
isn't a consistent treatment effect. I suggest that you explore your data
data more:
plotMDS(eset)
The SvsA plot shows that the log-expression values are less variable at
both low and high intensities. This has presumably arisen from the
microarray platform and from the way that you have pre-processed the data,
which you don't tell us much about. In any case, limma has taken the
trend into account.
Since this a single-channel platform, your comment "take log ratios" is a
bit mysterious. Do you just mean log-intensity?
Best wishes
Gordon
On Fri, 12 Jul 2013, zhengyu jiang wrote:
> Hi Gordon,
> Thank you very much for your reply. Sorry for the delay.
>
> Following your instructions, I started with original matrix (M) of raw
> intensities & made the codes below.
> If everything is correct, my questions are (1) does the figure (below) log
> residual standard deviation vs. log expression make sense? - how should I
> decide if it is normal?
> (2) does the summary(decideTest) tell me that only pair 6 has great changes
> in expression?
>
> Thanks,
>
> eset<-as.matrix(log(M)) #### take log ratios
> eset <- normalizeBetweenArrays(eset, method="quantile") ## quantile
> normalization for all arrays
> design<-model.matrix(~Pair+Treat)
> targets<-readTargets("targets.txt",row.names=1) ## or row name =1
> Pair<-factor(targets$Pair)
> Treat<-factor(targets$Treatment,levels=c("N","T"))
> fit_pair<-lmFit(eset,design)
> plotSA(fit_pair)
> fit= eBayes(fit_pair, trend=TRUE)
>> summary(decideTests(fit))
> (Intercept) Pair2 Pair3 Pair4 Pair5 Pair6 Pair7 Pair8 TreatT
> -1 0 0 0 0 0 2770 0 0 0
> 0 0 257279 257279 257278 257275 254372 257275 257273 257279
> 1 257279 0 0 0 0 102 0 2 0
>> R=topTable(fit, coef="TreatT", adjust="BH",number=30
> + )
>> head(R)
> logFC AveExpr t P.Value adj.P.Val B
> 207370 0.6232462 6.473489 5.990300 6.818613e-05 0.8997989 -3.715276
> 45481 0.5559283 6.477330 4.960299 3.494802e-04 0.8997989 -3.822746
> 178158 0.4404808 6.148200 4.872321 4.046088e-04 0.8997989 -3.833704
> 172530 -0.3670701 5.793056 -4.800323 4.564952e-04 0.8997989 -3.842908
> 165527 0.4721218 6.324581 4.740798 5.046498e-04 0.8997989 -3.850680
> 80386 -0.5450653 6.551971 -4.723036 5.200275e-04 0.8997989 -3.853028
>
>
>
>
> [image: Inline image 1]
>
>
> On Tue, Jul 9, 2013 at 7:34 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Zhengyu,
>>
>> Date: Mon, 8 Jul 2013 20:40:14 -0400
>>> From: zhengyu jiang <zhyjiang2006 at gmail.com>
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] limma for homemade microarray - question on NAs and
>>> multiple probes for one gene
>>>
>>> Dear Bioconductor experts,
>>>
>>> We have data from a homemade one-channel microarray that I tried to apply
>>> limma for differential expression analysis between matched paired Normal
>>> (N) and Tumor (Tumor) samples - 8 biological replicates (one tech
>>> replicate
>>> has been averaged after normalization). All samples are formatted in one
>>> matrix (M).
>>>
>>
>> This is a standard problem, well covered in the limma User's Guide.
>>
>> Signals have been quantile normalized between each paired normal and
>>> tumor.
>>>
>>
>> I don't know what you mean by this. I recommend ordinary quantile
>> normalization of all the arrays together, without regard to which sample is
>> paired with which.
>>
>> Signal values below 5 (log scale) have been replaced by "NA" since they
>>> are
>>> potentially noises. So there are many NAs in M.
>>>
>>
>> Please don't do this. limma can deal with low intensities perfectly weel,
>> and can figure out whether they are noise or not. You are simply removing
>> valid data.
>>
>> If you are concerned about high variability of low intensity probes, then
>> look at
>>
>> plotSA(fit_pair)
>>
>> to examine the mean-variance trend, and use
>>
>> eBayes(fit_pair, trend=TRUE)
>>
>> if there is a strong trend.
>>
>> I followed the user manual and made the codes below.
>>>
>>> I think the code is correct?
>>>
>>
>> Looks ok.
>>
>> My questions are (1) how to deal with NAs - as I did a search but no
>>> clear idea
>>>
>>
>> Don't create them in the first place. This has been said many times
>> before!
>>
>> (2) how do people do the statistics at the gene level for one gene having
>>> multiple probes - averaging or taking median?
>>>
>>
>> Usually we do not find it necessary to aggregate the multiple probes. The
>> multiple probes might represent different transcripts or variants, and some
>> of the probes might not work. Why do you need to take any special action?
>>
>> However, if you feel that you must, the best method to aggregate the
>> multiple probes is probably to retain the probe for each gene with highest
>> fit_pair$Amean value. We have used this strategy in a number of published
>> papers. The rationale for this is to choose the probe corresponding to the
>> most highly expressed transcript for the gene. Alternatively, you could
>> average the probes using the avereps() function in limma.
>>
>> Best wishes
>> Gordon
>>
>> Thanks,
>>> Zhengyu
>>>
>>>
>>>> head(M)
>>> N1 N2 N3 N4 N5 N6 N7
>>> N8 T1 T2 T3
>>> 2 8.622724 7.423568 NA NA 7.487174 NA 8.516293 NA
>>> 7.876259 7.856707 NA
>>> T4 T5 T6 T7 T8
>>> 2 NA 7.720018 NA 7.752550 NA
>>>
>>> eset<-as.matrix(M)
>>>> Pair=factor(targets$Pair)
>>>> Treat=factor(targets$**Treatment,levels=c("N","T")) # compared
>>>> matched
>>>>
>>> normal to tumors
>>>
>>>> design<-model.matrix(~Pair+**Treat)
>>>> targets
>>>>
>>> FilenName Pair Treatment
>>> 1 N1 1 N
>>> 2 N2 2 N
>>> 3 N3 3 N
>>> 4 N4 4 N
>>> 5 N5 5 N
>>> 6 N6 6 N
>>> 7 N7 7 N
>>> 8 N8 8 N
>>> 9 T1 1 T
>>> 10 T2 2 T
>>> 11 T3 3 T
>>> 12 T4 4 T
>>> 13 T5 5 T
>>> 14 T6 6 T
>>> 15 T7 7 T
>>> 16 T8 8 T
>>> fit_pair<-lmFit(eset,design)
>>> fit_pair<-eBayes(fit_pair)
>>>
>>> R=topTable(fit_pair, coef="TreatT", adjust="BH",number=30) # display top
>>> 30
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list