[BioC] limma for homemade microarray - question on NAs and multiple probes for one gene

Gordon K Smyth smyth at wehi.EDU.AU
Wed Jul 17 03:34:24 CEST 2013


Dear Zhengyu,

On Tue, 16 Jul 2013, zhengyu jiang wrote:

> Hi Gordon,
> Thanks for your reply.  What do you think of the MDS plot below?

The samples don't cluster on the MDS plot by either treatment or pair, so 
it is not surprising that there are no statistically significant results.

> For preprocessing, the microarray was designed as SNP array. We hybridized
> cDNA samples on the SNP array. For each marker, there are two channel
> reads. But for expression analysis, I combined both channel raw intensities
> to give one value for each marker - so it is considered as one-channel
> microarray.
>
> I took log values for each value - so yes it is log-intensities rather than
> log ratios. sorry for the confusing.
>
> I am actually not familiar with limma. You mentioned that limma deals with
> noises pretty well so I did not remove any values.
>
> Could you please comment more on applying limma to analyzing the
> differential expression in my case?

I have no experience applying limma to a SNP array.  I have no feel for 
what is sensible to do in this context so I can't give you any more help.

Best wishes
Gordon

> Thanks,
> Zhengyu
>
>
> [image: Inline image 1]
>
>
> On Fri, Jul 12, 2013 at 7:20 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Zhengyu,
>>
>> Please keep questions on the list.
>>
>> Results show that pair 6 is different from the others, and that there
>> isn't a consistent treatment effect.  I suggest that you explore your data
>> data more:
>>
>>   plotMDS(eset)
>>
>> The SvsA plot shows that the log-expression values are less variable at
>> both low and high intensities.  This has presumably arisen from the
>> microarray platform and from the way that you have pre-processed the data,
>> which you don't tell us much about.  In any case, limma has taken the trend
>> into account.
>>
>> Since this a single-channel platform, your comment "take log ratios" is a
>> bit mysterious.  Do you just mean log-intensity?
>>
>> Best wishes
>> Gordon
>>
>>
>> On Fri, 12 Jul 2013, zhengyu jiang wrote:
>>
>>  Hi Gordon,
>>> Thank you very much for your reply. Sorry for the delay.
>>>
>>> Following your instructions, I started with original matrix (M) of raw
>>> intensities & made the codes below.
>>> If everything is correct, my questions are (1) does the figure (below) log
>>> residual standard deviation vs. log expression make sense? - how should I
>>> decide if it is normal?
>>> (2) does the summary(decideTest) tell me that only pair 6 has great
>>> changes
>>> in expression?
>>>
>>> Thanks,
>>>
>>> eset<-as.matrix(log(M)) #### take log ratios
>>> eset <- normalizeBetweenArrays(eset, method="quantile") ## quantile
>>> normalization for all arrays
>>> design<-model.matrix(~Pair+**Treat)
>>> targets<-readTargets("targets.**txt",row.names=1)  ## or row name =1
>>> Pair<-factor(targets$Pair)
>>> Treat<-factor(targets$**Treatment,levels=c("N","T"))
>>> fit_pair<-lmFit(eset,design)
>>> plotSA(fit_pair)
>>> fit= eBayes(fit_pair, trend=TRUE)
>>>
>>>> summary(decideTests(fit))
>>>>
>>>   (Intercept)  Pair2  Pair3  Pair4  Pair5  Pair6  Pair7  Pair8 TreatT
>>> -1           0      0      0      0      0   2770      0      0      0
>>> 0            0 257279 257279 257278 257275 254372 257275 257273 257279
>>> 1       257279      0      0      0      0    102      0      2      0
>>>
>>>> R=topTable(fit, coef="TreatT", adjust="BH",number=30
>>>>
>>> + )
>>>
>>>> head(R)
>>>>
>>>            logFC  AveExpr         t      P.Value adj.P.Val         B
>>> 207370  0.6232462 6.473489  5.990300 6.818613e-05 0.8997989 -3.715276
>>> 45481   0.5559283 6.477330  4.960299 3.494802e-04 0.8997989 -3.822746
>>> 178158  0.4404808 6.148200  4.872321 4.046088e-04 0.8997989 -3.833704
>>> 172530 -0.3670701 5.793056 -4.800323 4.564952e-04 0.8997989 -3.842908
>>> 165527  0.4721218 6.324581  4.740798 5.046498e-04 0.8997989 -3.850680
>>> 80386  -0.5450653 6.551971 -4.723036 5.200275e-04 0.8997989 -3.853028
>>>
>>>
>>>
>>>
>>> [image: Inline image 1]
>>>
>>>
>>> On Tue, Jul 9, 2013 at 7:34 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>
>>>  Dear Zhengyu,
>>>>
>>>>  Date: Mon, 8 Jul 2013 20:40:14 -0400
>>>>
>>>>> From: zhengyu jiang <zhyjiang2006 at gmail.com>
>>>>> To: bioconductor at r-project.org
>>>>> Subject: [BioC] limma for homemade microarray - question on NAs and
>>>>>         multiple probes for one gene
>>>>>
>>>>> Dear Bioconductor experts,
>>>>>
>>>>> We have data from a homemade one-channel microarray that I tried to 
>>>>> apply limma for differential expression analysis between matched 
>>>>> paired Normal (N) and Tumor (Tumor) samples - 8 biological 
>>>>> replicates (one tech replicate has been averaged after 
>>>>> normalization). All samples are formatted in one matrix (M).
>>>>>
>>>>>
>>>> This is a standard problem, well covered in the limma User's Guide.
>>>>
>>>>> Signals have been quantile normalized between each paired normal and 
>>>>> tumor.
>>>>>
>>>>>
>>>> I don't know what you mean by this.  I recommend ordinary quantile 
>>>> normalization of all the arrays together, without regard to which 
>>>> sample is paired with which.
>>>>>
>>>>>  Signal values below 5 (log scale) have been replaced by "NA" since 
>>>>> they are potentially noises. So there are many NAs in M.
>>>>>
>>>>>
>>>> Please don't do this.  limma can deal with low intensities perfectly
>>>> weel,
>>>> and can figure out whether they are noise or not.  You are simply
>>>> removing
>>>> valid data.
>>>>
>>>> If you are concerned about high variability of low intensity probes, then
>>>> look at
>>>>
>>>>   plotSA(fit_pair)
>>>>
>>>> to examine the mean-variance trend, and use
>>>>
>>>>   eBayes(fit_pair, trend=TRUE)
>>>>
>>>> if there is a strong trend.
>>>>
>>>>  I followed the user manual and made the codes below.
>>>>
>>>>>
>>>>> I think the code is correct?
>>>>>
>>>>>
>>>> Looks ok.
>>>>
>>>>  My questions are (1) how to deal with NAs - as I did a search but no
>>>>
>>>>> clear idea
>>>>>
>>>>>
>>>> Don't create them in the first place.  This has been said many times
>>>> before!
>>>>
>>>>  (2) how do people do the statistics at the gene level for one gene
>>>> having
>>>>
>>>>> multiple probes - averaging or taking median?
>>>>>
>>>>>
>>>> Usually we do not find it necessary to aggregate the multiple probes. The
>>>> multiple probes might represent different transcripts or variants, and
>>>> some
>>>> of the probes might not work.  Why do you need to take any special
>>>> action?
>>>>
>>>> However, if you feel that you must, the best method to aggregate the
>>>> multiple probes is probably to retain the probe for each gene with
>>>> highest
>>>> fit_pair$Amean value.  We have used this strategy in a number of
>>>> published
>>>> papers.  The rationale for this is to choose the probe corresponding to
>>>> the
>>>> most highly expressed transcript for the gene.  Alternatively, you could
>>>> average the probes using the avereps() function in limma.
>>>>
>>>> Best wishes
>>>> Gordon
>>>>
>>>>  Thanks,
>>>>
>>>>> Zhengyu
>>>>>
>>>>>
>>>>>  head(M)
>>>>>>
>>>>>         N1       N2       N3       N4       N5       N6       N7
>>>>> N8       T1        T2       T3
>>>>> 2  8.622724 7.423568       NA       NA 7.487174       NA 8.516293
>>>>> NA
>>>>> 7.876259  7.856707       NA
>>>>>         T4       T5       T6       T7       T8
>>>>> 2        NA 7.720018       NA 7.752550       NA
>>>>>
>>>>>  eset<-as.matrix(M)
>>>>>
>>>>>> Pair=factor(targets$Pair)
>>>>>>     Treat=factor(targets$****Treatment,levels=c("N","T")) # compared
>>>>>> matched
>>>>>>
>>>>>>  normal to tumors
>>>>>
>>>>>                design<-model.matrix(~Pair+****Treat)
>>>>>> targets
>>>>>>
>>>>>>    FilenName Pair Treatment
>>>>> 1         N1    1         N
>>>>> 2         N2    2         N
>>>>> 3         N3    3         N
>>>>> 4         N4    4         N
>>>>> 5         N5    5         N
>>>>> 6         N6    6         N
>>>>> 7         N7    7         N
>>>>> 8         N8    8         N
>>>>> 9         T1    1         T
>>>>> 10        T2    2         T
>>>>> 11        T3    3         T
>>>>> 12        T4    4         T
>>>>> 13        T5    5         T
>>>>> 14        T6    6         T
>>>>> 15        T7    7         T
>>>>> 16        T8    8         T
>>>>> fit_pair<-lmFit(eset,design)
>>>>>             fit_pair<-eBayes(fit_pair)
>>>>>
>>>>> R=topTable(fit_pair, coef="TreatT", adjust="BH",number=30) # display top
>>>>> 30
>>>>>
>>>>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list