[BioC] limma for homemade microarray - question on NAs and multiple probes for one gene

Gordon K Smyth smyth at wehi.EDU.AU
Wed Jul 10 01:34:13 CEST 2013


Dear Zhengyu,

> Date: Mon, 8 Jul 2013 20:40:14 -0400
> From: zhengyu jiang <zhyjiang2006 at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] limma for homemade microarray - question on NAs and
> 	multiple probes for one gene
>
> Dear Bioconductor experts,
>
> We have data from a homemade one-channel microarray that I tried to apply
> limma for differential expression analysis between matched paired Normal
> (N) and Tumor (Tumor) samples - 8 biological replicates (one tech replicate
> has been averaged after normalization). All samples are formatted in one
> matrix (M).

This is a standard problem, well covered in the limma User's Guide.

> Signals have been quantile normalized between each paired normal and tumor.

I don't know what you mean by this.  I recommend ordinary quantile 
normalization of all the arrays together, without regard to which sample 
is paired with which.

> Signal values below 5 (log scale) have been replaced by "NA" since they are
> potentially noises. So there are many NAs in M.

Please don't do this.  limma can deal with low intensities perfectly weel, 
and can figure out whether they are noise or not.  You are simply removing 
valid data.

If you are concerned about high variability of low intensity probes, then 
look at

   plotSA(fit_pair)

to examine the mean-variance trend, and use

   eBayes(fit_pair, trend=TRUE)

if there is a strong trend.

> I followed the user manual and made the codes below.
>
> I think the code is correct?

Looks ok.

> My questions are (1) how to deal with NAs - as I did a search but no 
> clear idea

Don't create them in the first place.  This has been said many times 
before!

> (2) how do people do the statistics at the gene level for one gene 
> having multiple probes - averaging or taking median?

Usually we do not find it necessary to aggregate the multiple probes. 
The multiple probes might represent different transcripts or variants, and 
some of the probes might not work.  Why do you need to take any special 
action?

However, if you feel that you must, the best method to aggregate the 
multiple probes is probably to retain the probe for each gene with highest 
fit_pair$Amean value.  We have used this strategy in a number of published 
papers.  The rationale for this is to choose the probe corresponding to 
the most highly expressed transcript for the gene.  Alternatively, you 
could average the probes using the avereps() function in limma.

Best wishes
Gordon

> Thanks,
> Zhengyu
>
>
> > head(M)
>         N1       N2       N3       N4       N5       N6       N7
> N8       T1        T2       T3
> 2  8.622724 7.423568       NA       NA 7.487174       NA 8.516293       NA
> 7.876259  7.856707       NA
>         T4       T5       T6       T7       T8
> 2        NA 7.720018       NA 7.752550       NA
>
>> eset<-as.matrix(M)
>> Pair=factor(targets$Pair)
>>     Treat=factor(targets$Treatment,levels=c("N","T")) # compared matched
> normal to tumors
>>               design<-model.matrix(~Pair+Treat)
>> targets
>   FilenName Pair Treatment
> 1         N1    1         N
> 2         N2    2         N
> 3         N3    3         N
> 4         N4    4         N
> 5         N5    5         N
> 6         N6    6         N
> 7         N7    7         N
> 8         N8    8         N
> 9         T1    1         T
> 10        T2    2         T
> 11        T3    3         T
> 12        T4    4         T
> 13        T5    5         T
> 14        T6    6         T
> 15        T7    7         T
> 16        T8    8         T
> fit_pair<-lmFit(eset,design)
>             fit_pair<-eBayes(fit_pair)
>
> R=topTable(fit_pair, coef="TreatT", adjust="BH",number=30) # display top 30

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list