[BioC] limma for homemade microarray - question on NAs and multiple probes for one gene
Gordon K Smyth
smyth at wehi.EDU.AU
Wed Jul 10 01:34:13 CEST 2013
Dear Zhengyu,
> Date: Mon, 8 Jul 2013 20:40:14 -0400
> From: zhengyu jiang <zhyjiang2006 at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] limma for homemade microarray - question on NAs and
> multiple probes for one gene
>
> Dear Bioconductor experts,
>
> We have data from a homemade one-channel microarray that I tried to apply
> limma for differential expression analysis between matched paired Normal
> (N) and Tumor (Tumor) samples - 8 biological replicates (one tech replicate
> has been averaged after normalization). All samples are formatted in one
> matrix (M).
This is a standard problem, well covered in the limma User's Guide.
> Signals have been quantile normalized between each paired normal and tumor.
I don't know what you mean by this. I recommend ordinary quantile
normalization of all the arrays together, without regard to which sample
is paired with which.
> Signal values below 5 (log scale) have been replaced by "NA" since they are
> potentially noises. So there are many NAs in M.
Please don't do this. limma can deal with low intensities perfectly weel,
and can figure out whether they are noise or not. You are simply removing
valid data.
If you are concerned about high variability of low intensity probes, then
look at
plotSA(fit_pair)
to examine the mean-variance trend, and use
eBayes(fit_pair, trend=TRUE)
if there is a strong trend.
> I followed the user manual and made the codes below.
>
> I think the code is correct?
Looks ok.
> My questions are (1) how to deal with NAs - as I did a search but no
> clear idea
Don't create them in the first place. This has been said many times
before!
> (2) how do people do the statistics at the gene level for one gene
> having multiple probes - averaging or taking median?
Usually we do not find it necessary to aggregate the multiple probes.
The multiple probes might represent different transcripts or variants, and
some of the probes might not work. Why do you need to take any special
action?
However, if you feel that you must, the best method to aggregate the
multiple probes is probably to retain the probe for each gene with highest
fit_pair$Amean value. We have used this strategy in a number of published
papers. The rationale for this is to choose the probe corresponding to
the most highly expressed transcript for the gene. Alternatively, you
could average the probes using the avereps() function in limma.
Best wishes
Gordon
> Thanks,
> Zhengyu
>
>
> > head(M)
> N1 N2 N3 N4 N5 N6 N7
> N8 T1 T2 T3
> 2 8.622724 7.423568 NA NA 7.487174 NA 8.516293 NA
> 7.876259 7.856707 NA
> T4 T5 T6 T7 T8
> 2 NA 7.720018 NA 7.752550 NA
>
>> eset<-as.matrix(M)
>> Pair=factor(targets$Pair)
>> Treat=factor(targets$Treatment,levels=c("N","T")) # compared matched
> normal to tumors
>> design<-model.matrix(~Pair+Treat)
>> targets
> FilenName Pair Treatment
> 1 N1 1 N
> 2 N2 2 N
> 3 N3 3 N
> 4 N4 4 N
> 5 N5 5 N
> 6 N6 6 N
> 7 N7 7 N
> 8 N8 8 N
> 9 T1 1 T
> 10 T2 2 T
> 11 T3 3 T
> 12 T4 4 T
> 13 T5 5 T
> 14 T6 6 T
> 15 T7 7 T
> 16 T8 8 T
> fit_pair<-lmFit(eset,design)
> fit_pair<-eBayes(fit_pair)
>
> R=topTable(fit_pair, coef="TreatT", adjust="BH",number=30) # display top 30
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list