[BioC] limma for homemade microarray - question on NAs and multiple probes for one gene

Tue Jul 9 03:07:51 CEST 2013

Hi Zhengyu,

For turning probe-level data into gene-level data, you may want to
look at the function collapseRows in the WGCNA package (on CRAN). The
relevant citation is:

Miller JA, Cai C, Langfelder P, Geschwind DH, Kurian SM, Salomon DR,
Horvath S (2011) Strategies for aggregating gene expression data: The
collapseRows R function. BMC Bioinformatics 12:322. PMID: 21816037,
PMCID: PMC3166942 http://www.biomedcentral.com/1471-2105/12/322
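
As a rough illustration of the probe-to-gene step, here is a minimal sketch on made-up data. The matrix `expr`, the mapping `probe_gene`, and all values are hypothetical and not taken from the thread. The base-R part mimics the "keep the probe with the highest mean" idea; the collapseRows call is only run if WGCNA is installed, and its "MaxMean" method is one option among several rather than a recommendation for this particular array.

```r
# Toy probe-level matrix: 4 probes x 4 samples (values are made up).
expr <- matrix(c(8.1, 8.3, 7.9, 8.0,
                 6.2, 6.0, 6.4, 6.1,
                 9.5, 9.7, 9.4, 9.6,
                 5.1, 5.3, 5.0, 5.2),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("p1", "p2", "p3", "p4"),
                               c("N1", "N2", "T1", "T2")))

# Hypothetical probe-to-gene annotation, named by probe ID.
probe_gene <- c(p1 = "GENEA", p2 = "GENEA", p3 = "GENEB", p4 = "GENEB")

# Base-R version: for each gene, keep the probe with the highest mean.
probe_means <- rowMeans(expr, na.rm = TRUE)
keep <- vapply(split(names(probe_means), probe_gene[names(probe_means)]),
               function(ids) ids[which.max(probe_means[ids])],
               character(1))
gene_expr <- expr[keep, , drop = FALSE]
rownames(gene_expr) <- names(keep)

# The same step with WGCNA::collapseRows, if the package is available.
if (requireNamespace("WGCNA", quietly = TRUE)) {
  collapsed <- WGCNA::collapseRows(expr, rowGroup = probe_gene,
                                   rowID = rownames(expr),
                                   method = "MaxMean")
  gene_expr_wgcna <- collapsed$datETcollapsed
}
```

Selecting one representative probe per gene keeps the values on the original measurement scale; averaging or taking the median across probes are alternatives that collapseRows also supports through its `method` argument.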

As for the missing data, I personally would try to go back and restore
the original values unless they are outliers. Even knowing that a
value is below 5 is more informative than a missing datum. To remove
noise, I would rather remove whole probes whose expression stays
consistently below 5 across samples than blank individual expression
values. For example, if a probe is consistently above 7 in cancer
samples and below 5 (unexpressed) in normals, you want to identify it,
but you will miss it completely if you turn all values below 5 into NA.
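
To make the difference concrete, here is a small sketch on made-up data, not the actual matrix from the question. The cutoff of 5 comes from the thread; `min_samples` is a hypothetical parameter that you would need to choose for your own design.

```r
# Toy log-scale matrix: 3 probes x 6 samples (values are made up).
M <- matrix(c(3.9, 4.2, 4.1, 7.6, 7.9, 7.4,   # off in normals, on in tumors
              4.0, 3.8, 4.4, 4.1, 3.7, 4.3,   # consistently below 5
              8.2, 8.0, 8.5, 8.1, 8.4, 8.3),  # consistently expressed
            nrow = 3, byrow = TRUE,
            dimnames = list(c("probe1", "probe2", "probe3"),
                            c("N1", "N2", "N3", "T1", "T2", "T3")))

cutoff <- 5        # noise threshold mentioned in the thread
min_samples <- 2   # hypothetical: samples that must reach the cutoff

# Option A: blank individual low values. probe1 loses all its normals,
# so its normal-vs-tumor difference can no longer be estimated.
M_na <- M
M_na[M_na < cutoff] <- NA

# Option B: keep every value, but drop probes that rarely reach the cutoff.
keep <- rowSums(M >= cutoff) >= min_samples
M_filtered <- M[keep, , drop = FALSE]
```

With option B, probe1 (the interesting one) survives with all of its values intact, while probe2, which never rises above the cutoff, is dropped as noise.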

Hope this helps,

Peter

On Mon, Jul 8, 2013 at 5:40 PM, zhengyu jiang <zhyjiang2006 at gmail.com> wrote:
> Dear Bioconductor experts,
>
> We have data from a homemade one-channel microarray, to which I tried to
> apply limma for differential expression analysis between matched, paired
> Normal (N) and Tumor (T) samples - 8 biological replicates (one technical
> replicate has been averaged after normalization). All samples are stored
> in one matrix (M).
>
> Signals have been quantile normalized within each normal/tumor pair.
> Signal values below 5 (log scale) have been replaced by "NA" since they are
> potentially noise. So there are many NAs in M.
>
> I followed the user manual and wrote the code below.
>
> I think the code is correct? My questions are (1) how to deal with NAs - I
> did a search but found no clear answer - and (2) how do people do the
> statistics at the gene level when one gene has multiple probes: averaging
> or taking the median?
>
> Thanks,
> Zhengyu
>
>
>> head(M)
>         N1       N2 N3 N4       N5 N6       N7 N8       T1       T2 T3
> 2 8.622724 7.423568 NA NA 7.487174 NA 8.516293 NA 7.876259 7.856707 NA
>   T4       T5 T6       T7 T8
> 2 NA 7.720018 NA 7.752550 NA
>
>> eset <- as.matrix(M)
>> Pair <- factor(targets$Pair)
>> # compare matched normal to tumors
>> Treat <- factor(targets$Treatment, levels=c("N","T"))
>> design <- model.matrix(~Pair+Treat)
>> targets
>    FilenName Pair Treatment
> 1         N1    1         N
> 2         N2    2         N
> 3         N3    3         N
> 4         N4    4         N
> 5         N5    5         N
> 6         N6    6         N
> 7         N7    7         N
> 8         N8    8         N
> 9         T1    1         T
> 10        T2    2         T
> 11        T3    3         T
> 12        T4    4         T
> 13        T5    5         T
> 14        T6    6         T
> 15        T7    7         T
> 16        T8    8         T
>> fit_pair <- lmFit(eset, design)
>> fit_pair <- eBayes(fit_pair)
>
>> R <- topTable(fit_pair, coef="TreatT", adjust="BH", number=30) # display top 30