[BioC] limma for homemade microarray - question on NAs and multiple probes for one gene
Peter Langfelder
peter.langfelder at gmail.com
Tue Jul 9 03:07:51 CEST 2013
Hi Zhengyu,
for turning probe-level data into gene-level data, you may want to
look at the function collapseRows in the WGCNA package (on CRAN). The
relevant citation is
Miller JA, Cai C, Langfelder P, Geschwind DH, Kurian SM, Salomon DR,
Horvath S (2011) Strategies for aggregating gene expression data: The
collapseRows R function. BMC Bioinformatics12:322. PMID: 21816037,
PMCID: PMC3166942 http://www.biomedcentral.com/1471-2105/12/322
As for dealing with the missing data, well, I personally would try to
go back and restore the original values unless they are outliers. Even
knowing that something is below 5 is better than a missing datum. To
remove noise, I would possibly remove probes whose expression
consistently remains below 5 but not individual expression values. For
example, if a probe is consistently above 7 in cancer samples and
below 5 (unexpressed) in normals, you want to identify it - but you
will completely miss it if you turn all values below 5 into NA.
Hope this helps,
Peter
On Mon, Jul 8, 2013 at 5:40 PM, zhengyu jiang <zhyjiang2006 at gmail.com> wrote:
> Dear Bioconductor experts,
>
> We have data from a homemade one-channel microarray that I tried to apply
> limma for differential expression analysis between matched paired Normal
> (N) and Tumor (Tumor) samples - 8 biological replicates (one tech replicate
> has been averaged after normalization). All samples are formatted in one
> matrix (M).
>
> Signals have been quantile normalized between each paired normal and tumor.
> Signal values below 5 (log scale) have been replaced by "NA" since they are
> potentially noises. So there are many NAs in M.
>
> I followed the user manual and made the codes below.
>
> I think the code is correct? My questions are (1) how to deal with NAs - as
> I did a search but no clear idea (2) how do people do the statistics at the
> gene level for one gene having multiple probes - averaging or taking median?
>
> Thanks,
> Zhengyu
>
>
> > head(M)
> N1 N2 N3 N4 N5 N6 N7
> N8 T1 T2 T3
> 2 8.622724 7.423568 NA NA 7.487174 NA 8.516293 NA
> 7.876259 7.856707 NA
> T4 T5 T6 T7 T8
> 2 NA 7.720018 NA 7.752550 NA
>
>> eset<-as.matrix(M)
>> Pair=factor(targets$Pair)
>> Treat=factor(targets$Treatment,levels=c("N","T")) # compared matched
> normal to tumors
>> design<-model.matrix(~Pair+Treat)
>> targets
> FilenName Pair Treatment
> 1 N1 1 N
> 2 N2 2 N
> 3 N3 3 N
> 4 N4 4 N
> 5 N5 5 N
> 6 N6 6 N
> 7 N7 7 N
> 8 N8 8 N
> 9 T1 1 T
> 10 T2 2 T
> 11 T3 3 T
> 12 T4 4 T
> 13 T5 5 T
> 14 T6 6 T
> 15 T7 7 T
> 16 T8 8 T
> fit_pair<-lmFit(eset,design)
> fit_pair<-eBayes(fit_pair)
>
> R=topTable(fit_pair, coef="TreatT", adjust="BH",number=30) # display top 30
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list