[BioC] limma for homemade microarray - question on NAs and multiple probes for one gene
Peter Langfelder
peter.langfelder at gmail.com
Tue Jul 9 04:16:33 CEST 2013
[CCing the list as well]
On Mon, Jul 8, 2013 at 6:20 PM, zhengyu jiang <zhyjiang2006 at gmail.com> wrote:
> Hi Peter,
> Thank you very much for your reply. That's a very good point.
> I am just layman to limma and microarray analysis & try to learn. You are
> very helpful.
> Can I ask you two more questions?
>
> My data has a huge dynamic range - raw intensity from about 500 to 93760. I
> decided the cutoff for background just based on literature - 99%. Is there
> any better way to decide the outliers? Is there any better way to do the
> normalization other than quantile for big dynamic range?
>
I am not an expert on microarray pre-processing; quantile
normalization is what I usually do but I do check whether it's really
necessary. As for identifying outlier samples, we typically use the
sample network approach of Oldham MC, Langfelder P, Horvath S (2012)
Network methods for describing sample relationships in genomic
datasets: application to Huntington's disease. BMC Syst Biol. 2012 Jun
12;6(1):63. PMID: 22691535 46(11) 1-17
http://www.biomedcentral.com/1752-0509/6/63/abstract (sorry for
constantly advertising work I was involved in :)). This web page:
http://labs.genetics.ucla.edu/horvath/htdocs/CoexpressionNetwork/SampleNetwork/
contains the necessary code and a tutorial with sample data.
Peter
> Best,
> Zhengyu
>
>
>
> On Mon, Jul 8, 2013 at 9:07 PM, Peter Langfelder
> <peter.langfelder at gmail.com> wrote:
>>
>> Hi Zhengyu,
>>
>> for turning probe-level data into gene-level data, you may want to
>> look at the function collapseRows in the WGCNA package (on CRAN). The
>> relevant citation is
>>
>> Miller JA, Cai C, Langfelder P, Geschwind DH, Kurian SM, Salomon DR,
>> Horvath S (2011) Strategies for aggregating gene expression data: The
>> collapseRows R function. BMC Bioinformatics12:322. PMID: 21816037,
>> PMCID: PMC3166942 http://www.biomedcentral.com/1471-2105/12/322
>>
>> As for dealing with the missing data, well, I personally would try to
>> go back and restore the original values unless they are outliers. Even
>> knowing that something is below 5 is better than a missing datum. To
>> remove noise, I would possibly remove probes whose expression
>> consistently remains below 5 but not individual expression values. For
>> example, if a probe is consistently above 7 in cancer samples and
>> below 5 (unexpressed) in normals, you want to identify it - but you
>> will completely miss it if you turn all values below 5 into NA.
>>
>> Hope this helps,
>>
>> Peter
>>
>> On Mon, Jul 8, 2013 at 5:40 PM, zhengyu jiang <zhyjiang2006 at gmail.com>
>> wrote:
>> > Dear Bioconductor experts,
>> >
>> > We have data from a homemade one-channel microarray that I tried to
>> > apply
>> > limma for differential expression analysis between matched paired Normal
>> > (N) and Tumor (Tumor) samples - 8 biological replicates (one tech
>> > replicate
>> > has been averaged after normalization). All samples are formatted in one
>> > matrix (M).
>> >
>> > Signals have been quantile normalized between each paired normal and
>> > tumor.
>> > Signal values below 5 (log scale) have been replaced by "NA" since they
>> > are
>> > potentially noises. So there are many NAs in M.
>> >
>> > I followed the user manual and made the codes below.
>> >
>> > I think the code is correct? My questions are (1) how to deal with NAs -
>> > as
>> > I did a search but no clear idea (2) how do people do the statistics at
>> > the
>> > gene level for one gene having multiple probes - averaging or taking
>> > median?
>> >
>> > Thanks,
>> > Zhengyu
>> >
>> >
>> > > head(M)
>> > N1 N2 N3 N4 N5 N6 N7
>> > N8 T1 T2 T3
>> > 2 8.622724 7.423568 NA NA 7.487174 NA 8.516293
>> > NA
>> > 7.876259 7.856707 NA
>> > T4 T5 T6 T7 T8
>> > 2 NA 7.720018 NA 7.752550 NA
>> >
>> >> eset<-as.matrix(M)
>> >> Pair=factor(targets$Pair)
>> >> Treat=factor(targets$Treatment,levels=c("N","T")) # compared
>> >> matched
>> > normal to tumors
>> >> design<-model.matrix(~Pair+Treat)
>> >> targets
>> > FilenName Pair Treatment
>> > 1 N1 1 N
>> > 2 N2 2 N
>> > 3 N3 3 N
>> > 4 N4 4 N
>> > 5 N5 5 N
>> > 6 N6 6 N
>> > 7 N7 7 N
>> > 8 N8 8 N
>> > 9 T1 1 T
>> > 10 T2 2 T
>> > 11 T3 3 T
>> > 12 T4 4 T
>> > 13 T5 5 T
>> > 14 T6 6 T
>> > 15 T7 7 T
>> > 16 T8 8 T
>> > fit_pair<-lmFit(eset,design)
>> > fit_pair<-eBayes(fit_pair)
>> >
>> > R=topTable(fit_pair, coef="TreatT", adjust="BH",number=30) # display
>> > top 30
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at r-project.org
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives:
>> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
More information about the Bioconductor
mailing list