[BioC] RMA probe summarization when sizes of probesets are unequal
James W. MacDonald
jmacdon at uw.edu
Fri Aug 16 18:25:50 CEST 2013
Hi Xin,
On 8/16/2013 11:50 AM, Xin Lin [guest] wrote:
> Dear all,
>
> I have a customized two-channel microarray designed by NimbleGen (one of the last they produced), based on 26,981 tomato cDNA sequences. 60-mer oligonucleotide probes were designed, with multiple probes per transcript. The problem is that the probesets are not all the same size: they contain between 1 and 5 probes (26,784 of them have 5).
>
> I am now trying to use rma() in oligo for normalization and summarization. My question is: when probesets differ in size, how does rma() do the probe summarization? Will it be a problem if I use rma() directly? If rma() cannot do the job correctly, what alternative method can I use for probe summarization?
The RMA algorithm will have no problem with different sized probesets.
The number of probes per probeset has varied since the very first Affy
arrays and still does, so this has never been an issue.
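To make this concrete, here is a minimal pure-Python sketch of Tukey's median polish, the robust summarization step RMA uses, modelled loosely on R's `stats::medpolish`. It is not the code that oligo/preprocessCore actually runs. The function name `median_polish_summary` and the toy numbers are hypothetical, and the input is assumed to be one probeset's already background-corrected, normalized log2 intensities (probes in rows, arrays in columns). Nothing in the procedure depends on the number of rows, and with a single probe the summary is simply that probe's own values.

```python
from statistics import median


def median_polish_summary(values, max_iter=10, eps=0.01):
    """Summarize one probeset with Tukey's median polish (illustrative sketch).

    values: probes x arrays matrix (list of lists) of log2 intensities,
    assumed to be background-corrected and normalized already.
    Returns one summary per array: overall effect + array (column) effect.
    """
    z = [list(map(float, row)) for row in values]  # residuals, updated each sweep
    n_probes, n_arrays = len(z), len(z[0])
    overall = 0.0
    probe_eff = [0.0] * n_probes
    array_eff = [0.0] * n_arrays
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep probe (row) medians out of the residuals.
        rdelta = [median(row) for row in z]
        for i in range(n_probes):
            z[i] = [v - rdelta[i] for v in z[i]]
            probe_eff[i] += rdelta[i]
        shift = median(array_eff)
        array_eff = [a - shift for a in array_eff]
        overall += shift
        # Sweep array (column) medians out of the residuals.
        cdelta = [median(z[i][j] for i in range(n_probes)) for j in range(n_arrays)]
        for i in range(n_probes):
            z[i] = [z[i][j] - cdelta[j] for j in range(n_arrays)]
        array_eff = [array_eff[j] + cdelta[j] for j in range(n_arrays)]
        shift = median(probe_eff)
        probe_eff = [p - shift for p in probe_eff]
        overall += shift
        # Stop once the sum of absolute residuals stops changing much.
        new_sum = sum(abs(v) for row in z for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return [overall + a for a in array_eff]


# Hypothetical data: 5 probes with different affinities on 3 arrays, and a 1-probe set.
five_probes = [[p + a for a in (6.0, 7.0, 8.0)] for p in (0.0, 1.0, -1.0, 2.0, 0.5)]
one_probe = [[7.0, 8.0, 9.5]]
print(median_polish_summary(five_probes))  # [6.5, 7.5, 8.5]
print(median_polish_summary(one_probe))    # [7.0, 8.0, 9.5]
```

The probe effects are estimated and then discarded, so a 5-probe set and a 1-probe set each end up with exactly one value per array.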
You could make the argument that the reliability of the summary
statistic that is generated by rma() is dependent on the number of
probes that went into the summarization. Certainly it is true in a
statistical sense, but you could argue that five poorly-performing
probes won't give a better estimate of the level of a transcript than a
single well-performing probe, so it is hard to make a blanket statement
about an entire array. But all else being equal, rma() will give a better
estimate from five probes than from a single probe.
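A quick simulation illustrates both halves of that argument. It assumes every probe measures the same true level with independent Gaussian noise and, as a simplification, summarizes one array by the median of its probes (a stand-in for median polish that ignores probe effects and the other arrays). The helper `summary_spread` and the noise levels are hypothetical.

```python
import random
from statistics import median, stdev


def summary_spread(n_probes, n_reps=5000, noise_sd=1.0, seed=1):
    """Empirical SD of a median-of-probes summary for one array.

    Assumes every probe measures the same true log2 level (0.0) plus
    independent Gaussian noise with standard deviation noise_sd.
    """
    rng = random.Random(seed)
    summaries = [
        median(rng.gauss(0.0, noise_sd) for _ in range(n_probes))
        for _ in range(n_reps)
    ]
    return stdev(summaries)


sd_one = summary_spread(1)                          # one well-behaved probe
sd_five = summary_spread(5)                         # five equally good probes
sd_five_noisy = summary_spread(5, noise_sd=3.0)     # five poorly-performing probes
print(round(sd_one, 2), round(sd_five, 2), round(sd_five_noisy, 2))
```

With equal-quality probes the five-probe summary is roughly half as variable as the single probe; make the five probes three times noisier and the single good probe wins.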
But what people tend not to worry about is the fact that we don't
usually take that into consideration for downstream analyses. In other
words, if you have two probesets, one that has 5 probes, and one that
has a single probe, then the accuracy of the summarized expression value
may be better for the first one than the second. If you then compute
t-statistics using these two probesets, you don't take into account that
the first probeset is likely to more accurately measure the underlying
expression of the gene than the second.
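A small sketch of that blind spot: a t-statistic (Welch's form here, chosen only for illustration) is computed from the summarized values alone, so two hypothetical probesets with identical summaries get identical statistics even though one was summarized from five probes and the other from one. The data and names are made up.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch two-sample t-statistic computed from summarized expression values."""
    va, vb = variance(group_a), variance(group_b)
    return (mean(group_a) - mean(group_b)) / sqrt(va / len(group_a) + vb / len(group_b))


# Hypothetical probesets: the same summarized log2 values on three control and
# three treated arrays, but one came from 5 probes and the other from 1.
probesets = {
    "five_probe_set": {"n_probes": 5, "ctrl": [7.1, 7.3, 6.9], "trt": [8.0, 8.4, 8.1]},
    "one_probe_set": {"n_probes": 1, "ctrl": [7.1, 7.3, 6.9], "trt": [8.0, 8.4, 8.1]},
}
t_stats = {}
for name, ps in probesets.items():
    # n_probes is never used: the statistic only sees the summaries.
    t_stats[name] = welch_t(ps["trt"], ps["ctrl"])
    print(name, ps["n_probes"], round(t_stats[name], 3))
```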
The puma package is designed to account for this variable uncertainty
(though I must confess I have never actually used it), so if you are
concerned about this sort of thing, you might look at that package.
Best,
Jim
>
> I am new to microarrays and R, and I would really appreciate your help. Thank you for your time!
>
> Xin
>
> -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] qvalue_1.34.0 affyPLM_1.36.0 preprocessCore_1.22.0
> [4] gcrma_2.32.0 affy_1.38.1 pd.121114.slycop.tm.exp_0.0.1
> [7] pdInfoBuilder_1.24.0 affxparser_1.32.3 oligo_1.24.0
> [10] oligoClasses_1.22.0 geneplotter_1.38.0 lattice_0.20-15
> [13] annotate_1.38.0 AnnotationDbi_1.22.6 Biobase_2.20.1
> [16] BiocGenerics_0.6.0 RColorBrewer_1.0-5 limma_3.16.6
> [19] genefilter_1.42.0 RSQLite_0.11.4 DBI_0.2-7
>
> loaded via a namespace (and not attached):
> [1] affyio_1.28.0 BiocInstaller_1.10.2 Biostrings_2.28.0 bit_1.1-10
> [5] codetools_0.2-8 ff_2.2-11 foreach_1.4.1 GenomicRanges_1.12.4
> [9] grid_3.0.1 IRanges_1.18.2 iterators_1.0.6 splines_3.0.1
> [13] stats4_3.0.1 survival_2.37-4 tcltk_3.0.1 tools_3.0.1
> [17] XML_3.95-0.2 xtable_1.7-1 zlibbioc_1.6.0
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099