[BioC] averaging multiple probes for same gene on agilent array
Tobias Straub
tstraub at med.uni-muenchen.de
Thu Jul 23 09:23:01 CEST 2009
Hi Alison,
I agree that from a biologist's point of view a summarization on the
gene level is very much wanted, so I would prefer to summarize as
early as possible (before testing for differential expression). I
think, however, that the strategy will depend somewhat on the rationale
of the probe design: if, for example, the probes are always placed on
different exons, then you might expect very different Ms, and
summarization becomes very problematic (also from a biological point
of view).
My personal way to deal with your problem on Agilent arrays is to
first filter the probes, before gene summarization, based on several
criteria:
a) Agilent spot quality criteria (whatever you have, whatever you like)
b) at present I also apply A-value cutoffs, as the Ms are not reliable
below and above certain expression levels
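To illustrate the filtering step, here is a minimal sketch in plain Python (not R/limma, just to show the logic). The field names, the cutoffs a_min/a_max, the numbers in the example, and the choice to apply the cutoff to the mean A across arrays are all assumptions made for illustration, not values from a real analysis.

```python
from statistics import mean

def filter_probes(probes, a_min=6.0, a_max=14.0):
    """Keep probes that pass the spot quality flag and whose mean
    A-value across arrays lies within [a_min, a_max].

    The cutoffs are placeholders; choose them from your own data.
    """
    kept = []
    for probe in probes:
        if not probe["good_quality"]:
            continue  # criterion a): failed spot quality
        if a_min <= mean(probe["A"]) <= a_max:
            kept.append(probe)  # criterion b): A within the trusted range
    return kept

# Made-up example values, one A-value per array
probes = [
    {"id": "D137-cbdb_A1587_1", "good_quality": True, "A": [8.1, 8.4, 7.9]},
    {"id": "D137-cbdb_A1587_2", "good_quality": True, "A": [4.2, 4.0, 4.5]},
    {"id": "D137-cbdb_A1587_3", "good_quality": False, "A": [9.0, 9.2, 8.8]},
]
kept_ids = [p["id"] for p in filter_probes(probes)]
```

Here only the first probe survives: the second has too low an A, and the third fails the quality flag.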
My gene summarization is based on the assumption that the highest Ms
are the most meaningful (maybe the most 'real'). Therefore I do not
calculate medians or something similar but simply keep the one probe
per gene with the highest median of absolute Ms across the arrays. If
most of your genes comprise only 3 probes, averaging is difficult
anyway.
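A rough sketch of this probe selection in plain Python (again only to show the logic, not limma code). It assumes the gene key is the part of the spot ID before the first "-", as in the ID/Name example in the quoted message; the function names and the M-values are made up for illustration.

```python
from statistics import median

def gene_of(probe_id):
    # Assumption: "D137-cbdb_A1587_1" belongs to gene "D137"
    return probe_id.split("-", 1)[0]

def pick_top_probe(m_values):
    """m_values maps probe ID -> list of M-values (one per array).

    Returns a dict mapping each gene to the probe with the highest
    median of absolute M across the arrays.
    """
    best = {}
    for probe_id, ms in m_values.items():
        score = median([abs(m) for m in ms])
        gene = gene_of(probe_id)
        if gene not in best or score > best[gene][1]:
            best[gene] = (probe_id, score)
    return {gene: probe_id for gene, (probe_id, _) in best.items()}

# Made-up M-values for three arrays
m_values = {
    "D137-cbdb_A1587_1": [0.2, -0.3, 0.1],
    "D137-cbdb_A1587_2": [1.5, 1.2, -1.8],
    "D137-cbdb_A1587_3": [0.9, 1.0, 0.7],
    "D138-cbdb_A1594": [-0.4, -0.5, -0.6],
}
top = pick_top_probe(m_values)
```

Duplicate spots of the same probe would have to be averaged (e.g. with avereps()) before this step, since the dict holds one entry per probe ID.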
If anyone has better ideas, I am looking forward to hearing them!
best
Tobias
On Jul 22, 2009, at 9:00 PM, Alison Waller wrote:
> Dear Bioconductor list,
>
> I am analysing data from a custom Agilent array with 3600 spots
> using Limma.
>
> There are 3 probes for each gene (usually, however some genes only
> have one probe), all probes are in duplicate.
>
> I would like to obtain an average M value for each gene.
>
> Examples of the spot IDs are as below.
> D137-cbdb_A1587_1
> D137-cbdb_A1587_1
> D137-cbdb_A1587_2
> D137-cbdb_A1587_2
> D137-cbdb_A1587_3
> D137-cbdb_A1587_3
> D138-cbdb_A1594
> D138-cbdb_A1594
>
>
> One option I thought of was to adjust the GAL file to have identical
> IDs for all of the probes for the same gene and then use the
> avereps() function.
>
> ID Name
>
> D137 D137-cbdb_A1587_1
> D137 D137-cbdb_A1587_1
> D137 D137-cbdb_A1587_2
> D137 D137-cbdb_A1587_2
> D137 D137-cbdb_A1587_3
> D137 D137-cbdb_A1587_3
> D138 D138-cbdb_A1594
> D138 D138-cbdb_A1594
>
> However, the avereps() function seems more suitable for actual
> duplicates. For probe sets I would like to use some weighted average
> where probes with intensities that are further from the mean of the
> probe set are down-weighted (for example the Tukey biweight).
>
> Does anyone have experience with similar arrays or suggestions of an
> appropriate function?
>
> thank you,
>
> alison
>
> ---------------------------------------------------------
> Alison Waller Ph.D
> alison.waller at utoronto.ca
>
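
For reference, the weighted average asked about in the quoted message (Tukey biweight) could look roughly like the sketch below. This is a one-step biweight centred on the median, in plain Python; the tuning constants c=5 and eps=0.0001 are the defaults I recall from the Affymetrix MAS5 description and should be treated as assumptions. Note that with only 3 probes per gene the estimate will rarely differ much from the median.

```python
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight estimate of the centre of `values`.

    Values far from the median (relative to the MAD) get a smaller
    weight; values beyond c * MAD get weight zero.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    weights = []
    for v in values:
        u = (v - med) / (c * mad + eps)  # scaled distance from the median
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Made-up M-values for one gene: three consistent probes, one outlier
est = tukey_biweight([1.0, 1.1, 0.9, 5.0])
```

In this example the outlying value 5.0 receives weight zero, so the estimate stays close to 1 instead of being pulled towards the plain mean of 2.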
----------------------------------------------------------------------
Tobias Straub ++4989218075439 Adolf-Butenandt-Institute, München D