[BioC] What to do with multiple probes?

Robert Gentleman rgentlem at fhcrc.org
Mon Nov 28 19:33:49 CET 2005


Hi,
  Sean has already answered some of your questions, but I will provide a 
few of my thoughts on this.

  1) There is little discussion because it is a reasonably difficult 
topic and there are no clear-cut answers besides "it depends", and it 
does depend on a lot of different things.

  For example, you might exclude the information on some probe sets if 
they are far from the poly-A tail and dT-priming was used. If random 
priming was used, then all should be equally good (but I am not aware of 
a comprehensive comparison).

  In some cases, depending on the data, processing, etc., you can develop 
tools for comparing duplicate probe sets and combining the information 
to get better estimates for whether genes are expressed and at what 
levels (you could compare the probes and see if they are unique in the 
genome, for example). In these situations, using R is a good thing, 
since you can pretty much do any reasonable analysis, but you need to 
know some statistics and some programming to do it, and there is no 
clear recipe to follow.
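
  As one illustration of what such a tool might look like, here is a 
minimal sketch of the strict per-gene summary described in point 2 of 
the quoted message below: mean A and M per gene, plus a consensus call 
that is 1 or -1 only when every probe agrees and 0 otherwise. It is 
written in plain Python rather than R so that it stands alone, and it is 
not limma code. The probe rows, the numbers in them, the second gene 
xxx2222, and the helper names (gene_of, consensus, summarize) are all 
made up for the example, and it assumes probe names have the form 
<gene>_<position> as in the quoted table.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-probe rows with the columns from the quoted table
# (A, M, p, Result, Probename). All values are invented for illustration.
probes = [
    {"A": 10.0, "M": 1.2, "p": 0.0002, "Result": 1, "Probename": "xxx1111_123"},
    {"A": 9.0, "M": 0.3, "p": 0.2, "Result": 0, "Probename": "xxx1111_566"},
    {"A": 11.0, "M": 1.5, "p": 0.0001, "Result": 1, "Probename": "xxx1111_1050"},
    {"A": 8.0, "M": -2.0, "p": 0.0003, "Result": -1, "Probename": "xxx2222_40"},
    {"A": 8.5, "M": -1.0, "p": 0.0005, "Result": -1, "Probename": "xxx2222_900"},
]


def gene_of(probename):
    # Assumption: the gene id is everything before the last underscore.
    return probename.rsplit("_", 1)[0]


def consensus(results):
    # Strict rule: 1 if all probes are 1, -1 if all are -1, otherwise 0.
    if all(r == 1 for r in results):
        return 1
    if all(r == -1 for r in results):
        return -1
    return 0


def summarize(rows):
    # Group probe rows by gene, then build one summary record per gene.
    by_gene = defaultdict(list)
    for row in rows:
        by_gene[gene_of(row["Probename"])].append(row)

    summary = {}
    for gene, members in by_gene.items():
        summary[gene] = {
            "A.mean": mean(r["A"] for r in members),
            "M.mean": mean(r["M"] for r in members),
            "New.Result": consensus([r["Result"] for r in members]),
            # Keep the per-probe values so disagreements can be inspected.
            "M": [r["M"] for r in members],
            "p": [r["p"] for r in members],
        }
    return summary


gene_summary = summarize(probes)
for gene, record in gene_summary.items():
    print(gene, record)
```

  Keeping the per-probe M and p values next to the consensus call makes 
it possible to look at the genes whose probes disagree rather than 
silently dropping them, which is where the "it depends" questions above 
come in.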

krasikov at science.uva.nl wrote:
> Dear all,
> 
> 1.
> I have a general question about the multiple probes for each gene.
> This question has been discussed several times by BioC community,
> but I didn't find any clear solution.
> 
> My array platform is a bacterial custom Agilent oligo microarray.
> It consists of 8000 unique probes for a bit more than 3000 genes (complete 
> bacterial genome) with 1, 2 or 3 probes per gene (mostly depending on 
> the length of the gene: 1 for short and 3 for long ones).
> 
> The generated list contains statistics for each probe.
> What should I do to generate the gene list (which is normally needed for 
> the biology related research)?
> It's fine when all three probes call the gene regulated
> in the same direction, but what should I do if they do not?
> Should I exclude such genes from the final list?
> Can anybody give me a clue how to deal with that?
> 
> 2. This is my own solution for the time being,
> which may be far too strict.
> 
> My list contains information like this
> (the result of write.fit),
> shown for three probes for the same gene:
> A	M	p	Result	Probename
> *	*	*	1	xxx1111_123
> *	*	*	0	xxx1111_566
> *	*	*	1	xxx1111_1050
> 
> How can I arrange it in an elegant way:
> A.mean	M.mean	New.Result xxx1111	M.1	M.2	M.3	p.1	p.2	p.3 ?
> where A.mean and M.mean are the means over all probes for that gene
> and the new Result is logical (something like: if all three are 1 then 1,
> if all three are -1 then -1, if at least one is zero or opposite then 0)
> 
> 3.
> For my experiment (in strictly controlled conditions, with 5 
> biological replicates and some dye-swaps for them), 3500 of my
> 8000 probes were decided to be regulated, which is almost half of the 
> complete set (a large part of these decisions is biologically relevant,
> which is nice).

  Do you mean about 3500 are showing differential expression? This seems 
very large, and you do realize that it violates most of the principles 
that underlie the usual normalization procedures? That may be more of a 
problem for you than the duplicate probes. And fixing it, or convincing 
yourself that the outputs of the normalization are ok, will take some 
time and statistical expertise. In my experience these are way outside 
of what can easily be dealt with on a mailing list - local expertise is 
what is needed.

  Best wishes,
    Robert


> Isn't that too much? (I'm thinking about the statistical assumption that
> most genes should not change.) However, physiologically my 
> experiment should produce rather large differential expression.
> 
> I used a direct ratio design, loess and then Aquantile normalization,
> with BH correction in decideTests and a p-value cut-off of 0.001.
> 
> Thanks in advance for any help.
> Vladimir
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org


