[BioC] What to do with multiple probes?

Mon Nov 28 13:26:12 CET 2005

On 11/25/05 9:32 AM, "krasikov at science.uva.nl" <krasikov at science.uva.nl>
wrote:

> Dear all,
> 
> 1.
> I have a general question about multiple probes for each gene.
> This question has been discussed several times by the BioC community,
> but I didn't find any clear solution.
> 
> My array platform is a custom Agilent bacterial oligo microarray.
> It consists of 8000 unique probes for a bit more than 3000 genes (complete
> bacterial genome) with 1, 2 or 3 probes per gene (mostly depending on
> the length of the gene: 1 for short and 3 for long ones).
> 
> The generated list contains statistics for each probe.
> What should I do to generate the gene list (which is normally needed for
> the biology related research)?
> It's fine when all three probes call the gene regulated in the same
> direction, but what to do if not?
> Should I exclude such genes from the final list?
> Can anybody give me a clue how to deal with that?

Unfortunately, if all quality metrics are the same (no reason to choose one
probe over the other), then validation is in order using another platform
for gene expression (PCR, another array, etc.).

Another possibility is to go back and blast all probes against some
transcript database (like refseq) to get some sense of cross-hybridization
potential, mismatch (if there is any), alignment to transcript variants, and
3'-bias.  In some cases, it may be clear that one probe represents one
transcript and another probe represents a different transcript, each of
which is expressed in a different tissue, for example.  (I have to say that
such situations are rare, though.)

> 2. This is my particular solution for now,
> which is maybe far too strict.
> 
> My list contains info like this
> (result of the write.fit):
> (for three probes for the same gene)
> A    M    p    Result    Probename
> *    *    *    1    xxx1111_123
> *    *    *    0    xxx1111_566
> *    *    *    1    xxx1111_1050
> 
> How can I arrange it in an elegant way:
> A.mean    M.mean    New.Result xxx1111    M.1    M.2    M.3    p.1    p.2    p.3 ?
> where A.mean and M.mean are the means of all probes for that gene
> and the new Result is logical (something like: if all three are 1 then 1,
> if all three are -1 then -1, if at least one is zero or opposite then 0)

I guess it depends on what you want to do with the information.  If you are
in a gene discovery mode (minimize false-negatives), you may simply list all
probes in order of significance.  If two of three probes are not
significant, that isn't a problem, as you will need to validate some
proportion of your data, anyway.
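
As a rough illustration of both options, here is a minimal Python sketch
(not limma code). It assumes the write.fit table has already been read into
a list of dicts with the columns A, M, p, Result and Probename, and that the
gene ID is the part of the probe name before the last underscore
("xxx1111_123" -> "xxx1111"). The function names and all numeric values are
hypothetical, made up for the example.

```python
from collections import defaultdict
from statistics import mean


def gene_id(probe_name):
    # Assumption: probe names look like "<gene>_<position>".
    return probe_name.rsplit("_", 1)[0]


def summarize_by_gene(rows):
    """Collapse per-probe rows into one summary per gene (strict rule)."""
    by_gene = defaultdict(list)
    for row in rows:
        by_gene[gene_id(row["Probename"])].append(row)

    summary = {}
    for gene, probes in by_gene.items():
        calls = {r["Result"] for r in probes}
        # Strict rule from the question: keep 1 or -1 only if every probe
        # agrees; any zero or opposite call gives 0.
        new_result = calls.pop() if len(calls) == 1 else 0
        summary[gene] = {
            "A.mean": mean(r["A"] for r in probes),
            "M.mean": mean(r["M"] for r in probes),
            "New.Result": new_result,
            "M": [r["M"] for r in probes],
            "p": [r["p"] for r in probes],
        }
    return summary


def probes_by_significance(rows):
    # Discovery mode: keep every probe, most significant first.
    return sorted(rows, key=lambda r: r["p"])


# Made-up values, for illustration only.
example_rows = [
    {"A": 8.0, "M": 1.0, "p": 0.001, "Result": 1, "Probename": "xxx1111_123"},
    {"A": 9.0, "M": 0.5, "p": 0.2, "Result": 0, "Probename": "xxx1111_566"},
    {"A": 10.0, "M": 1.5, "p": 0.004, "Result": 1, "Probename": "xxx1111_1050"},
    {"A": 7.0, "M": -2.0, "p": 0.03, "Result": -1, "Probename": "yyy2222_80"},
    {"A": 9.0, "M": -1.0, "p": 0.01, "Result": -1, "Probename": "yyy2222_410"},
]

if __name__ == "__main__":
    for gene, info in summarize_by_gene(example_rows).items():
        print(gene, info)
    for row in probes_by_significance(example_rows):
        print(row["Probename"], row["p"])
```

With the strict rule, the first gene comes out as 0 because one of its three
probes is not significant, while the ranked probe list still keeps its two
significant probes near the top.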

> 3.
> For my experiment (in a strictly controlled conditions, with 5
> biological replicates and some dye-swaps for them) from my
> 8000 probes 3500 were decided to be regulated, which is almost half of
> the complete set (a big part of the decisions is biologically relevant,
> which is nice).
> Isn't it too much? (I'm thinking about the statistical assumption that
> most genes should not change.) However, physiologically my
> experiment should produce rather large differential expression.

It is possible to have a large number of differentially-expressed genes,
yes.  

Sean