[BioC] Missing probesets ~ commentary from the cheap seats ...
Richard Finney
rfinney5 at yahoo.com
Thu May 12 21:54:23 CEST 2005
note: cuts'n'snips are from the previous Paye/Gentleman
emails ...
> > be ok but when i read my cel files I realise that about 20% of my
> > probesets are missing.
Which expression algorithm are you using?
MAS5 chops off the high and low (i.e. two) probes for
each probeset, so probesets with fewer than 3 probes
will disappear. The 20% multigene figure for an Affy
chip sounds about right to me.
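
To make the "fewer than 3 probes" point concrete, here is a toy
sketch in Python. It is only an illustration of the trimming
behaviour as I described it above, not the real MAS5 algorithm, and
the function names and the intensity values are made up for the
example.

```python
def trimmed_probeset_signal(intensities):
    """Drop the highest and lowest probe, then average what is left.

    Returns None when there are fewer than 3 probes, because nothing
    would remain after trimming (the probeset "disappears").
    """
    if len(intensities) < 3:
        return None
    kept = sorted(intensities)[1:-1]  # remove one low and one high probe
    return sum(kept) / len(kept)


def missing_probesets(probesets):
    """Return the sorted names of probesets that yield no signal."""
    return sorted(
        name
        for name, intensities in probesets.items()
        if trimmed_probeset_signal(intensities) is None
    )


# Hypothetical probesets: "small_at" has lost most of its probes.
example = {
    "full_at": [10.0, 200.0, 90.0, 110.0],
    "small_at": [5.0, 7.0],
}
print(missing_probesets(example))
```

If a chip definition strips shared probes out of a probeset, this is
the kind of check that would show which probesets drop out.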
> >
> > While trying to track the bug it seems that the missing probesets are
> > those whose probes are completely/partially included in another
>
> Hi,
> I am not sure I understand what you are saying here. Are you saying
> that some probes on your chip map to multiple mRNA's? That does seem a
> bit peculiar
It's not uncommon for probes to map to multiple genes;
often to members of the same gene family.
Additionally, there are more and more wacky run-on RNAs
appearing in GenBank which are polluting gene to probe
mappings (check out the Acembly gene track in the UCSC
browser for how confusing it can get).
> that some form of very large EM iteration will be
Hey, what's EM?
> needed, but I would not expect anything to do it correctly off of the
> shelf.
> If this supposition is correct you now need to deal explicitly with
> cross-hybridization - how much of the signal at each such probe do you
> attribute to the *different* underlying mRNA species. This is doable,
> provided there is no complete confounding - or stated differently
> provided each mRNA species has at least one probe that is unique to it
Unique? Probably not good enough. You'll still get
cross hybridization from closely similar mRNAs from
other genes.
> [probably - there are sure to be more specific conditions since this is
> going to become a large optimization problem] - but it is not simple to
> do.
>
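
For what it's worth, the "attribute the signal at shared probes"
idea in the quoted text can be sketched as a small least-squares
problem. This is a toy in plain Python, not EM and not anything from
an existing package: it assumes a purely additive signal model, the
0/1 probe-to-mRNA matrix and the intensities are invented, and it
ignores the extra cross hybridization from similar sequences that I
mentioned above.

```python
def attribute_signal(design, signal):
    """Least-squares estimate of per-mRNA abundance.

    design[i][j] is 1 if probe i matches mRNA species j, else 0.
    signal[i] is the observed intensity of probe i.
    Solves the normal equations (M^T M) x = M^T y by Gaussian
    elimination and raises ValueError if the species are confounded.
    """
    n = len(design[0])
    a = [[sum(row[i] * row[j] for row in design) for j in range(n)]
         for i in range(n)]
    b = [sum(row[i] * y for row, y in zip(design, signal))
         for i in range(n)]

    for col in range(n):
        # Partial pivoting: pick the largest remaining entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("mRNA species are completely confounded")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - rest) / a[i][i]
    return x


# Probe 1 is unique to species A, probe 2 is shared, probe 3 is unique to B.
design = [[1, 0], [1, 1], [0, 1]]
signal = [100.0, 150.0, 50.0]
print(attribute_signal(design, signal))
```

When two species share every probe, the columns of the matrix are
identical and the function raises an error, which is the "complete
confounding" case from the quoted text.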
> Duplicating the probes, to give the appearance of every gene having a
> complete probe set will bias your results.
Agreed. Better to mask them out or lump them
into a "gene family" probeset if practical.
> >
> > My question : what can I do to reanalyse my data with the entire
> > probesets.
> >
I think Malick has a good question. You can
assign probes to multiple probesets, right?
It's not that big a deal from a technical point of
view.
Of course, if you do, be sure to mark them as
cross hybridizing and you may wish to use other
technologies to verify expression of the gene in
question. To repeat: masking out multigene probes
is probably a good thing.
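
A rough sketch of both options (masking, or assigning to multiple
probesets with a cross-hybridizing flag) in Python. The
probe-to-gene mapping, the names, and the function are all
hypothetical; a real chip definition would come from your own
annotation, not from this toy dictionary.

```python
def build_probesets(probe_to_genes, mask_multigene=True):
    """Group probes into per-gene probesets.

    probe_to_genes maps a probe id to the set of genes it matches.
    With mask_multigene=True, probes that match more than one gene
    are dropped. Otherwise they are assigned to every gene they
    match and flagged as cross hybridizing.
    Returns {gene: [(probe_id, is_cross_hybridizing), ...]}.
    """
    probesets = {}
    for probe, genes in sorted(probe_to_genes.items()):
        multigene = len(genes) > 1
        if multigene and mask_multigene:
            continue  # mask the probe out entirely
        for gene in sorted(genes):
            probesets.setdefault(gene, []).append((probe, multigene))
    return probesets


# Hypothetical mapping: p2 matches two members of a gene family.
probe_to_genes = {
    "p1": {"GENE_A"},
    "p2": {"GENE_A", "GENE_B"},
    "p3": {"GENE_B"},
}
masked = build_probesets(probe_to_genes, mask_multigene=True)
flagged = build_probesets(probe_to_genes, mask_multigene=False)
print(masked)
print(flagged)
```

Either way, the flag travels with the probe, so anything downstream
can tell which values lean on shared probes.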