[BioC] Duplicated probesets for the same gene
Saroj Mohapatra
smohapat at vbi.vt.edu
Tue Apr 25 15:55:52 CEST 2006
Thanks to Sean, Bin, David and Jim, I now have a much better
understanding of the issues.
I am going to try the re-mapped cdfs.
Sincerely,
Saroj
James W. MacDonald wrote:
> As Sean mentioned, there are possibly many reasons for multiple
> probesets. First, they may be intended to interrogate splice variants.
> Second, these probesets are based on UniGene build 95, which is very old
> (the current build is #190), and many ESTs or Riken genes may have been
> mapped in the intervening period to genes that already existed on the chip.
>
> In addition, many of the probesets contain probes that are now known to
> either interrogate unrelated sequences or not map to any known sequence.
>
> You can now download the re-mapped cdfs that are provided by the
> Molecular and Behavioral Neuroscience Institute (MBNI) at the University
> of Michigan directly from BioC. These cdfs contain probesets that have
> been re-mapped based on the current UniGene, Ensembl, Entrez Gene,
> RefSeq, or TIGR annotations. The benefits of using these cdfs are
> twofold. First, there is only one probeset per gene (may not be true of
> RefSeq - I think there may be some redundancy there, but am not sure).
> Second, any probe that interrogates multiple transcripts or no longer
> maps to the genome has been removed, so theoretically you should get
> better data.
>
> The major downside (for me at least) is the loss of the easy preprocess
> ==> analyze ==> annotate pipeline provided by the affy, limma, and
> annaffy packages. However, Steffen Durinck has kindly modified his
> biomaRt code to allow for an alternate affy ==> limma ==> biomaRt ==>
> annotate analysis pipeline. Anybody interested in such things can take a
> look at the prettyOutput vignette in biomaRt.
>
> Best,
>
> Jim
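
The re-mapping idea Jim describes (drop probes that hit more than one
target or nothing at all, then group what is left into one probeset per
gene) can be illustrated with a small sketch. This is only a toy model of
the general idea, not the MBNI procedure: the probe names, gene names,
and the remap_probes helper are all hypothetical, and real re-mapping
works from sequence alignments rather than a hand-written table.

```python
from collections import defaultdict

# Hypothetical toy data: probe id -> set of gene ids the probe aligns to.
probe_hits = {
    "p1": {"GENE_A"},
    "p2": {"GENE_A"},
    "p3": {"GENE_A", "GENE_B"},  # ambiguous: aligns to two genes
    "p4": set(),                 # no longer aligns to anything
    "p5": {"GENE_B"},
    "p6": {"GENE_B"},
}


def remap_probes(probe_hits):
    """Group uniquely mapping probes into a single probeset per gene.

    Probes that align to several genes, or to none, are discarded.
    """
    probesets = defaultdict(list)
    for probe, genes in sorted(probe_hits.items()):
        if len(genes) == 1:
            (gene,) = genes  # unpack the only element of the set
            probesets[gene].append(probe)
    return dict(probesets)


remapped = remap_probes(probe_hits)
for gene, probes in sorted(remapped.items()):
    print(gene, probes)
```

In this toy table, p3 and p4 are dropped and each gene ends up with
exactly one probeset, which is the property that removes the
duplicated-probeset problem discussed in this thread.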