[BioC] Duplicated probesets for the same gene
James W. MacDonald
jmacdon at med.umich.edu
Mon Apr 24 15:24:58 CEST 2006
As Sean mentioned, there are possibly many reasons for multiple
probesets. First, they may be intended to interrogate splice variants.
Second, these probesets are based on UniGene build 95, which is very old
(the current build is #190), and many ESTs or Riken genes may have been
mapped in the intervening period to genes that already existed on the chip.
In addition, many of the probesets contain probes that are now known to
either interrogate unrelated sequences or not map to any known sequence.
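
To make the duplication concrete, here is a minimal sketch (in Python, purely for illustration) of how one might list genes that are targeted by more than one probeset. The annotation table and the function name are hypothetical; only the two IL8 probeset IDs are taken from the original question quoted below, and the other rows are invented placeholders.

```python
from collections import defaultdict

# Hypothetical probeset -> gene symbol annotation table. Only the two IL8
# entries come from this thread; the other rows are made up for illustration.
PROBESET_TO_GENE = {
    "1369_s_at": "IL8",
    "35372_r_at": "IL8",
    "example_1_at": "GENE_A",
    "example_2_at": "GENE_B",
}


def genes_with_multiple_probesets(annotation):
    """Return {gene: sorted list of probesets} for genes hit by >1 probeset."""
    by_gene = defaultdict(list)
    for probeset, gene in annotation.items():
        by_gene[gene].append(probeset)
    return {g: sorted(p) for g, p in by_gene.items() if len(p) > 1}


duplicates = genes_with_multiple_probesets(PROBESET_TO_GENE)
print(duplicates)
```

With a real annotation table in place of the toy dictionary, the same grouping shows how many genes on a chip are covered by several probesets.
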
You can now download the re-mapped cdfs that are provided by the
Molecular and Behavioral Neuroscience Institute (MBNI) at the University
of Michigan directly from BioC. These cdfs contain probesets that have
been re-mapped based on the current UniGene, Ensembl, Entrez Gene,
RefSeq, or TIGR annotations. The benefits of using these cdfs are
twofold. First, there is only one probeset per gene (may not be true of
RefSeq - I think there may be some redundancy there, but am not sure).
Second, any probe that interrogates multiple transcripts or no longer
maps to the genome has been removed, so theoretically you should get
better data.
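
The idea behind the re-mapping can be sketched roughly as follows. This is a toy illustration in Python under my own assumptions, not the actual MBNI procedure: the probe names, gene names, and the `remap_probes` helper are all invented. It keeps only probes that hit exactly one gene and then groups the survivors into a single probeset per gene.

```python
from collections import defaultdict

# Hypothetical probe -> gene alignment results; all identifiers are invented.
PROBE_HITS = {
    "probe_01": ["GENE_A"],
    "probe_02": ["GENE_A"],
    "probe_03": ["GENE_A", "GENE_B"],  # hits two genes: dropped
    "probe_04": [],                    # no longer maps anywhere: dropped
    "probe_05": ["GENE_B"],
}


def remap_probes(probe_hits):
    """Group uniquely mapping probes into one probeset per gene."""
    probesets = defaultdict(list)
    for probe, genes in sorted(probe_hits.items()):
        unique = set(genes)
        if len(unique) == 1:  # keep only probes hitting exactly one gene
            probesets[unique.pop()].append(probe)
    return dict(probesets)


remapped = remap_probes(PROBE_HITS)
print(remapped)
```

In this toy case the ambiguous and unmapped probes disappear, and each remaining gene ends up with exactly one probeset.
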
The major downside (for me at least) is the loss of the easy preprocess
==> analyze ==> annotate pipeline provided by the affy, limma, and
annaffy packages. However, Steffen Durinck has kindly modified his
biomaRt code to allow for an alternate affy ==> limma ==> biomaRt ==>
annotate analysis pipeline. Anybody interested in such things can take a
look at the prettyOutput vignette in biomaRt.
Best,
Jim
Ye, Bin wrote:
> Hi, Saroj,
>
> How have you been? As far as I know, the different probe sets
> correspond to different regions of the gene. I don't know why Affy
> does this; probably they originally thought that probe sets for the
> same gene but different regions would serve as a "set of probe sets",
> a second-layer confirmation of the gene expression, but it turned out
> that different probe sets for the same gene sometimes show different
> expression too. Sometimes this is because not all of the probe sets
> hybridize to the coding region of the gene, so in our analysis we only
> consider the expression of the coding-region probe sets, which, of
> course, takes some BLASTing.
>
> Hope other experts can give better ideas about this!
>
>
> Bin
>
>
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch on behalf of Saroj Mohapatra
> Sent: Sun 4/23/2006 6:02 PM
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] Duplicated probesets for the same gene
>
> Hi all,
>
> I have a small curiosity regarding annotation of probesets in affy
> GeneChips. I find that sometimes 2 probe sets refer to the same
> gene.
>
> For example, in the HG_U95Av2, there are 2 probesets (1369_s_at and
> 35372_r_at) that both point to the same gene, IL8. I wonder what the
> scientific reason for such a duplication is.
>
> I understand that the signal from 2 probesets would be affected by
> dye-labeling effect and hybridization effect in addition to mRNA
> abundance. What is then the point of having 2 probe sets which might
> give different results for the same gene?
>
> Please send any pointers/references that you find appropriate.
>
> Thanks for your consideration.
>
> With thanks,
>
> Saroj
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623