[BioC] Affymetrix Intronic Normalization Control Probes Differentially Expressed?
James W. MacDonald
jmacdon at uw.edu
Mon Mar 19 14:59:06 CET 2012
Hi Alexandra,
On 3/18/2012 11:36 PM, Alexandra Muñoz wrote:
> Hi Jim et al.,
>
> I am attempting to analyze data from humans exposed to arsenic in vivo,
> run on the Affymetrix Human Gene 1.0 ST Array. I have generated lists of
> differentially expressed genes using limma and an ANOVA.
In vivo arsenic exposure? Wow.
>
> I am encountering a high number of control probes as top genes in both
> lists and am not sure if I should be ignoring this information, removing
> it, or utilizing it. I found an earlier post which seemed to be related and
> which directed the user to identify the type of probe in order to determine
> if its differential expression may have been an error resulting from batch
> effects (
> http://article.gmane.org/gmane.science.biology.informatics.conductor/28952/match=control+probes)
> though based on the category of my probes I'm not sure how to proceed.
> According to the NetAffx online tool, my control probes fall into the
> category of "intronic normalization control". It doesn't make sense to me
> that they would be in the top genes list, and I would appreciate any help
> with how to interpret their presence and, if necessary, how to remove them
> from the analysis prior to generating the lists.
>
> Here is an example of some of the probe numbers I am getting:
> 7892503 7892505 7892551 7892558 7892571 7892581 7892589 7892633
> 7892675 7892676 7892689 7892729 7892738 7892753 7892757 7892788
You might want to talk with the folks who processed the samples and
arrays to see if there is anything that might explain this. I would
normally not worry if there were just a few control probes in the
differential gene list, but if there are lots of them it may indicate
some technical artifact that isn't being controlled for by the
normalization procedure.
Although I wouldn't advocate simply removing them without figuring out
why they are showing up, it is easy to do. The pd.hugene.1.0.st.v1
package has the information you need:
> library(pd.hugene.1.0.st.v1)
> con <- db(pd.hugene.1.0.st.v1)
> dbGetQuery(con, "select * from type_dict;")
1 1 main
2 2 control->affx
3 3 control->chip
4 4 control->bgp->antigenomic
5 5 control->bgp->genomic
6 6 normgene->exon
7 7 normgene->intron
8 8 rescue->FLmRNA->unmapped
so you just want the 'main' probes. You can get the probeset IDs from
the featureSet table.
> mains <- dbGetQuery(con, "select fsetid from featureSet where type='1';")[,1]
Then you can filter out any non-main probesets using this vector and,
e.g., the %in% function.
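A minimal sketch of that subsetting step, using toy data so it runs without
the array files. The matrix 'expr' and the 'main' IDs here are hypothetical
stand-ins: in a real analysis 'mains' would come from the featureSet query
and 'expr' would be your probeset-level expression matrix (row names are
probeset IDs).

```r
# Hypothetical stand-in for the vector returned by the featureSet query.
mains <- c(7896736L, 7896738L, 7896740L)

# Hypothetical expression matrix: two control probesets from the list in
# the question plus the three made-up 'main' probesets above.
set.seed(1)
ids <- c("7892503", "7892505", "7896736", "7896738", "7896740")
expr <- matrix(rnorm(length(ids) * 4), nrow = length(ids),
               dimnames = list(ids, paste0("sample", 1:4)))

# Row names are character, so compare against character IDs.
keep <- rownames(expr) %in% as.character(mains)

# drop = FALSE keeps a matrix even if only one row survives.
expr_main <- expr[keep, , drop = FALSE]
```

Doing the filtering on the expression matrix before fitting the model,
rather than on the final gene list, also keeps the control probesets out of
the multiple-testing adjustment.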
Best,
Jim
>
> Thank you,
> Alexandra Munoz
> NYU PhD Student - Molecular and genetic toxicology
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099