[BioC] Affymetrix Intronic Normalization Control Probes Differentially Expressed?

Mon Mar 19 14:59:06 CET 2012

Hi Alexandra,

On 3/18/2012 11:36 PM, Alexandra Muñoz wrote:
> Hi Jim et al.,
>
> I am attempting to analyze data from humans exposed to arsenic in vivo from
> an affymetrix Human Gene 1.0 ST Array. I have generated differentially
> expressed genes lists using LIMMA and an ANOVA.

In vivo arsenic exposure? Wow.

>
> I am encountering a high number of control probes as top genes in both
> lists and am not sure if I should be ignoring this information, removing
> it, or utilizing it. I found an earlier post which seemed to be related and
> which directed the user to identify the type of probe in order to determine
> if its differential expression may have been an error resulting from batch
> effects (
> http://article.gmane.org/gmane.science.biology.informatics.conductor/28952/match=control+probes)
> though based on the category of my probes I'm not sure how to proceed.
> According to the NetAffx online tool, my control probes fall into the
> category of "intronic normalization control". It doesn't make sense to me that they would be
> in the top genes list, and I would appreciate any help as to how to
> interpret their presence and if necessary about how to remove them from the
> analysis prior to the list generation.
>
> Here is an example of some of the probe numbers I am getting
>    7892503  7892505  7892551  7892558  7892571  7892581  7892589  7892633
> 7892675  7892676  7892689  7892729  7892738  7892753  7892757  7892788

You might want to talk with the folks who processed the samples and 
arrays to see if there is anything that might explain this. I would 
normally not worry if there were just a few control probes in the 
differential gene list, but if there are lots of them it may indicate 
some technical artifact that isn't being controlled for by the 
normalization procedure.

Although I wouldn't advocate simply removing them without figuring out 
why they are showing up, it is easy to do. The pd.hugene.1.0.st.v1 
package has the information you need:

 > library(pd.hugene.1.0.st.v1)
 > con <- db(pd.hugene.1.0.st.v1)
 > dbGetQuery(con, "select * from type_dict;")
1    1                      main
2    2             control->affx
3    3             control->chip
4    4 control->bgp->antigenomic
5    5     control->bgp->genomic
6    6            normgene->exon
7    7          normgene->intron
8    8  rescue->FLmRNA->unmapped

so you just want the 'main' probes. You can get the probeset IDs from 
the featureSet table.

 > mains <- dbGetQuery(con, "select fsetid from featureSet where type='1';")[,1]

Then you can drop any non-main probesets by subsetting with this vector, 
e.g., using the %in% operator.
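
As a rough sketch of that subsetting step, something like the following 
should work. Note that 'expr' and the ID values here are hypothetical 
stand-ins made up for illustration: in a real analysis 'mains' would be 
the vector returned by the featureSet query, and 'expr' would be your 
normalized expression matrix with probeset IDs as row names.

```r
# Hypothetical stand-ins (assumptions, not real data): 'mains' mimics the
# integer vector returned by the featureSet query, and 'expr' mimics a
# normalized expression matrix with probeset IDs as row names.
mains <- c(7896736L, 7896738L, 7896740L)
expr <- matrix(
  c(5.1, 5.3,
    7.2, 7.0,
    6.4, 6.6,
    9.8, 3.1,
    8.9, 2.7),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("7892503", "7896736", "7892505", "7896738", "7896740"),
                  c("sample1", "sample2"))
)

# Row names are character, so compare them against as.character(mains).
keep <- rownames(expr) %in% as.character(mains)

# drop = FALSE keeps the result a matrix even if only one row survives.
expr_main <- expr[keep, , drop = FALSE]

rownames(expr_main)
```

The same logical vector can be used to subset an ExpressionSet or a 
limma fit object by row, assuming its row names are the probeset IDs.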

Best,

Jim

>
> Thank you,
> Alexandra Munoz
> NYU PhD Student - Molecular and genetic toxicology

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099