[BioC] What is the best way to eliminate non-variants from a set of arrays?

Wed Jun 27 14:54:08 CEST 2007

Hi Ruppert,

Ruppert Valentino wrote:
> Hello,
> 
> I am analysing an Affymetrix microarray experiment that involves the following 
> groups:
> 
> Group    No. of samples
> -----    --------------
> A         4
> B         9
> C        10
> D         2
> 
> I would like to get rid of the non-variants before doing unsupervised clustering. I 
> tried simple filters like SD and fold change, as in the Cluster software, 
> but I always end up with some of the technical probes, like the Affy GAPDH 
> controls, coming up and spoiling the clusters. So the question is: what is the best 
> algorithm for eliminating probesets that do not vary across the arrays, non-specifically, 
> i.e. without grouping them?

It seems to me that there are two questions here. First, how best to 
filter probesets agnostically, and second, why do these technical probes 
not get filtered out?

For filtering the probes, I usually prefer to filter based on variance 
(or SD if you like). This is as agnostic as you can get, and has the 
desired effect of eliminating probesets that don't change expression. 
Others seem to like using the P/M/A (Present/Marginal/Absent) calls, which 
are another agnostic measure of likely signal in the data. I think both 
should do a reasonable job.
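The two agnostic filters described above can be sketched roughly as follows. This is Python with NumPy rather than the R/Bioconductor code the list would normally use (the genefilter package covers this ground in R). The matrix layout (probesets in rows, log-scale arrays in columns) and the `keep_fraction` and `min_present` thresholds are assumptions for illustration, not recommended values.

```python
import numpy as np

def filter_by_sd(expr, keep_fraction=0.25):
    """Keep the most variable probesets, ignoring group labels entirely.

    expr: 2-D array, rows = probesets, columns = arrays (log2 scale assumed).
    keep_fraction: hypothetical cutoff, the fraction of probesets to retain.
    Returns row indices of retained probesets, most variable first.
    """
    sd = expr.std(axis=1, ddof=1)      # per-probeset sample SD across all arrays
    n_keep = max(1, int(round(keep_fraction * expr.shape[0])))
    order = np.argsort(sd)[::-1]       # sort by decreasing SD
    return order[:n_keep]

def filter_by_calls(calls, min_present=2):
    """Keep probesets called 'P' (Present) on at least min_present arrays.

    calls: 2-D array of 'P'/'M'/'A' strings, same shape as the expression matrix.
    min_present: hypothetical threshold, e.g. the size of the smallest group.
    """
    n_present = (calls == "P").sum(axis=1)   # count Present calls per probeset
    return np.flatnonzero(n_present >= min_present)
```

Neither function looks at the A to D group labels, which is what makes them suitable as a pre-filter for unsupervised clustering. Choosing `min_present` near the smallest group size (2 arrays for group D here) is one way to avoid discarding genes expressed only in that group.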

The second question is the more interesting IMO. It is always sort of 
embarrassing to give someone a list of genes where one of the top genes 
is one of the Affy control probesets. In some sense it looks like you 
weren't competent enough to 'get rid of' something that obviously 
shouldn't be there. Or should it?

Having a control probeset show up as significant doesn't necessarily 
mean that something went wrong in the filtering step. For instance, 
GAPDH is widely considered to be a housekeeping gene, but if you were 
comparing samples that had widely different levels of glycolysis, you 
might actually expect a difference in the expression of this gene.

Of course, this only applies to the control probesets that interrogate a 
gene that actually exists in the species you are working with. If, say, 
BioB (an E. coli spike-in hybridization control) were differentially 
expressed, then you might have a technical problem with the way the chips 
were run that you might want to explore.
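To act on this distinction, one could check which Affymetrix control probesets (whose IDs begin with `AFFX-`) survive the filter and separate the bacterial/phage hybridization spike-ins, such as BioB, from everything else. This is a minimal Python sketch; the tag list is an assumption based on the usual AFFX naming and is not exhaustive (poly-A spike-ins, for instance, fall into the "other" bucket).

```python
# Assumed tags for hybridization spike-ins: genes with no counterpart in the
# sample's genome, so they should not vary for biological reasons.
SPIKE_IN_TAGS = ("biob", "bioc", "biod", "cre")

def classify_controls(probe_ids):
    """Split probeset IDs into (other AFFX controls, hybridization spike-ins).

    Non-AFFX (ordinary) probesets are skipped.
    """
    other_controls, spike_ins = [], []
    for pid in probe_ids:
        if not pid.startswith("AFFX-"):
            continue                      # ordinary probeset, not a control
        if any(tag in pid.lower() for tag in SPIKE_IN_TAGS):
            spike_ins.append(pid)         # varying here hints at a technical problem
        else:
            other_controls.append(pid)    # e.g. GAPDH: may genuinely vary
    return other_controls, spike_ins
```

Passing in the IDs that survive whatever variance or P/M/A filter you used gives a quick view of whether a surviving control is a biologically plausible gene like GAPDH or a spike-in that points back at the lab work.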

Anyway, rather than simply trying to get rid of something you think 
shouldn't be there, you might think about why it isn't going away when 
you filter, and think about what that might mean for this particular 
experiment.

Best,

Jim

> 
> I was thinking of using dChip or eBayes, but any suggestion/advice would be 
> greatly appreciated, as the sample size is small here and the idea is just 
> to eliminate non-variant genes to see if the unsupervised clustering turns up 
> anything.
> 
> Regards
> 
> Ruppert
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
