[BioC] Normalization quality

Thu Oct 5 12:47:08 CEST 2006

On Thursday 05 October 2006 06:22, alex lam (RI) wrote:
> Dear BioCers,
>
> Hi! I don't have much experience in handling affy chips and I hope
> someone can help me here.
>
> I loaded 276 affy CEL files using justGCRMA - my computer couldn't cope
> with the ReadAffy route.
> Quantile normalization was done.
>
> I have noticed that for some probesets there are some strange results.
>
> For example:
> > summary(exprs(eset.norm.quantile)["203329_at",])
>
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   2.486   2.513   2.523   2.545   2.534   5.201
>
> This gene seems to be very lowly expressed for everyone except for 1
> chip. I did a boxplot of that chip against a few others after
> normalization and the overall distributions are similar.

I would treat array quality metrics as a separate issue from normalization.
With 276 arrays, many genes will show at least one array that looks like an
"outlier", so one extreme value for one probeset does not by itself indicate
a bad chip.
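
To make that concrete, here is a minimal sketch in base R. It uses a simulated
matrix as a stand-in for exprs(eset.norm.quantile); the matrix x, the cutoff
of 3 MADs, and all variable names are assumptions for illustration only. For
pure Gaussian noise, the chance that at least one of 276 arrays falls beyond
3 SDs for a given probeset is about 1 - 0.9973^276, or roughly one half.

```r
# Hypothetical stand-in for exprs(eset.norm.quantile):
# probesets in rows, arrays in columns, pure noise with no bad chips.
set.seed(1)
n.probes <- 2000
n.arrays <- 276
x <- matrix(rnorm(n.probes * n.arrays, mean = 2.5, sd = 0.02),
            nrow = n.probes,
            dimnames = list(paste0("probe", seq_len(n.probes)),
                            paste0("chip", seq_len(n.arrays))))

# Robust z-score per probeset: (value - row median) / row MAD.
# Subtracting a vector of length nrow(x) recycles down the columns,
# so row i is centred on med[i] and scaled by mad.row[i].
med <- apply(x, 1, median)
mad.row <- apply(x, 1, mad)
z <- (x - med) / mad.row

# Flag values more than 3 MADs from the probeset median (arbitrary cutoff).
flagged <- abs(z) > 3

# Fraction of probesets with at least one flagged array.
frac.probes <- mean(rowSums(flagged) > 0)
print(frac.probes)

# Number of flagged probesets per array; a genuinely bad chip would be
# flagged for far more probesets than the rest.
per.array <- colSums(flagged)
print(head(sort(per.array, decreasing = TRUE)))
```

On real data, the per-array counts are the more useful view: an array that is
flagged for an unusually large share of probesets deserves a closer look,
whereas a single flagged probeset on an otherwise ordinary array does not.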

> (1) As I don't have an AffyBatch object, is there a way to make image
> plots? And what other methods are there to catch these odd ones?
>
> (2) I was provided summarised expression values from an external group
> and they used MAS5 for pre-processing. For the same gene, the expression
> summary is:
> > summary(mas.expr[,"X203329_at"]) # I received a csv file, hence it's a data frame
>
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   2.926   5.337   5.950   5.880   6.403   9.183
> The variation seems to be much greater, and it looks much more
> interesting. 

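I think that for most purposes, most people would agree that MAS5 is inferior
to RMA or GCRMA, but that is a dangerous statement to make without knowing the
details of the experiment. Greater variation in the MAS5 values does not by
itself mean more signal; MAS5 tends to be noisier for lowly expressed
probesets.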
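
One rough way to compare the two summaries for a single probeset is to look at
the spread of each and at their rank correlation across arrays. The sketch
below uses simulated vectors; with real data they would be something like
exprs(eset.norm.quantile)["203329_at", ] and mas.expr[, "X203329_at"], and it
assumes the arrays can be matched and put in the same order. The names
gcrma.vals and mas5.vals are hypothetical.

```r
# Simulated stand-ins for one probeset measured on the same 276 arrays
# by two preprocessing methods (independent noise in this example).
set.seed(2)
n.arrays <- 276
gcrma.vals <- rnorm(n.arrays, mean = 2.52, sd = 0.02)
mas5.vals  <- rnorm(n.arrays, mean = 5.9,  sd = 0.9)

# Spread of each summary on a robust scale.
spread <- c(gcrma = IQR(gcrma.vals), mas5 = IQR(mas5.vals))
print(spread)

# Rank correlation across arrays. A value near zero means the two methods
# do not agree on which arrays are high or low for this probeset.
rho <- cor(gcrma.vals, mas5.vals, method = "spearman")
print(rho)
```

A low correlation suggests that at least one of the two summaries is mostly
noise for this probeset; it does not say which one.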

Sean