[BioC] DNA micro-array normalization

James W. MacDonald jmacdon at med.umich.edu
Tue Feb 16 21:50:26 CET 2010


To add to this: these data are almost surely MAS5-processed, as I 
don't know of any other algorithm that gives a detection p-value. In 
addition, the range of 0 - 9000 indicates that these data are not logged, 
so taking logs is the next step for you. People normally use log base 2 so 
that a difference of 1 or -1 indicates two-fold up or down regulation.
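
As a rough illustration of the log base 2 point, here is a minimal Python 
sketch. The control value is the AFFX-BioB-M_at signal from the original 
post; the treatment value, the log2_signal helper and its floor of 1.0 are 
assumptions made up for the example, not part of MAS5 or any package.

```python
import math

# Unlogged signal from the table in the original question.
control = 227.3
# Hypothetical treatment signal: exactly twice the control, for illustration.
treatment = 2.0 * control

def log2_signal(x, floor=1.0):
    """Return log2 of a signal.

    Values below `floor` are raised to `floor` first (an assumption made
    here only to avoid taking the log of zero or of very small numbers).
    """
    return math.log2(max(x, floor))

# On the log2 scale, a doubling shows up as a difference of about +1.
diff = log2_signal(treatment) - log2_signal(control)
print(diff)  # approximately 1.0, i.e. two-fold up
```

A halving would give a difference of about -1 in the same way.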

MAS5 data are normalized after the fact, so you should log transform and 
then look at plots of the densities to see if they look as if they have 
been normalized or not. The default is to do a scale normalization, so 
you are just looking for the densities to be in the same general vicinity 
rather than overlaying each other.
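
To make the idea concrete without plotting, here is a small Python sketch 
that compares where two arrays sit on the log2 scale before and after a 
simple scale normalization. Everything in it is an assumption for 
illustration: the signal values are invented, and the trimmed-mean 
scaling with a 2% trim and a target of 500 only approximates the kind of 
scaling described above, not any particular software's exact behaviour.

```python
import math
import statistics

def log2_values(signals, floor=1.0):
    """log2-transform signals, raising values below `floor` to `floor`."""
    return [math.log2(max(s, floor)) for s in signals]

def trimmed_mean(values, trim=0.02):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def scale_to_target(signals, target=500.0, trim=0.02):
    """Multiply one array by a single factor so its trimmed mean hits `target`."""
    factor = target / trimmed_mean(signals, trim)
    return [s * factor for s in signals]

# Hypothetical unlogged signals; array_b is array_a at twice the intensity.
array_a = [181.0, 227.3, 499.2, 1200.0, 35.0, 9000.0, 60.5, 310.0]
array_b = [2.0 * s for s in array_a]

def median_gap(x, y):
    """Difference in median log2 signal between two arrays."""
    return statistics.median(log2_values(y)) - statistics.median(log2_values(x))

median_gap_before = median_gap(array_a, array_b)

scaled_a = scale_to_target(array_a)
scaled_b = scale_to_target(array_b)
median_gap_after = median_gap(scaled_a, scaled_b)

print(median_gap_before, median_gap_after)
```

If the arrays you were given sit roughly on top of each other in this 
sense (similar medians or similar density locations after logging), they 
have probably been scaled already; a large shift between arrays suggests 
they have not.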

If you could get the original CEL files, you would be much better off.

Best,

Jim



michael watson (IAH-C) wrote:
> This is definitely processed data, and without access to the original data or a description of the analysis methodology, your options are limited.
> 
> Personally, I'd do a test for normality on the "Signal" values, and if they turn out to be normal, I'd run a simple t-test (control vs treatment) on each gene and correct the p-values for multiple testing.
> 
> Simple stuff, but it should work.
> ________________________________________
> From: bioconductor-bounces at stat.math.ethz.ch [bioconductor-bounces at stat.math.ethz.ch] On Behalf Of avehna [avhena at gmail.com]
> Sent: 16 February 2010 19:47
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] DNA micro-array normalization
> 
> Hi There,
> 
> I've got a DNA microarray dataset that looks like this:
> 
> Probe            Signal   Detection   Detection_p-value   Descriptions
> AFFX-BioB-5_at   181      P           0.00011             "E. coli  GEN=bioB  DB_XREF=gb:J04423.1"
> AFFX-BioB-M_at   227.3    P           0.000044            "E. coli  GEN=bioB  DB_XREF=gb:J04423.1"
> AFFX-BioC-5_at   499.2    P           0.000052            "E. coli  GEN=bioC  DB_XREF=gb:J04423.1"
> 
> I have control and treatment with 3 replicates for each.
> 
> But I'm not sure whether these data have already been normalized, and on the
> other hand, this is not the typical Affymetrix format...
> 
> Could you help me in this regard? What is the typical signal range for raw
> Affymetrix data? (These data range from 0 to 9000.)
> 
> If the data have already been normalized, can I calculate the mean (for
> treatment and control) followed by the differential expression of genes
> without taking into account the "Detection" column?
> 
> (I guess I will need to build my ExpressionSet from scratch)
> 
> Thanks a lot (I'm a newbie in Bioconductor and microarray analysis). I will
> appreciate your help!
> 
> Avhena
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826