[BioC] checking multi-modalities in histograms

Kevin Coombes kevin.r.coombes at gmail.com
Tue Mar 30 21:34:57 CEST 2010


Raw? Log-transformed?  It almost certainly matters. The underlying model 
used by bimodalIndex (and by Mclust) is that of a mixture of normal 
distributions.  On the raw-linear scale of most microarrays, the 
distributions are skewed.  (In fact, the usual RMA background correction 
model views the observed intensity as the sum of a normally distributed 
background and an exponentially distributed signal.)  I would expect that 
fitting a mixture-of-normals model to such data would almost always 
conclude that they were (at least) bimodal, with the long exponential 
tail representing one of the modes.
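
The point about skewed data can be checked on simulated values. What follows is only a minimal sketch in Python (standard library only), not the bimodalIndex or Mclust implementation. It fits one normal and a two-component normal mixture (by a plain EM loop) and compares their BIC. The function names (bic_one_normal, bic_two_normals, bic_gain) and the simulation parameters (background N(100, 15), exponential signal with mean 300) are assumptions chosen for illustration. BIC is computed here as -2*logLik + k*log(n), so lower is better; Mclust reports it with the opposite sign.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z


def mean_sd(xs):
    """Mean and maximum-likelihood standard deviation of a list."""
    m = sum(xs) / len(xs)
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def bic_one_normal(xs):
    """BIC of a single normal fit (2 free parameters); lower is better."""
    mu, sd = mean_sd(xs)
    loglik = sum(normal_logpdf(x, mu, sd) for x in xs)
    return -2.0 * loglik + 2 * math.log(len(xs))


def bic_two_normals(xs, iters=100):
    """BIC of a two-component normal mixture fitted by EM (5 parameters)."""
    n = len(xs)
    s = sorted(xs)
    floor = 1e-3 * mean_sd(xs)[1]  # stops a component collapsing to a point
    # Start from the lower and upper halves of the sorted data.
    mu0, sd0 = mean_sd(s[: n // 2])
    mu1, sd1 = mean_sd(s[n // 2:])
    sd0, sd1, w0 = max(sd0, floor), max(sd1, floor), 0.5
    loglik = 0.0
    for _ in range(iters):
        # E-step: responsibility of component 0 for each point (log-sum-exp).
        resp, loglik = [], 0.0
        for x in xs:
            a = math.log(w0) + normal_logpdf(x, mu0, sd0)
            b = math.log(1.0 - w0) + normal_logpdf(x, mu1, sd1)
            top = max(a, b)
            total = top + math.log(math.exp(a - top) + math.exp(b - top))
            loglik += total
            resp.append(math.exp(a - total))
        # M-step: weighted means, standard deviations and mixing weight.
        r0 = max(sum(resp), 1e-9)
        r1 = max(n - r0, 1e-9)
        mu0 = sum(r * x for r, x in zip(resp, xs)) / r0
        mu1 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / r1
        v0 = sum(r * (x - mu0) ** 2 for r, x in zip(resp, xs)) / r0
        v1 = sum((1.0 - r) * (x - mu1) ** 2 for r, x in zip(resp, xs)) / r1
        sd0, sd1 = max(math.sqrt(v0), floor), max(math.sqrt(v1), floor)
        w0 = min(max(r0 / n, 1e-6), 1.0 - 1e-6)
    return -2.0 * loglik + 5 * math.log(n)


def bic_gain(xs):
    """Positive when the two-component mixture is preferred by BIC."""
    return bic_one_normal(xs) - bic_two_normals(xs)


rng = random.Random(1)
n = 1000
# Hypothetical raw-scale intensities: normal background + exponential signal.
raw = [rng.gauss(100.0, 15.0) + rng.expovariate(1.0 / 300.0) for _ in range(n)]
# Control: a genuinely unimodal normal sample of comparable spread.
control = [rng.gauss(400.0, 300.0) for _ in range(n)]

raw_gain = bic_gain(raw)
control_gain = bic_gain(control)
print("BIC gain from a second component, skewed raw-like data:", raw_gain)
print("BIC gain from a second component, normal control data: ", control_gain)
```

On the simulated skewed data the two-component model is preferred even though the data were generated from a single unimodal process, which is the behaviour to watch for before reading a "bimodal" verdict as a quality problem.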

    Kevin

Javier Pérez Florido wrote:
> Dear Wolfgang,
> Thanks for your reply.
> The data I am going to test for bimodality are raw data, without 
> preprocessing. For this purpose I think the bimodalIndex function from 
> the ClassDiscovery package is ideal. It tests for bimodality using 
> the information-based BIC criterion.
> I know that there are other quality metrics such as boxplots, MA plots, 
> NUSE, etc. Histograms are complementary to all of them, and all I need 
> is something that flags a CEL file as possibly bad because of such 
> bimodality, taken together with the rest of the quality metrics.
>
> Thanks again,
> Javier
>
>
>
> On 30/03/2010 14:54, Wolfgang Huber wrote:
>> Dear Javier
>>
>> note that the number of modes of a distribution
>> - can depend on the normalisation (before or after 
>> log-transformation; or whether background correction was done and how)
>> - is impossible to determine from a finite sample without further 
>> assumptions (essentially a smoothing bandwidth)
>>
>> Besides these (significant) practical difficulties, I am also 
>> doubtful of the usefulness, in terms of sensitivity and specificity, 
>> of this criterion for array quality diagnostics. If you see two 
>> modes, they would most likely be associated with a covariate, such as 
>> row, column, or spatial position on the array. If you find that 
>> this covariate is quality-relevant, then I would advise checking for 
>> significant effects of that covariate even on arrays where the 
>> distribution looks unimodal.
>>
>>         Best wishes
>>             Wolfgang
>>
>> On Mar 29, 2010, at 6:14 PM, Javier Pérez Florido wrote:
>>
>>   
>>> Dear list,
>>> Histograms are commonly used to check the quality of microarray
>>> experiments. If a particular array shows bimodality, it is a
>>> candidate for exclusion from the experiment. It is easy to check for
>>> bimodality or multimodality visually, but I would like to know if
>>> there is a way (using a statistical test or something similar) to check
>>> for multimodality using the data returned by the hist function.
>>>
>>> For an AffyBatch object, the hist function returns the X and Y values,
>>> but that's all; it doesn't return the variables breaks, counts, etc.
>>> as described in the help page for hist. So, I have two questions:
>>>
>>>     * Is there a test to check for multimodality in histograms?
>>>     * Is there a way to get the cells and the number of values per
>>>       cell used by hist, to check for multimodality in a rudimentary
>>>       way?
>>>
>>> Thanks again,
>>> Javier
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: 
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>      
>>
>>
>


