[BioC] checking multi-modalities in histograms
Kevin Coombes
kevin.r.coombes at gmail.com
Tue Mar 30 21:34:57 CEST 2010
Raw? Log-transformed? It almost certainly matters. The underlying model
used by bimodalIndex (and by Mclust) is that of a mixture of normal
distributions. On the raw linear scale of most microarrays, the
distributions are skewed. (In fact, the usual RMA background correction
model treats each observed intensity as the sum of normally distributed
background and exponentially distributed signal.) I would expect that fitting a mixture-of-normals
model to this data would almost always conclude that it was (at least)
bimodal, with the long exponential tail representing one of the modes.
Kevin
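To make this point concrete, here is a minimal Python sketch. It is not the R code behind bimodalIndex or Mclust. It simulates raw-scale intensities as normal background plus exponential signal, then uses BIC to compare a single normal fit against a two-component normal mixture fitted by plain EM. The simulation parameters (background mean 100, SD 15, signal mean 200, 1,000 probes), the quartile starting values and the iteration count are all illustrative assumptions. BIC is written in the lower-is-better form; mclust reports the negated value, where larger is better.

```python
import math
import random


def normal_logpdf(x, mu, var):
    """Log-density of N(mu, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def loglik_one_normal(xs):
    """Maximised log-likelihood of a single normal (MLE mean and variance)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return sum(normal_logpdf(x, mu, var) for x in xs)


def log_mixture(x, p, mu, var):
    """Return the log of the two-component mixture density at x and the posterior
    probability that x came from component 0 (log-sum-exp for stability)."""
    a = math.log(p) + normal_logpdf(x, mu[0], var[0])
    b = math.log(1 - p) + normal_logpdf(x, mu[1], var[1])
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return m + math.log(ea + eb), ea / (ea + eb)


def loglik_two_normals(xs, iters=100):
    """Fit a two-component normal mixture (unequal variances) by plain EM and
    return its log-likelihood."""
    n = len(xs)
    s = sorted(xs)
    mean = sum(xs) / n
    v0 = sum((x - mean) ** 2 for x in xs) / n
    mu, var, p = [s[n // 4], s[(3 * n) // 4]], [v0, v0], 0.5  # start at the quartiles
    for _ in range(iters):
        # E-step: responsibility of component 0 for every point.
        r = [log_mixture(x, p, mu, var)[1] for x in xs]
        # M-step: update proportion, means and variances (floors keep p and var positive).
        w0 = sum(r)
        w1 = n - w0
        p = min(max(w0 / n, 1e-6), 1 - 1e-6)
        mu = [sum(ri * x for ri, x in zip(r, xs)) / w0,
              sum((1 - ri) * x for ri, x in zip(r, xs)) / w1]
        var = [max(sum(ri * (x - mu[0]) ** 2 for ri, x in zip(r, xs)) / w0, 1e-6),
               max(sum((1 - ri) * (x - mu[1]) ** 2 for ri, x in zip(r, xs)) / w1, 1e-6)]
    return sum(log_mixture(x, p, mu, var)[0] for x in xs)


def bic(loglik, n_params, n):
    """BIC, lower is better (mclust reports the negative, where higher is better)."""
    return n_params * math.log(n) - 2 * loglik


# Simulated raw-scale intensities (illustrative parameters): normal background
# (mean 100, SD 15) plus exponential signal (mean 200), as in the RMA model.
rng = random.Random(2010)
raw = [rng.gauss(100, 15) + rng.expovariate(1 / 200) for _ in range(1000)]

bic_one = bic(loglik_one_normal(raw), 2, len(raw))   # mean, variance
bic_two = bic(loglik_two_normals(raw), 5, len(raw))  # 2 means, 2 variances, 1 proportion
print(f"BIC, one normal:  {bic_one:.1f}")
print(f"BIC, two normals: {bic_two:.1f}")
```

The simulated density is unimodal, because a normal convolved with an exponential is log-concave. The two-component model still wins on BIC, because one of its components absorbs the long right tail.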
Javier Pérez Florido wrote:
> Dear Wolfgang,
> Thanks for your reply.
> The data I am going to test for bimodality are raw data, without
> preprocessing. For this purpose I think the bimodalIndex function from
> the ClassDiscovery package is ideal. It tests for bimodality using the
> Bayesian information criterion (BIC).
> I know that there are other quality metrics, such as boxplots, MA plots,
> NUSE, etc. Histograms complement all of them, and all I need is
> something that indicates that a CEL file may be bad because of such
> bimodality, weighed together with the other quality metrics.
>
> Thanks again,
> Javier
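A bimodality index of the kind discussed here is often summarized as BI = sqrt(p(1 - p)) * |mu1 - mu2| / sigma. This formula applies to a two-component normal mixture with common standard deviation sigma and mixing proportion p. The Python sketch below only evaluates that formula from parameters that have already been fitted. We are assuming, not confirming, that this matches what ClassDiscovery's bimodalIndex reports; its help page is the authority. The function name is hypothetical.

```python
import math


def bimodality_index(p, mu1, mu2, sigma):
    """Bimodality index for an equal-variance two-component normal mixture.

    p is the mixing proportion of the first component, mu1 and mu2 are the
    component means and sigma is the common standard deviation. The index grows
    with the standardized separation delta = |mu1 - mu2| / sigma and, for a
    fixed delta, is largest when the components are equally sized (p = 0.5).
    """
    if not 0 < p < 1 or sigma <= 0:
        raise ValueError("need 0 < p < 1 and sigma > 0")
    delta = abs(mu1 - mu2) / sigma
    return math.sqrt(p * (1 - p)) * delta
```

For example, `bimodality_index(0.5, 0.0, 2.0, 1.0)` is 1.0. The index only measures how well separated the two fitted components are. On skewed raw intensities, a large value may therefore reflect nothing more than a normal component fitted to the tail.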
>
>
>
> On 30/03/2010 14:54, Wolfgang Huber wrote:
>> Dear Javier
>>
>> note that the number of modes of a distribution
>> - can depend on the normalisation (before or after
>> log-transformation, and whether and how background correction was done)
>> - is impossible to determine from a finite sample without further
>> assumptions (essentially a smoothing bandwidth)
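The bandwidth point can be illustrated with a Gaussian kernel density estimate. This is a standard smoother, similar in spirit to R's density(), and is chosen here only for illustration. With it, the same sample can show one mode or many depending on the bandwidth h. The sketch is in Python rather than R. The simulated data (200 draws each from N(0, 1) and N(4, 1)), the bandwidths and the grid size are assumptions.

```python
import math
import random


def kde(xs, h, grid):
    """Gaussian kernel density estimate with bandwidth h, evaluated on grid."""
    c = 1.0 / (len(xs) * h * math.sqrt(2 * math.pi))
    return [c * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in xs) for g in grid]


def count_modes(xs, h, n_grid=512):
    """Number of local maxima of the KDE on an evenly spaced grid."""
    lo, hi = min(xs) - 3 * h, max(xs) + 3 * h
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    d = kde(xs, h, grid)
    return sum(1 for i in range(1, n_grid - 1) if d[i - 1] < d[i] >= d[i + 1])


rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(4, 1) for _ in range(200)]
for h in (0.05, 0.5, 5.0):
    print(f"bandwidth {h}: {count_modes(xs, h)} mode(s)")
```

For a Gaussian kernel in one dimension, the number of modes can only decrease as h grows. Any mode count is therefore also a statement about the bandwidth that was chosen.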
>>
>> Besides these (significant) practical difficulties, I am also
>> doubtful of the usefulness, in terms of sensitivity and specificity,
>> of this criterion for array quality diagnostics. If you see two
>> modes, they would most likely be associated with a covariate, such as
>> row, column, or spatial position on the array. If you find that
>> this covariate is quality-relevant, I would advise checking for
>> significant effects of that covariate even on arrays where the
>> distribution looks unimodal.
>>
>> Best wishes
>> Wolfgang
>>
>> On Mar 29, 2010, at 6:14 PM, Javier Pérez Florido wrote:
>>
>>
>>> Dear list,
>>> Histograms are commonly used to check the quality of microarray
>>> experiments. If a particular array shows bimodality, it is a
>>> candidate for exclusion from the experiment. It is easy to check for
>>> bimodality or multimodality visually, but I would like to know if
>>> there is a way (a statistical test or something similar) to check for
>>> multimodality using the data returned by the hist function.
>>>
>>> For an AffyBatch object, the hist function returns only the x and y
>>> values; it does not return breaks, counts, etc., as described in the
>>> help page for hist. So, I have two questions:
>>>
>>> * Is there a test to check for multimodality in histograms?
>>> * Is there a way to get the bins (cells) and the number of values per
>>>   bin used by hist, so I can check for multimodality in a rudimentary way?
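On the second question: in base R, hist(x, plot = FALSE) returns an object containing breaks and counts (and mids). For an AffyBatch, one could presumably extract the intensities first, for example with pm(), and pass them to hist directly. The rudimentary check itself, counting peaks in the bin counts, is sketched below in Python. The function names and the min_height threshold for ignoring small peaks are hypothetical choices, not part of any package.

```python
def histogram(xs, n_bins, lo=None, hi=None):
    """Equal-width histogram: return (edges, counts); values equal to hi go in the last bin."""
    lo = min(xs) if lo is None else lo
    hi = max(xs) if hi is None else hi
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and at least one bin")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in xs:
        if lo <= x <= hi:  # values outside [lo, hi] are ignored
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts


def count_peaks(counts, min_height=0):
    """Count local maxima in a sequence of bin counts.

    Runs of equal counts are collapsed first, so a flat-topped peak counts once.
    Peaks lower than min_height (a hypothetical noise threshold) are ignored.
    """
    runs = []
    for c in counts:
        if not runs or runs[-1] != c:
            runs.append(c)
    peaks = 0
    for i, c in enumerate(runs):
        left = runs[i - 1] if i > 0 else -1  # counts are >= 0, so -1 acts as a floor
        right = runs[i + 1] if i + 1 < len(runs) else -1
        if c > left and c > right and c >= min_height:
            peaks += 1
    return peaks
```

For example, `count_peaks([1, 5, 9, 4, 2, 6, 10, 3])` finds two peaks. The answer depends heavily on the number of bins, which is the histogram analogue of the bandwidth issue Wolfgang raises. Sparse tail bins can also produce spurious small peaks unless min_height is set.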
>>>
>>> Thanks again,
>>> Javier
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>