[R] Changing the binning of collected data

Wed Apr 22 13:44:16 CEST 2009

Lorenzo Isella wrote:
> Dear All,
> Apologies if this is too simple for this list.
> Let us assume that you have an instrument measuring particle distributions.
> The output is a set of counts {n_i} corresponding to a set of average
> sizes {d_i}.
> The set of {d_i} ranges from d_i_min to d_i_max either linearly or
> logarithmically.
> There is no access to further detailed information about the
> distribution of the measured sizes, but at least you know enough to
> plot n(d_i) (number of counts as a function of particle size).
> If you can fit the {n_i} to a known distribution (e.g. normal or
> lognormal), then you can choose a new set of average sizes, {D_i} and
> plot the corresponding n_i(D_i).
> But what if the initial observations {n_i} do not follow a known
> distribution and you still want to calculate n(D_i)?
> Off the top of my head, I think that whatever I do must conserve the
> original total number of observations N=\sum_i{n_i}, but this does not
> constrain the problem very much.
> Any suggestion is welcome.
>   
Hi Lorenzo,
You should probably be aware that both the position and spacing of 
category boundaries can have a large effect on parameter location tests 
carried out on the categorized data. See:

Wainer, H., Gessaroli, M. & Verdi, M. (2006) Finding what is not there 
through the unfortunate binning of results: The Mendel effect. 
Chance, 19(1): 49-52.

Lemon, J. (2009) On the perils of categorizing responses. Tutorials in 
Quantitative Methods for Psychology, 5(1): 35-39.
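
For the rebinning question itself, one simple approach that conserves the total N is to share each original count among the new bins in proportion to how much of the original bin they cover. The sketch below is only an illustration under stated assumptions, not a recommended method: it assumes the counts are spread uniformly within each original bin, and it assumes you can reconstruct bin edges (for example as midpoints between neighbouring d_i), which the instrument output described above does not give directly. The function name rebin_counts and the example numbers are made up for illustration. For logarithmically spaced sizes, passing log(d) edges instead would make the uniform assumption apply on the log scale.

```python
def rebin_counts(old_edges, counts, new_edges):
    """Redistribute binned counts onto a new set of bin edges.

    Assumption (not from the original post): counts are uniformly
    spread within each old bin, so a new bin receives a share of an
    old count proportional to the overlap between the two bins.
    """
    if len(old_edges) != len(counts) + 1:
        raise ValueError("need one more edge than counts")
    new_counts = [0.0] * (len(new_edges) - 1)
    for i, n in enumerate(counts):
        lo, hi = old_edges[i], old_edges[i + 1]
        width = hi - lo
        for j in range(len(new_edges) - 1):
            # Length of the intersection of old bin i and new bin j.
            overlap = min(hi, new_edges[j + 1]) - max(lo, new_edges[j])
            if overlap > 0:
                new_counts[j] += n * overlap / width
    return new_counts


# Hypothetical data: four equal-width bins holding 100 counts in total.
old_edges = [0.0, 1.0, 2.0, 3.0, 4.0]
counts = [10, 30, 40, 20]

# Two choices of new boundaries covering the same range.
new_counts = rebin_counts(old_edges, counts, [0.0, 1.5, 4.0])
shifted = rebin_counts(old_edges, counts, [0.0, 2.5, 4.0])

print(new_counts, sum(new_counts))
print(shifted, sum(shifted))
```

Both choices keep the total at N, as long as the new edges span the whole original range, but the split between the two new bins changes when the boundary moves. That is the sensitivity to boundary position that the references above warn about.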

Jim