[R] Need ideas on how to show spikes in my data and how to code it in R

Daniel Folkinshteyn dfolkins at gmail.com
Mon Jun 23 22:10:02 CEST 2008


on 06/23/2008 03:40 PM Thomas Frööjd said the following:
> 1.       Shift the mean and std on the reference dataset to the mean
> and std of my clinic birth weight data.

to shift the mean by any distance, just add or subtract that distance 
from each observation (e.g., to move the mean from m1 to m2, add 
(m2 - m1) to each observation).

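to change the stddev from, say, s1 to s2, multiply each observation's 
deviation from the mean by s2/s1 (i.e., subtract the mean, multiply by 
s2/s1, then add the mean back). multiplying the raw observations by 
s2/s1 would rescale the mean as well.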

instead of shifting ref dataset to mean/sdev of other dataset, it might 
be more intuitive to transform both to mean=0, sdev=1.
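
a minimal sketch of the rescaling described above, on simulated data. the 
vectors `ref` and `clinic` and the helper `rescale` are made up for 
illustration, not taken from the original post. note that the helper 
centers on the mean before scaling, so changing the sd does not also 
move the mean:

```r
set.seed(1)
# simulated stand-ins for the real data (birth weights in grams)
ref    <- rnorm(20000, mean = 3500, sd = 500)
clinic <- rnorm(3000,  mean = 3400, sd = 450)

# hypothetical helper: return x rescaled to have mean m2 and sd s2
rescale <- function(x, m2, s2) {
  (x - mean(x)) * (s2 / sd(x)) + m2
}

# move the reference data onto the clinic's mean and sd
ref_shifted <- rescale(ref, mean(clinic), sd(clinic))

# alternative: standardize both datasets to mean 0, sd 1
ref_z    <- as.vector(scale(ref))
clinic_z <- as.vector(scale(clinic))
```

`scale()` is base R and does the centering and scaling in one call; 
`as.vector()` just drops the matrix attributes it returns.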

> 2.       Scale the data so they can be plotted on the same axis. The
> reference dataset has around 20 000 observations and my data from the
> clinic only around 3000 so I have to fix this otherwise the plot of
> the reference datset will be much bigger in the graph.

if you do a density plot (see ?density in R), it will automatically be 
scaled, since a density integrates to 1 regardless of the number of 
observations. if you want the histogram scaled too, then after 
calculating the histogram frequencies for the reference data, multiply 
them by the ratio of the number of observations in your data to the 
number in the reference data (i.e., NOBS_yourdata / NOBS_refdata)

but i'd say, you might do better to just work with a density plot and 
set the appropriate bandwidth parameter, rather than working with a 
histogram, for presentational purposes.
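
if you do go the histogram route, the count scaling could look roughly 
like this. the data are simulated, and `ref`, `clinic` and `breaks` are 
illustrative names only. the bins are 100 g wide and centered on 
multiples of 100 g, which is an assumption about how the weights were 
recorded:

```r
set.seed(1)
# simulated stand-ins; reference weights recorded to the nearest 100 g
ref    <- round(rnorm(20000, mean = 3500, sd = 500), -2)
clinic <- rnorm(3000, mean = 3500, sd = 500)

# common 100 g bins covering both datasets
all_w  <- c(ref, clinic)
breaks <- seq(floor(min(all_w) / 100) * 100 - 50,
              ceiling(max(all_w) / 100) * 100 + 50,
              by = 100)

h_ref    <- hist(ref,    breaks = breaks, plot = FALSE)
h_clinic <- hist(clinic, breaks = breaks, plot = FALSE)

# bring the reference counts down to the clinic's sample size
ref_scaled <- h_ref$counts * length(clinic) / length(ref)

# clinic histogram with the scaled reference counts drawn on top
plot(h_clinic, main = "", xlab = "birth weight (g)",
     ylab = "number of observations")
lines(h_ref$mids, ref_scaled, col = "red", lwd = 2)
```

after scaling, the reference counts sum to the number of clinic 
observations, so both sit on the same y axis.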

> 3.       Plot both on the same graph. The reference dataset like a
> density plot and my dataset as a histogram, that means weight bins on
> the x axis and number of observations on y. It should be added that my
> reference dataset isn't truly continuous but recorded at 100g
> intervals. This means both datasets have the same grouping however
> plotting both as histogram would probably make it harder to understand
> for a person with little training in statistics. This means that the
> reference dataset "density function" has to be smoothed somehow.

see ?density, set the appropriate bandwidth parameter to achieve your 
desired degree of smoothing.
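
as a rough illustration of what the bandwidth does, here is a sketch on 
simulated data recorded at 100 g steps. the two `bw` values (20 and 150) 
are arbitrary picks for the example, not recommendations; you would 
have to tune them on the real data:

```r
set.seed(1)
# simulated reference weights, recorded to the nearest 100 g
ref <- round(rnorm(20000, mean = 3500, sd = 500) / 100) * 100

# narrow bandwidth: a separate peak at every 100 g step
d_rough  <- density(ref, bw = 20)
# wide bandwidth: the 100 g steps are smoothed away
d_smooth <- density(ref, bw = 150)

plot(d_rough, main = "", xlab = "birth weight (g)", col = "grey")
lines(d_smooth, lwd = 2)
```

`density()` also has an `adjust` argument, which multiplies whatever 
bandwidth is in use, if you would rather start from the default and 
scale it up.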

> I would be very thankful for help on any of those steps. Also if you
> think this approach is wrong for some reason please tell me.

i think you'd have a much easier time of it (and also a better-looking 
and more informative plot), if you plot both as density on the same 
plot, and forgo the histogram overlay. your reference dataset will be a 
nice smooth curve, as long as you choose a wide enough bandwidth to 
avoid showing peaks every 100g, and your target dataset will have large 
peaks at 2, 2.5, 3, etc., which will stand out very nicely. :)
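
something along these lines, as a sketch only. the data are simulated, 
and the rounding of about 30% of clinic weights to the nearest 500 g is 
an invented stand-in for the spikes in the real data. the bandwidths 
(150 for the reference, 30 for the clinic) are likewise just 
illustrative:

```r
set.seed(1)
# simulated reference weights, recorded to the nearest 100 g
ref <- round(rnorm(20000, mean = 3500, sd = 500) / 100) * 100

# simulated clinic weights; a made-up 30% get rounded to the nearest 500 g
clinic <- rnorm(3000, mean = 3500, sd = 500)
heaped <- runif(length(clinic)) < 0.3
clinic[heaped] <- round(clinic[heaped] / 500) * 500

# wide bandwidth for a smooth reference curve, narrow to keep the spikes
d_ref    <- density(ref, bw = 150)
d_clinic <- density(clinic, bw = 30)

# both curves on one plot; ylim has to leave room for the spikes
plot(d_ref, ylim = range(0, d_ref$y, d_clinic$y), lwd = 2,
     main = "", xlab = "birth weight (g)")
lines(d_clinic, col = "red")
legend("topright", legend = c("reference", "clinic"),
       col = c("black", "red"), lwd = c(2, 1), bty = "n")
```

because both curves are densities, the different sample sizes 
(20 000 vs 3000) need no extra scaling.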

also, unless the babies in both plots are from different species :), you 
probably don't need to transform the data to equalize means and variances.

-d



More information about the R-help mailing list