[R] Finding non-normal distributions per row of data frame?

Peter Ehlers ehlers at ucalgary.ca
Fri Feb 4 20:25:21 CET 2011


On 2011-02-04 11:00, DB1984 wrote:
>
> Hi Greg,
>
> In addition to the reply above, to address your questions - I fully
> appreciate that my understanding of the code is basic - this is my first
> attempt at putting this together...
>
> My starting point is a data frame with numeric and text columns, but I can
> cut columns to make a fully numeric matrix if that is easier to handle.
>
> "apply(y, 1, shapiro.test)" works for a second dataframe, yes. I guess that
> I chose a bad example dataset for 'nt'!
>
>
> The overall aim is to test the normality of the distribution of the values
> in each row. I would then subset out the non-normal distributions to
> interrogate further. The shapiro.test seems a simple first pass at this. I'd
> like to move on to plotting residuals of a QQplot next, to see if that is
> more or less sensitive at detecting non-normal distributions in the dataset.
>
> If you would recommend an alternative approach, I'd appreciate the input,
> thanks..

I don't know what your overall scientific aim is, but here's
something to ponder:

Suppose that you randomly sample 400,000 observations from a
NORMAL distribution and put these into a matrix of 20,000
rows by 20 columns and then perform your row-wise Normality
tests, storing the p-values.

If you now select those rows with p-value < 0.05, you will
get about ..... rows. (Fill in your best guess.)

Question: what does that imply for your scientific analysis?

Answer: Normality testing may not be your best line of attack.
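
For anyone who wants to check their guess, here is a minimal sketch of
that experiment. It assumes shapiro.test() as the row-wise test (the one
used earlier in this thread); the seed and the names x, pvals and
n_flagged are arbitrary choices for illustration only. Since every row
really is drawn from a normal distribution, the p-values should be
roughly uniform on (0, 1), so something near 5% of the rows can be
expected to fall below 0.05 purely by chance.

```r
# Sketch of the thought experiment: every row is sampled from a normal
# distribution, so any row flagged as "non-normal" is a false positive.
set.seed(1)                      # arbitrary seed, for reproducibility only
n_rows <- 20000
n_cols <- 20
x <- matrix(rnorm(n_rows * n_cols), nrow = n_rows, ncol = n_cols)

# Row-wise Shapiro-Wilk test, keeping only the p-value of each test
pvals <- apply(x, 1, function(r) shapiro.test(r)$p.value)

n_flagged <- sum(pvals < 0.05)
n_flagged                        # number of rows flagged at the 0.05 level
n_flagged / n_rows               # proportion of rows flagged
```

Subsetting with x[pvals < 0.05, ] would then pick out rows that are
"non-normal" only in the sense of being unlucky draws.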

Peter Ehlers


