[R] Finding non-normal distributions per row of data frame?
Hugo Mildenberger
Hugo.Mildenberger at web.de
Sat Feb 5 01:13:17 CET 2011
Danny,
it sounds like you already have some idea of what a 'nugget' distribution
could look like. Maybe you could also deliberately produce some experimental
data with such distributions, harvest the corresponding patterns from the
microarray, and then apply a method like the one described in
http://www.cs.uwaterloo.ca/~shai/TALKS/NIPS07_prob_wkshp.pdf
But this is only an uneducated guess.
Best
Hugo
On Saturday 05 February 2011 00:21:01 DB1984 wrote:
>
> Greg, Dennis: thanks for your input. I really appreciate the feedback, as it
> is not easy to come by.
>
> In terms of the data: I described it as having 20 columns, which is the
> smallest dataset, but this can run to 320 columns, so in some cases there is
> likely to be enough power to detect non-normality. That said, a better
> solution would be useful.
>
> As a first approximation, I looked at the mean/median ratio to indicate
> simple skew in the data, which suggested that most rows were roughly
> symmetric and plausibly normal. I took the 'nuggets' to be the rows with a
> mean/median ratio in the top or bottom 1%. This was a small group; overall,
> the data appear roughly normally distributed within rows.
>
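A minimal sketch of the mean/median screen described above, written in Python with only the standard library (so it is an illustration, not the poster's R code). It assumes each row holds positive values with a nonzero median, and `flag_extreme_ratios` is a hypothetical helper name. Note that the ratio only measures asymmetry: a symmetric bimodal row has its mean close to its median and would not be flagged.

```python
import statistics


def mean_median_ratio(row):
    """Mean/median ratio of one row; it departs from 1 when the row is skewed.
    Assumes the median is nonzero (e.g. positive intensity values)."""
    return statistics.fmean(row) / statistics.median(row)


def flag_extreme_ratios(rows, frac=0.01):
    """Indices of rows whose ratio lies in the bottom or top `frac` of all ratios.
    Hypothetical helper for illustration, not an existing R or Python function."""
    ratios = [mean_median_ratio(r) for r in rows]
    order = sorted(range(len(rows)), key=lambda i: ratios[i])
    k = max(1, int(len(rows) * frac))  # rows taken from each tail, at least one
    return sorted(set(order[:k] + order[-k:]))


# Usage: 98 symmetric rows plus one right-skewed and one left-skewed row.
rows = [[4, 5, 5, 6, 5] for _ in range(100)]
rows[10] = [5, 5, 5, 5, 30]  # ratio 2.0
rows[50] = [1, 5, 5, 5, 5]   # ratio 0.84
print(flag_extreme_ratios(rows))  # [10, 50]
```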
> The aim is really to find those nuggets with significantly non-normal
> distributions. My hope was to run Shapiro-Wilk, or some similar test, on
> each row and find the nuggets enriched in the lower tail of the p-values.
> This may not be a sufficiently robust approach, but is there a better
> option?
>
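In R, per-row Shapiro-Wilk p-values could be collected with something like `apply(m, 1, function(r) shapiro.test(r)$p.value)`, where `m` is a hypothetical numeric matrix with one row per feature (shapiro.test accepts between 3 and 5000 values). Shapiro-Wilk is not in the Python standard library, so the sketch below swaps in a different test, Jarque-Bera, which is built from sample skewness and kurtosis and is asymptotically chi-square with 2 degrees of freedom. That approximation is poor for rows of only 20 values, so here the p-values are used only to rank rows; `lowest_p_rows` is a hypothetical helper.

```python
import math
import statistics


def jarque_bera(row):
    """Jarque-Bera statistic and asymptotic p-value for one row.

    Uses simple (biased) moment estimates of skewness and kurtosis; the
    p-value is the chi-square(2) upper tail, exp(-JB / 2), which is only an
    approximation for short rows."""
    n = len(row)
    mu = statistics.fmean(row)
    m2 = sum((x - mu) ** 2 for x in row) / n
    if m2 == 0:
        return 0.0, 1.0  # constant row: statistic undefined, leave it unflagged
    m3 = sum((x - mu) ** 3 for x in row) / n
    m4 = sum((x - mu) ** 4 for x in row) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


def lowest_p_rows(rows, frac=0.05):
    """Indices of the rows in the lowest `frac` of p-values (hypothetical helper)."""
    pvals = [jarque_bera(r)[1] for r in rows]
    k = max(1, int(len(rows) * frac))  # at least one row
    return sorted(sorted(range(len(rows)), key=lambda i: pvals[i])[:k])
```

With many rows, the lowest p-values will include some normal rows by chance, so the ranked list is best treated as candidates to inspect rather than confirmed nuggets.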
> One idea was to sort the data in each row and perform a linear regression.
> For normal distributions I expect the intercept to be close to the mean.
> Using (intercept - mean) and the p-value for the fit of the regression was
> another way to pick out the nuggets in the dataset.
>
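A sketch of the sorted-row regression, assuming the sorted values are regressed on expected standard normal quantiles (a normal Q-Q fit, with plotting positions (i - 0.5)/n, the form R's ppoints uses for n > 10). Under that assumption the quantiles average to zero, so the least-squares intercept equals the row mean for any data and (intercept - mean) carries no information. The usual regression p-value is not meaningful either, because sorted values always rise with the quantiles and their residuals are not independent. The fit quality (R squared, closely related to the Shapiro-Francia statistic) is the more useful score. `qq_fit` is a hypothetical name.

```python
import statistics


def qq_fit(row):
    """Least-squares fit of the sorted row on standard normal quantiles.

    Returns (intercept, slope, r_squared). Plotting positions (i - 0.5) / n
    are an assumption; the row must not be constant."""
    n = len(row)
    y = sorted(row)
    std_normal = statistics.NormalDist()
    q = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    q_bar = statistics.fmean(q)  # zero up to rounding, since q is symmetric
    y_bar = statistics.fmean(y)
    s_qq = sum((a - q_bar) ** 2 for a in q)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    s_qy = sum((a - q_bar) * (b - y_bar) for a, b in zip(q, y))
    slope = s_qy / s_qq                    # estimates the SD for normal data
    intercept = y_bar - slope * q_bar      # equals the row mean up to rounding
    r_squared = s_qy ** 2 / (s_qq * s_yy)  # near 1 for normal-looking rows
    return intercept, slope, r_squared


# Usage: an evenly spread row versus a row with a long right tail.
print(qq_fit(list(range(1, 11))))  # intercept ~5.5 (the mean), r_squared close to 1
print(qq_fit([1] * 8 + [10, 20]))  # intercept ~3.8 (the mean), clearly lower r_squared
```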
> If it helps, the nuggets I am expecting either have about 80% of values
> grouped around the mean with 20% forming a one-sided tail, or have an
> approximately bimodal distribution.
>
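Following the suggestion above of testing on data with known shapes, here is a sketch that simulates the two nugget shapes described here alongside a normal row and scores them with Sarle's bimodality coefficient, a swapped-in statistic not mentioned in the thread. Values above 5/9 (about 0.555, the large-sample value for a uniform distribution) are commonly read as a hint of bimodality. The component means, the exponential tail, the 320-column width and the seed are illustrative assumptions, not properties of the real microarray data.

```python
import random
import statistics


def moments(row):
    """Skewness and excess kurtosis from simple (biased) moment estimates."""
    n = len(row)
    mu = statistics.fmean(row)
    m2 = sum((x - mu) ** 2 for x in row) / n
    m3 = sum((x - mu) ** 3 for x in row) / n
    m4 = sum((x - mu) ** 4 for x in row) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def bimodality_coefficient(row):
    """Sarle's bimodality coefficient (needs at least 4 values).
    Values above 5/9 are commonly read as a hint of bimodality."""
    n = len(row)
    skew, ex_kurt = moments(row)
    return (skew ** 2 + 1.0) / (ex_kurt + 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def simulate_rows(n_cols=320, seed=1):
    """Hypothetical test rows: normal, 80/20 one-sided tail, and bimodal."""
    rng = random.Random(seed)
    normal = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    n_core = int(0.8 * n_cols)  # 80% of values around the centre
    tail = [rng.gauss(0.0, 1.0) for _ in range(n_core)]
    tail += [2.0 + rng.expovariate(1.0) for _ in range(n_cols - n_core)]
    half = n_cols // 2
    bimodal = [rng.gauss(-2.5, 1.0) for _ in range(half)]
    bimodal += [rng.gauss(2.5, 1.0) for _ in range(n_cols - half)]
    return normal, tail, bimodal


normal, tail, bimodal = simulate_rows()
for name, row in [("normal", normal), ("tail", tail), ("bimodal", bimodal)]:
    skew, _ = moments(row)
    print(f"{name:8s} skew={skew:+.2f} BC={bimodality_coefficient(row):.2f}")
```

Skewness also raises the coefficient, so on its own a high value does not separate a strongly skewed row from a bimodal one; looking at the skewness alongside it helps tell the two nugget shapes apart.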
> As is probably obvious, I don't have an ideal method for finding these
> nuggets, so coming up with the R code to do so is harder still. If
> anybody has insight into this sort of problem and can point me towards
> further reading, that would be helpful. If there is a ready-made solution,
> even better!
>
> As I said, thanks for your time with this...
>
More information about the R-help mailing list