[R] confidence interval for negatively skewed, leptokurtic sample

Mon Feb 8 17:09:03 CET 2010

Hello,

I've got a statistical problem that I hope you can help me with. It doesn't
have to do directly with R, so if there's another forum which would suit
better, please tell me!

Now here's the problem:

I want to derive confidence intervals for a variable X, which is - given the
descriptive statistics - obviously negatively skewed and leptokurtic (i.e.
peaked). My aim is to make a statement similar to this one: Given certain
values of the two explanatory variables (see below), X will, with a given
probability, range from value A to value B when drawing again from the same
parent population.

The dataset that I'm using is pretty large: it contains some 150,000 cases
covering an entire year, and it can be seen as a sample from the parent
population.
The variable I'm interested in is the prediction error X of a given (daily
computed) wind power forecast, which I compute as the difference between the
prediction value and the respective realisation value. To make things
clearer: the predictions yielding my data are calculated once a day, and
they cover three days, so that there are three prediction values for each
realisation value.

Unfortunately, there is autocorrelation in the dataset because there is
data for every quarter of an hour. That's why I have to select some cases at
random (at least I think so). Second, and more important, I want to classify
the data in order to use the available information about the dependence of
X on the two explanatory variables "prediction horizon" and "prediction
level", i.e. the level of the predicted power output in relation to the
maximum power output, the latter also called nominal power or rated power.
That's why the sample I want to analyse is reduced to about 300 cases.
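
To make the classification and random selection described above concrete,
here is a minimal sketch in Python (standard library only). Everything in it
is an assumption for illustration: the function names, the four equal-width
level bins, and the synthetic data stand in for the real forecast errors.
Note that picking cases at random only thins the series; it is not guaranteed
to remove the autocorrelation.

```python
import random
from collections import defaultdict


def classify(horizon_day, level, n_level_bins=4):
    """Map a case to a (horizon, level-bin) class.

    level is predicted power divided by rated power, expected in [0, 1].
    The number of bins is an arbitrary choice for this sketch.
    """
    level_bin = min(int(level * n_level_bins), n_level_bins - 1)
    return (horizon_day, level_bin)


def subsample_by_class(cases, per_class, seed=0):
    """Group errors by class, then draw up to per_class errors at random."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for horizon_day, level, error in cases:
        groups[classify(horizon_day, level)].append(error)
    return {
        label: rng.sample(errors, min(per_class, len(errors)))
        for label, errors in groups.items()
    }


# Synthetic stand-in data: (horizon in days, prediction level, error X).
data_rng = random.Random(1)
cases = [
    (data_rng.choice([1, 2, 3]), data_rng.random(), data_rng.gauss(0.0, 1.0))
    for _ in range(5000)
]
classes = subsample_by_class(cases, per_class=300)
for label in sorted(classes):
    print(label, len(classes[label]))
```

Each class then holds at most 300 randomly chosen errors, and the dispersion
of X can be examined class by class.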

As the mean of X is unsurprisingly always close to zero, I want to gather
information about the dispersion of X as a function of the explanatory
variables. A regression, however, doesn't seem appropriate to me because the
resulting confidence intervals of X subject to the explanatory variables
would blur a lot of information hidden in the dataset (e.g. a stronger
dispersion for daytime predictions). That's why I thought a classification
would meet my needs best.

My first aim is now to get some information about the standard deviation or
the variance of the parent population of X. I thought about bootstrapping:
drawing various samples from the same basic population would enable me to
calculate a confidence interval for the parameter of interest, i.e. the
standard deviation. Do you think that's a suitable approach? I'm currently
using PASW (formerly SPSS), which is obviously not a very powerful software
package, but I have access to computers with Stata, too.
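
The bootstrap idea above could be sketched as follows, in Python rather than
PASW or Stata. This is only an illustration under stated assumptions: it uses
the ordinary nonparametric bootstrap (resampling with replacement from the
observed sample, since the parent population itself cannot be redrawn) and a
simple percentile interval. The function names and the synthetic skewed data
are made up, and with heavy tails the percentile interval for a standard
deviation may be less accurate than more refined bootstrap variants.

```python
import random
import statistics


def quantile(sorted_values, p):
    """Linearly interpolated quantile of an already sorted list, 0 <= p <= 1."""
    pos = p * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def bootstrap_sd_ci(sample, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the standard deviation."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the standard deviation each time.
    sds = sorted(
        statistics.stdev(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    return quantile(sds, alpha / 2), quantile(sds, 1 - alpha / 2)


# Synthetic, negatively skewed stand-in for one class of about 300 errors.
data_rng = random.Random(42)
errors = [1.0 - data_rng.expovariate(1.0) for _ in range(300)]
low, high = bootstrap_sd_ci(errors)
print("sample sd:", statistics.stdev(errors))
print("95% bootstrap CI for sd:", low, high)
```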

Assuming that I obtain a confidence interval for, say, the standard
deviation, then the next problem arises: the distribution of X is still
negatively skewed and leptokurtic, so how can I derive a confidence
interval for X at all? Adding and subtracting the standard deviation
multiplied by 1.96 would result in a symmetric confidence interval, which is
probably wrong.
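
To illustrate the asymmetry problem, the sketch below compares the symmetric
"mean plus/minus 1.96 standard deviations" range with a range read directly
from the empirical quantiles of X. Taking empirical quantiles is only one
possible way to get an asymmetric range and is offered here as an assumption,
not as an established answer to the question; the function names and the
small sample with a long left tail are made up.

```python
import statistics


def quantile(sorted_values, p):
    """Linearly interpolated quantile of an already sorted list, 0 <= p <= 1."""
    pos = p * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def empirical_interval(sample, coverage=0.95):
    """Central range of X taken from the empirical quantiles of the sample."""
    ordered = sorted(sample)
    tail = (1 - coverage) / 2
    return quantile(ordered, tail), quantile(ordered, 1 - tail)


def symmetric_interval(sample, z=1.96):
    """Mean plus/minus z standard deviations, for comparison only."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)
    return mean - z * sd, mean + z * sd


# Small made-up sample with a long left tail.
x = [-10.0, -4.0, -2.0, -1.0, -0.5, 0.0, 0.2, 0.4, 0.5, 0.6]
print("quantile-based 80% range:", empirical_interval(x, coverage=0.8))
print("symmetric mean +/- 1.96 sd:", symmetric_interval(x))
```

The quantile-based range follows the long left tail, whereas the symmetric
range is centred on the mean by construction.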

It would be great if someone could help me with this. I'm not making any
progress at the moment...

Best,
Andreas