[R] Rank and extract data from a series
James Brown
jdb33 at hermes.cam.ac.uk
Tue Sep 23 13:50:43 CEST 2003
I would like to rank a time-series of data, extract the top ten data items
from this series, determine the corresponding row numbers for each value
in the sample, and take a mean of these *row numbers* (not the data).
I would like to do this in R, rather than pre-process the data on the
UNIX command line if possible, as I need to calculate other statistics
for the series.
I understand that I can use 'sort' to order the data, but I am not aware
of a function in R that would allow me to extract a given number of these
data and then determine their positions within the original time series.
e.g.
Time series:
1.0 (row 1)
4.5 (row 2)
2.3 (row 3)
1.0 (row 4)
7.3 (row 5)
Sort would give me:
1.0
1.0
2.3
4.5
7.3
I would then like to extract the top two data items:
4.5
7.3
and determine their positions within the original (unsorted) time series:
4.5 = row 2
7.3 = row 5
then take the mean of these row numbers:
(2 + 5) / 2 = 3.5
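
The steps above can be sketched in R with order(), which returns the positions of the elements in sorted order rather than the sorted values themselves. This is only a sketch under the assumptions of the small example: the names series, n_top, top_rows and mean_row are illustrative, and if tied values straddle the cutoff, which of the tied rows is selected depends on how order() breaks ties.

```r
# Example series from the message above
series <- c(1.0, 4.5, 2.3, 1.0, 7.3)

# Number of top values to extract (2 here; 10 for the real data)
n_top <- 2

# order() gives row positions sorted by value; keep the first n_top
# positions of the decreasing ordering
top_rows <- order(series, decreasing = TRUE)[seq_len(n_top)]

# The values at those rows, largest first
top_values <- series[top_rows]

# Mean of the row numbers (not of the data values)
mean_row <- mean(top_rows)

print(top_rows)
print(top_values)
print(mean_row)
```

For the five-value example this selects rows 5 and 2 (values 7.3 and 4.5) and gives a mean row number of 3.5, matching the hand calculation. sort(top_rows) would list the rows in their original time order if that is preferred.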
Thanks in advance.
James Brown
___________________________________________
James Brown
Cambridge Coastal Research Unit (CCRU)
Department of Geography
University of Cambridge
Downing Place
Cambridge
CB2 3EN, UK
Telephone: +44 (0)1223 339776
Mobile: 07929 817546
Fax: +44 (0)1223 355674
E-mail: jdb33 at cam.ac.uk
E-mail: james_510 at hotmail.com
http://www.geog.cam.ac.uk/ccru/CCRU.html
___________________________________________
On Wed, 10 Sep 2003, Jerome Asselin wrote:
> On September 10, 2003 04:03 pm, Kevin S. Van Horn wrote:
> >
> > Your method looks like a naive reimplementation of integration, and
> > won't work so well for distributions that have the great majority of the
> > probability mass concentrated in a small fraction of the sample space.
> > I was hoping for something that would retain the adaptability of
> > integrate().
>
> Yesterday I suggested using approxfun(). Did you consider my
> suggestion? Below is an example.
>
> N <- 500
> x <- rexp(N)
> y <- rank(x)/(N+1)
> empCDF <- approxfun(x,y)
> xvals <- seq(0,4,.01)
> plot(xvals,empCDF(xvals),type="l",
> xlab="Quantile",ylab="Cumulative Distribution Function")
> lines(xvals,pexp(xvals),lty=2)
> legend(2,.4,c("Empirical CDF","Exact CDF"),lty=1:2)
>
>
> It's possible to tune some parameters in approxfun() to better match
> your personal preferences. Have a look at help(approxfun) for details.
>
> HTH,
> Jerome Asselin
>