[R] Percentiles for unequal probability sample

David Winsemius dwinsemius at comcast.net
Wed Nov 20 23:56:34 CET 2013

```
On Nov 20, 2013, at 11:35 AM, Trevor Walker wrote:

> I often work with tree data that is sampled with probability proportional
> to size, which presents a special challenge when describing the frequency
> distribution.  For example, R functions like quantile() and fitdistr()
> expect each observation to have equal sample probability.  As a workaround,
> I have been "exploding"/"mushrooming" my data based on the appropriate
> expansion factors.  However, this can take a LONG TIME and I am reaching
> out for more efficient suggestions, particularly for the quantile()
> function.  Example of my workaround:
>

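The 'Hmisc' package has a `wtd.quantile` function that takes a vector of weights, so the expansion factors can be supplied directly instead of exploding the data. I seem to remember that it might have been borrowed from the quantreg package.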

> # trees.df represents random sample with probability proportional to size
> # (of diameter) using "basal area factor" of 20
> trees.df <- data.frame(Diameter=rnorm(10, mean=10, sd=2),
> TreesPerAcre=numeric(10))
> # expansion factor for each observation
> trees.df$TreesPerAcre <- 20/(trees.df$Diameter^2*pi/576)
>
> # to obtain percentiles that are weighted by trees per acre, "explode"
> # diameter data
> explodeFactor <- 10 # represents ten acres
> treeCount <- sum(round(trees.df$TreesPerAcre*explodeFactor ))
> explodedDiameters.df <- data.frame(Diameter=numeric(treeCount))
> k=0 # initialize counter k
> for (i in seq_along(trees.df$Diameter)){
>  # seq_len() skips the inner loop when a rounded count is 0 (1:0 would not)
>  for (j in seq_len(round(trees.df$TreesPerAcre[i]*explodeFactor))){
>    k <- k +1
>    explodedDiameters.df$Diameter[k] <- trees.df$Diameter[i]
>   }
> }
>
> quantile(explodedDiameters.df$Diameter) # appropriate percentiles (for trees per acre)
> quantile(trees.df$Diameter)             # percentiles biased upwards
>
>
>
> Trevor Walker
>
--

David Winsemius
Alameda, CA, USA

```
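
For readers following the thread, here is a minimal base-R sketch of two ways to avoid the nested loops in the quoted workaround. It is an illustration under stated assumptions, not code from either poster: `weighted_quantile()` is a hypothetical helper written for this example, and it uses the simple inverse weighted ECDF definition, which corresponds to `type = 1` in `quantile()` rather than the default `type = 7`. The data setup mirrors the quoted example, with a seed added so the run is reproducible.

```r
# Sketch only: trees.df mirrors the example quoted in the thread.
set.seed(1)
trees.df <- data.frame(Diameter = rnorm(10, mean = 10, sd = 2))
trees.df$TreesPerAcre <- 20 / (trees.df$Diameter^2 * pi / 576)

# Option 1: vectorized "explosion". rep() repeats each diameter by its
# rounded tree count, which replaces the nested for loops.
explodeFactor <- 10
counts <- round(trees.df$TreesPerAcre * explodeFactor)
exploded <- rep(trees.df$Diameter, times = counts)
q_exploded <- quantile(exploded)

# Option 2: no explosion at all. weighted_quantile() is a hypothetical
# helper (not from any package) that inverts the weighted empirical CDF.
weighted_quantile <- function(x, w, probs = c(0, 0.25, 0.5, 0.75, 1)) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_w <- cumsum(w) / sum(w)  # cumulative weight share at each sorted x
  # Index of the smallest x whose cumulative share reaches p; fall back to
  # the largest x if rounding leaves the final share just below 1.
  idx <- vapply(probs,
                function(p) min(which(cum_w >= p), length(x)),
                integer(1))
  x[idx]
}

q_weighted <- weighted_quantile(trees.df$Diameter, trees.df$TreesPerAcre)
q_weighted
```

The `wtd.quantile` function mentioned in the reply can be called in the same spirit, as `Hmisc::wtd.quantile(trees.df$Diameter, weights = trees.df$TreesPerAcre)`, provided Hmisc is installed. Its default interpolation differs from the step-function definition used in this sketch, so the two results need not agree exactly.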