[R] proportional sampling

(Ted Harding) ted.harding at nessie.mcc.ac.uk
Fri Apr 27 15:07:39 CEST 2007


On 27-Apr-07 12:15:29, tibi.codilean at ges.gla.ac.uk wrote:
> Dear All,
> 
> I wonder if you could help me.
> 
> I have a table with a series of sites that I would like to
> sample from.
> The table has 5 columns:
> 
> ID
> X Coordinate
> Y Coordinate
> Value
> Factor
> 
> 
> The conditions are that each site can be selected more than
> once, and that the probability of it being selected (or sampled)
> is proportional to the value in column 'Factor'.
> 
> I am a novice in R and am not entirely sure how to
> do the proportional sampling.
> 
> Any help would be appreciated.
> Thanks
> Tibi

Since you want each site to be able to appear more than once
in the sample, there should be no problem using sample():

  ID.sample <- sample(ID, N, replace=TRUE, prob=Factor)

where N is the sample size you want. (You do not need to make
Factor sum to 1: sample() looks after that).
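As an illustration, here is a minimal sketch on a small invented table; the name 'sites' and all the data values are hypothetical. It draws N sites with replacement, with weights taken from the Factor column, and compares the observed proportions with Factor/sum(Factor), which is what the selection probabilities should be.

```r
# Hypothetical example table (invented data, for illustration only)
sites <- data.frame(
  ID     = c("A", "B", "C", "D"),
  X      = c(10.2, 11.5, 12.1, 13.7),
  Y      = c(55.0, 55.3, 54.8, 55.9),
  Value  = c(3.1, 4.7, 2.2, 5.0),
  Factor = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

set.seed(1)   # make the draw reproducible
N <- 10000    # hypothetical sample size

# The weights need not sum to 1; sample() rescales prob itself
ID.sample <- sample(sites$ID, N, replace = TRUE, prob = sites$Factor)

# Observed proportion of each ID versus the target Factor / sum(Factor)
observed <- as.vector(table(factor(ID.sample, levels = sites$ID))) / N
expected <- sites$Factor / sum(sites$Factor)
print(round(rbind(observed, expected), 3))
```

With a sample this large the two rows should agree to about two decimal places.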

Or, if you want an index which you can use to identify whole
rows (especially if, e.g., values of ID are repeated in the
table):

  ix <- sample(1:R, N, replace=TRUE, prob=Factor)

where R is the number of rows in the table. Then your sample
is the subset

  Table[ix,]

of rows of the table (where "Table" stands for the name of your table).
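A similar sketch for the row-index version, again with invented data in which the ID 101 occurs twice, so that sampling IDs alone would not identify a unique row. It uses seq_len(R) in place of 1:R; the two are equivalent here, but seq_len() behaves sensibly if the table could ever be empty.

```r
# Hypothetical table in which an ID value is repeated (invented data)
Table <- data.frame(
  ID     = c(101, 101, 102, 103),
  X      = c(1.0, 1.5, 2.0, 3.0),
  Y      = c(4.0, 4.5, 5.0, 6.0),
  Value  = c(7, 8, 9, 10),
  Factor = c(0.5, 1.5, 1, 2)
)

set.seed(42)
R <- nrow(Table)   # number of rows in the table
N <- 8             # hypothetical sample size

# Sample row indices with replacement, with probability proportional to Factor
ix <- sample(seq_len(R), N, replace = TRUE, prob = Table$Factor)

# The sample is the corresponding set of whole rows; a row drawn more than
# once appears more than once (R makes the row names unique, e.g. "2.1")
Table.sample <- Table[ix, ]
print(Table.sample)
```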

More complicated issues arise if you are sampling without
replacement with probability proportional to some variable.
Have a look at the packages 'pps' and 'sampfling' for an
indication of methods.
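To see one of those issues, note that with replace=FALSE the R documentation for sample() says the prob weights are applied sequentially: each successive draw is proportional to the weights among the items not yet chosen. The resulting inclusion probabilities are then generally not proportional to the weights. The simulation below is a sketch with hypothetical weights w and sample size n; it compares how often each unit appears in a sample with the value n*w/sum(w) that a true probability-proportional-to-size design would give.

```r
w    <- c(1, 2, 3, 4)   # hypothetical size measure (e.g. a Factor column)
n    <- 2               # draws per sample, without replacement
reps <- 20000           # number of simulated samples

set.seed(123)
# Each column of 'draws' is one sample of n distinct indices
draws <- replicate(reps, sample(seq_along(w), n, replace = FALSE, prob = w))

# Fraction of simulated samples that contain each unit
inclusion <- sapply(seq_along(w), function(i) mean(colSums(draws == i) > 0))

# Inclusion probabilities a true pps design of size n would have
target <- n * w / sum(w)
print(round(rbind(inclusion, target), 3))
```

For these weights the largest unit is included in roughly 72% of samples rather than the 80% a pps design requires, which is why the specialised methods in those packages exist.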

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 27-Apr-07                                       Time: 14:07:03
------------------------------ XFMail ------------------------------
