[R] Sample based on Factor Selection Criteria
Josip Dasovic
jjd9 at sfu.ca
Mon Jun 1 23:00:48 CEST 2009
Dear R-users,
I'm having difficulty creating a new data frame that is a subset of an existing data frame, built by randomly selecting observations according to the values of variables within that data frame.
Here's an example of what my data frame looks like:
fact x1 x2 x3 select...
blue 23 2.2 1.1 1
blue 28 4.2 0.8 0
blue 34 2.8 0.9 0
...
red 43 6.2 1.4 0
red 33 5.2 1.5 1
red 35 4.2 1.6 1
...
green 22 3.5 1.1 0
green 21 4.5 1.3 0
green 33 6.5 1.7 0
green 12 4.4 1.9 0
...
There are hundreds of different values (i.e., "colours") of the variable "fact" in my dataset, each with dozens of observations (for example, there are about 50 observations with the "fact" value blue, 45 with red, 87 with magenta, etc.).
I would like to end up with a new data frame, which is a subset of my original data frame. The new (subsetted) data frame would have the following characteristics:
1) It would retain all of the observations for which "select"==1
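2) It would retain a random sample of the observations for which "select"==0, such that for each observation with "select"==1 there is one randomly sampled observation with "select"==0 and the same "fact" value. In other words, within each "fact" group the number of sampled "select"==0 observations equals the number of "select"==1 observations.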
Thus, in the above example, I would like to retain
i) the first "blue" observation, and one additional randomly-selected "blue" observation for which select==0,
ii) the 2nd and 3rd "red" observations, and two more randomly-selected "red" observations for which "select"==0,
iii) none of the "green" observations, since none of these has a "select" value of 1.
So, the new data set would look something like this:
fact x1 x2 x3 select
blue 23 2.2 1.1 1
blue 34 2.8 0.9 0
red 43 6.2 1.4 0
red 33 5.2 1.5 1
red 35 4.2 1.6 1
red 28 4.4 1.4 0
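
To make the request concrete, here is a minimal base-R sketch of one way the selection could be done; it is an illustration of the two rules above, not a tested solution for the real data. The toy data frame `df` is modelled on the example (the extra rows are made up so that each group has enough "select"==0 rows), and the helper name `pick_rows` is hypothetical. It assumes that when a group has fewer "select"==0 rows than "select"==1 rows, all of the available "select"==0 rows are kept.

```r
# Toy data modelled on the example above; rows beyond those shown are invented.
df <- data.frame(
  fact   = c(rep("blue", 4), rep("red", 5), rep("green", 4)),
  x1     = c(23, 28, 34, 30,  43, 33, 35, 28, 40,  22, 21, 33, 12),
  x2     = c(2.2, 4.2, 2.8, 3.0,  6.2, 5.2, 4.2, 4.4, 5.0,  3.5, 4.5, 6.5, 4.4),
  x3     = c(1.1, 0.8, 0.9, 1.0,  1.4, 1.5, 1.6, 1.4, 1.2,  1.1, 1.3, 1.7, 1.9),
  select = c(1, 0, 0, 0,  0, 1, 1, 0, 0,  0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: given the row indices of one "fact" group,
# return the row indices to keep from that group.
pick_rows <- function(idx, select) {
  ones  <- idx[select[idx] == 1]
  zeros <- idx[select[idx] == 0]
  if (length(ones) == 0) return(integer(0))  # e.g. "green": drop the whole group
  n <- min(length(ones), length(zeros))      # assumption: cap at what is available
  # Sample positions, not values, so a group with a single candidate row is safe
  sampled <- zeros[sample.int(length(zeros), n)]
  c(ones, sampled)
}

set.seed(1)  # only so that the sketch is reproducible
groups <- split(seq_len(nrow(df)), df$fact)
keep   <- unlist(lapply(groups, pick_rows, select = df$select), use.names = FALSE)
new_df <- df[sort(keep), ]  # sort() restores the original row order
```

With the toy data, `new_df` has six rows: the one "blue" row with select==1 plus one random "blue" row with select==0, the two "red" rows with select==1 plus two random "red" rows with select==0, and no "green" rows.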
Thank you for your help,
Josip
Josip Dasovic
Research Associate
Human Security Report Project
School of International Studies
Suite 7200
Simon Fraser University
515 West Hastings Street
Vancouver , BC
CANADA
V6B 5K3