[R] Sampling a dataframe based on the length of a subset of observations within
    Eric Vander Wal 
    eric.vanderwal at usask.ca
       
    Thu Jul  9 03:19:23 CEST 2009
    
    
  
Thank you in advance for your consideration.
I have a dataframe of 2000+ observations with repeated measures across 
approximately 300 unique individuals. An event either does or does not 
happen (1, 0), and there is a suite of independent variables associated 
with the event. A simplified representation follows:
my.df <- data.frame(id = c("A","A","A","B","B","C","C","C","C","C"),
                    event = c(0,0,1,0,1,0,0,1,1,0))
_id_  _event_
A     0
A     0
A     1
B     0
B     1
C     0
C     0
C     1
C     1
C     0
I need to sample my.df to select the same number of observations with 
event = 0 as event = 1 for each unique id.
I can reshape or tapply my.df to group by id and determine what sample 
size I need. Using reshape, this gives my.df.cast:
library(reshape)
my.df.melt<-melt(my.df, id="id")
my.df.cast<-cast(my.df.melt, id~value, length, fill=0)
my.df.cast
       Event
_id_      _0_   _1_
A     2     *1*
B     1     *1*
C     3     *2*
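The same per-id counts can also be obtained in base R with table(), without the reshape package. This is only a minimal sketch on the toy data from this post; the names counts and n.needed are made up for illustration.

```r
# Rebuild the example data so this block runs on its own.
my.df <- data.frame(id = c("A","A","A","B","B","C","C","C","C","C"),
                    event = c(0,0,1,0,1,0,0,1,1,0))

# Cross-tabulate id against event: one row per id, one column per event value.
counts <- table(my.df$id, my.df$event)

# Number of event == 1 rows per id, i.e. how many event == 0 rows to sample.
n.needed <- counts[, "1"]
n.needed
```

Indexing n.needed by id (for example n.needed["C"]) then gives the sample size for that individual.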
Given the above dataframe I need to randomly select (sample) from my.df 
*one* observation from my.df[my.df$id=="A" & my.df$event==0, ], *one* from 
my.df[my.df$id=="B" & my.df$event==0, ], and *two* from 
my.df[my.df$id=="C" & my.df$event==0, ], and then rbind them to 
my.df[my.df$event == 1, ].  
However, it is impractical to individually code each case.
Alternatively, if A in my.df matches A in my.df.cast, then 
sample(my.df[my.df$id == "A" & my.df$event == 0, ], size=my.df.cast[1,3], 
replace=FALSE).  I think I am close to a solution but I'm not sure how 
to code it to run through the entire dataframe.
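One possible way to run the per-id sampling over the whole dataframe is to split my.df by id, sample row indices within each piece, and rbind the pieces back together. The sketch below is an illustration rather than a definitive answer: balance.one is a hypothetical helper name, and capping the sample size with min() is an assumption about how to treat an id that has more events than non-events.

```r
# Rebuild the example data so this block runs on its own.
my.df <- data.frame(id = c("A","A","A","B","B","C","C","C","C","C"),
                    event = c(0,0,1,0,1,0,0,1,1,0))

# Hypothetical helper: for one id's rows, keep every event == 1 row and
# an equally sized random sample of the event == 0 rows.
balance.one <- function(d) {
  ones  <- d[d$event == 1, , drop = FALSE]
  zeros <- d[d$event == 0, , drop = FALSE]
  # Assumption: never ask for more non-events than are available.
  n <- min(nrow(ones), nrow(zeros))
  # Sample row indices; calling sample() on a data frame would sample columns.
  keep <- sample(seq_len(nrow(zeros)), size = n, replace = FALSE)
  rbind(zeros[keep, , drop = FALSE], ones)
}

set.seed(1)  # only to make the random draw repeatable
my.new.df <- do.call(rbind, lapply(split(my.df, my.df$id), balance.one))
rownames(my.new.df) <- NULL
my.new.df
```

Because the sampling is done inside each id's subset, nothing has to be coded per individual, and each id ends up with as many event = 0 rows as event = 1 rows.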
This is how my.new.df would look:
_id event_
A     0
A     1
B     0
B     1
C     0
C     0
C     1
C     1
Thank you kindly for your help,
Eric
-- 
Eric Vander Wal
Ph.D. Candidate
University of Saskatchewan, 
Department of Biology,
112 Science Place, 
Saskatoon, SK., S7N 5E2
"Pluralitas non est ponenda sine necessitate"
    
    