[R] how to check if a variable is preferentially present in a sample

Tania Oh tania.oh at bnc.ox.ac.uk
Tue Apr 8 17:24:25 CEST 2008


Dear All,

I do apologise if this question is out of place for this list. I've
searched the mailing list archives and read "Introductory Statistics with R"
by Peter Dalgaard, but couldn't find any hints on how to solve the
problem below:

I have a data frame (d) of values, which I rank in decreasing order of
"val". Each value belongs to one of five groups: 'A', 'B', 'C', 'D' or
'E'. I then take the first 10 rows of d and count the number of
occurrences of each group. I want to test whether certain groups occur
in these first 10 rows more often than expected by chance. Would a
chi-square test or a hypergeometric test be more suitable? If neither,
what would be an alternative approach in R? Below is my data:


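## data
L5 <- LETTERS[1:5]
## data.frame() keeps val numeric; wrapping the columns in cbind() first
## coerces them to a character matrix, so val would end up as a factor.
## rnorm(100) draws 100 values; rnorm(1:10) would draw only 10 (one per
## element of 1:10), which then get recycled to fill 100 rows.
d <- data.frame(val = rnorm(100)^2,
                group = factor(sample(L5, 100, replace = TRUE)))

str(d)
##'data.frame':	100 obs. of  2 variables:
##$ val  : num  ...
##$ group: Factor w/ 5 levels "A","B","C","D",..: ...
## (the values are random and change on every run)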
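One way to carry out the procedure described above is sketched below in R, under stated assumptions rather than as a definitive analysis. It treats the top 10 rows as 10 draws without replacement from the 100 rows, so under the null hypothesis that rank is unrelated to group, the count of a group in the top 10 follows a hypergeometric distribution. The seed, the variable names (d.sorted, top, m, k, p.hyper, p.fisher, res) and the Benjamini-Hochberg adjustment for testing five groups are illustrative choices, not part of the original post. The data frame is built directly with data.frame() so that val stays numeric; going through cbind() first coerces both columns to character.

```r
## A sketch, not a definitive analysis: rank d by val, keep the top 10 rows,
## and test each group for over-representation with a one-sided
## hypergeometric test. The seed and the BH adjustment are assumptions.
set.seed(1)  # assumed seed, only so the example is reproducible
L5 <- LETTERS[1:5]
d <- data.frame(val   = rnorm(100)^2,
                group = factor(sample(L5, 100, replace = TRUE), levels = L5))

## Rank in decreasing order of val and keep the first 10 rows
d.sorted <- d[order(d$val, decreasing = TRUE), ]
top <- head(d.sorted, 10)

N     <- nrow(d)                     # rows overall (100)
n.top <- nrow(top)                   # rows drawn (10)
m     <- as.vector(table(d$group))   # size of each group overall
k     <- as.vector(table(top$group)) # occurrences of each group in the top 10

## P(X >= k), X ~ hypergeometric: n.top draws without replacement from
## m members of the group and N - m non-members
p.hyper <- phyper(k - 1, m, N - m, n.top, lower.tail = FALSE)

## The same p-values from a one-sided Fisher exact test on each 2 x 2 table
## (rows: top 10 / rest; columns: in group / not in group)
p.fisher <- sapply(seq_along(L5), function(i) {
  tab <- matrix(c(k[i], m[i] - k[i],
                  n.top - k[i], N - m[i] - (n.top - k[i])), nrow = 2)
  fisher.test(tab, alternative = "greater")$p.value
})

## Five groups are tested at once, so adjust for multiple testing
res <- data.frame(group    = L5,
                  top10    = k,
                  total    = m,
                  expected = m * n.top / N,
                  p        = p.hyper,
                  p.BH     = p.adjust(p.hyper, method = "BH"))
print(res)
```

For each group, the one-sided hypergeometric tail equals the p-value of fisher.test() with alternative = "greater" on the corresponding 2 x 2 table, which the sketch computes both ways. A chi-square test on the ten counts would rely on expected counts of roughly 2 per group here, below the usual rule of thumb of 5, and chisq.test() would typically warn that the approximation may be incorrect.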


Many thanks in advance and apologies again,
tania

D. phil student
Department of Physiology, Anatomy and Genetics
University of Oxford
