[R] Group averages
David Kling
klingd at reed.edu
Mon Jun 12 23:19:39 CEST 2006
Hello:
I hope none of you will mind helping a newbie. I'm a student research
assistant working with a large data set in which observations are
categorized according to two factors. I'm trying to calculate the group
mean and variance of a variable (called 'hsgpa' in the example data
presented below) for each observation, excluding that observation. For
example, if there are 20 observations with the same value of the two
factors, for each of the 20 I'd like to generate the mean and variance
of the 'hsgpa' values of the other 19 group members. This must be done
for every observation in the data set.
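As a rough illustration of the calculation described above, here is a minimal sketch in R. It is only one possible approach, not a definitive solution. The function name loo_stats(), the output columns loo_mean and loo_var, and the small 'toy' data frame are all hypothetical names made up for this example. The sketch assumes that missing 'hsgpa' values should be ignored among the "other" group members, and that observations with a missing grouping factor should get NA for both results.

```r
# Hypothetical helper: leave-one-out mean and variance of x within the
# groups defined by the factors passed in '...'.
loo_stats <- function(x, ...) {
  # One combined grouping factor; it is NA wherever any factor is NA.
  g <- interaction(..., drop = TRUE)
  out_mean <- rep(NA_real_, length(x))
  out_var  <- rep(NA_real_, length(x))
  # split() silently drops rows whose group is NA, so those rows keep NA.
  for (idx in split(seq_along(x), g)) {
    for (i in idx) {
      others <- x[setdiff(idx, i)]        # same group, excluding row i
      others <- others[!is.na(others)]    # assumption: ignore missing values
      if (length(others) >= 1) out_mean[i] <- mean(others)
      if (length(others) >= 2) out_var[i]  <- var(others)
    }
  }
  data.frame(loo_mean = out_mean, loo_var = out_var)
}

# Tiny made-up data set, using the same column names as the example data.
toy <- data.frame(
  hsgpa = c(3.0, 3.5, 4.0, 2.0, 3.0, NA, 3.2),
  yr    = c(2000, 2000, 2000, 2001, 2001, 2001, NA),
  conf  = c("a", "a", "a", "b", "b", "b", "a")
)
res <- loo_stats(toy$hsgpa, toy$yr, toy$conf)
toy <- cbind(toy, res)
print(toy)
```

The inner loop recomputes each statistic from scratch, which is easy to check by hand but slow for very large groups. A faster variant could work from group sums and sums of squares instead.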
I've searched the R mail archives, read the manuals, and read
documentation for tapply() and by(), as well as summaryBy() in the 'doBy'
package and with() from 'Hmisc'. It may be that, since I'm new to
writing functions and R is the first language I've ever worked with, I'm
less able to come up with a solution than some other new R users. None
of the functions I have tried have been successful, and it doesn't seem
worth it to reproduce and explain my best effort. I hope someone has
some ideas! Looking at what an experienced user would try should help
me with my present task as well as future problems.
Below I've included some lines that will generate a sample data set
similar to the one I'm working with:
#
#Example data:
#
case <- sample(seq(1,10000,1),5000,replace=FALSE)
hsgpa <- rbeta(5000,7,1.5)*4.25
yr <- sample(seq(1993,2005,1),5000,replace=TRUE)
conf <- sample(letters[1:5],5000,replace=TRUE)
data <- data.frame(case=case,hsgpa=hsgpa,yr=yr,conf=conf)
data$conf <- as.character(data$conf)
s1 <- sample(seq(1,5000,1),500,replace=FALSE)
k <- data$hsgpa
k[row.names(data) %in% s1] <- NA
data$hsgpa <- k
s2 <- sample(seq(1,5000,1),100,replace=FALSE)
k <- data$yr
k[row.names(data) %in% s2] <- NA
data$yr <- k
k <- data$conf
k[row.names(data) %in% s2] <- NA
data$conf <- k
remove(case,hsgpa,yr,conf,s1,s2,k)
#