[R] Conditional mean for groups, new variables

arun smartpink111 at yahoo.com
Tue Jun 3 03:45:25 CEST 2014

```Hi,
If you want to extract only particular variables, check ?subset, ?Extract.
Using my first example:
aggregate(MATH~SCHOOLID,rev1, mean)[,-1,drop=FALSE]
#      MATH
#1 14.50000
#2 17.20000
#3 13.71429
#4 13.83333
# more than one variable
res1 <- aggregate(rev1[,-1], list(SCHOOLID=rev1[,1]), mean,na.rm=TRUE) ##Column1 is "SCHOOLID"
res1New <- res1[,-1]
res1New
#      MATH      AGE   STO2Q01      BFMJ     BMMJ
#1 14.50000 10.50000 15.500000  8.000000 14.00000
#2 17.20000  7.60000 10.200000 18.600000 12.80000
#3 13.71429 17.28571  9.142857  9.857143 17.85714
#4 13.83333 15.33333 13.666667 11.666667 11.00000
#or
res1[!grepl("SCHOOLID", colnames(res1))]
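# A minimal sketch of the ?subset route mentioned above, on a small made-up
# data frame (toy and toyMeans are illustrative names, not the poster's data).
toy <- data.frame(SCHOOLID = c("A", "A", "B", "B"),
                  MATH = c(10, 20, 30, 50),
                  AGE = c(15, 17, 16, 16),
                  stringsAsFactors = FALSE)
toyMeans <- aggregate(toy[, -1], list(SCHOOLID = toy[, 1]), mean, na.rm = TRUE)
# drop the grouping column by name instead of by position
toyOnlyMeans <- subset(toyMeans, select = -SCHOOLID)
toyOnlyMeans
#  MATH AGE
#1   15  16
#2   40  16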
A.K.

I tried to explain all the things that I want to do in this picture :) Sorry, if it's not so understandable, but I tried :)

On Monday, June 2, 2014 4:02 AM, arun <smartpink111 at yahoo.com> wrote:

Hi,
Regarding your first comment, you didn't provide a reproducible example, so I created one with letters as SCHOOLIDs.  According to your original post, you have a real dataset with 36000 SCHOOLIDs.  Suppose I created the SCHOOLIDs using:
length(outer(LETTERS,1:2000,paste,sep=""))
#[1] 52000

#Please note that I am creating only 6 columns as an example
set.seed(42)
rev1 <- data.frame(SCHOOLID = sample(outer(LETTERS,1:2000,paste,sep=""),36e3, replace=TRUE), matrix(sample(180, 36e3*5,replace=TRUE), ncol=5, dimnames=list(NULL, c("MATH", "AGE", "STO2Q01", "BFMJ", "BMMJ"))),stringsAsFactors=FALSE)
dim(rev1)
#[1] 36000     6

res1 <- aggregate(rev1[,-1], list(SCHOOLID=rev1[,1]), mean,na.rm=TRUE)
dim(res1)
#[1] 26010     6
head(res1,2)
#  SCHOOLID  MATH AGE STO2Q01 BFMJ BMMJ
#1       A1 107.5  30    41.5   75  149
#2     A100 159.5 132   107.0   66   15
colMeans(rev1[rev1$SCHOOLID=="A1",-1])
#   MATH     AGE STO2Q01    BFMJ    BMMJ
#  107.5    30.0    41.5    75.0   149.0

#I am not following the second statement.  Please provide a reproducible example using ?dput().
Maybe you want the results in this form:

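#na.rm has to be set inside FUN: ave() treats any extra unnamed or named arguments as grouping variables
rev2 <- data.frame(SCHOOLID=rev1[,1], sapply(rev1[-1],function(x) ave(x, rev1[,1], FUN= function(v) mean(v, na.rm=TRUE))))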
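# A small sketch of what the ave() form returns, using made-up values
# (toy and toy2 are illustrative names). Each row keeps its SCHOOLID and gets
# the mean of its own group, so the result has as many rows as the input.
toy <- data.frame(SCHOOLID = c("A", "B", "A", "B"),
                  MATH = c(10, 30, 20, 50),
                  stringsAsFactors = FALSE)
toy2 <- data.frame(SCHOOLID = toy[, 1],
                   sapply(toy[-1], function(x)
                     ave(x, toy[, 1], FUN = function(v) mean(v, na.rm = TRUE))))
toy2
#  SCHOOLID MATH
#1        A   15
#2        B   40
#3        A   15
#4        B   40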

A.K.

I'm sorry, but it does not :(
It gives results for at most the first 26 schools (the number of letters in the alphabet). And judging by the result, it does not compute the average values of the factors.

On Sunday, June 1, 2014 8:37 PM, arun <smartpink111 at yahoo.com> wrote:
Hi,
Maybe this helps:

set.seed(42)
rev1 <- data.frame(SCHOOLID=sample(LETTERS[1:4],20,replace=TRUE), matrix(sample(25, 20*5,replace=TRUE), ncol=5, dimnames=list(NULL, c("MATH", "AGE", "STO2Q01", "BFMJ", "BMMJ"))),stringsAsFactors=FALSE)
res1 <- aggregate(rev1[,-1], list(SCHOOLID=rev1[,1]), mean,na.rm=TRUE)
res1
#if you need to change the names
res2 <- setNames(aggregate(rev1[,-1], list(SCHOOLID=rev1[,1]), mean,na.rm=TRUE), c("SCHOOLID", paste(colnames(rev1)[-1], "MEAN",sep="_")))
res2
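# For comparison, a sketch of the same per-school means via the formula
# interface, on made-up data (toy, byFormula and byColumns are illustrative names).
# One difference to keep in mind: the formula method drops every row that
# contains an NA before averaging (na.action = na.omit by default), while the
# data-frame call with na.rm=TRUE skips NAs column by column.
toy <- data.frame(SCHOOLID = c("A", "A", "B", "B"),
                  MATH = c(10, 20, 30, 50),
                  AGE = c(15, NA, 16, 18))
byFormula <- aggregate(. ~ SCHOOLID, data = toy, FUN = mean)
byColumns <- aggregate(toy[, -1], list(SCHOOLID = toy[, 1]), mean, na.rm = TRUE)
byFormula$MATH   # school A uses only its complete row
#[1] 10 40
byColumns$MATH   # school A uses both MATH values
#[1] 15 40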

A.K.

Hello! I have a problem, I want to calculate conditional mean for my dataset. First, I attach it: