John Kane
jrkrideau at yahoo.ca
Sun Apr 16 18:48:50 CEST 2006
This was immensely helpful!
I had tried aggregate(), messed it up, and decided that I must have misunderstood its use vs. by(), so I posted my question to you. Instead, I must have had a typo in there somewhere. I went back, did it again, and it is lovely. Otherwise I was going to have to 'slice and dice' the dataset with a myriad of subset calls.
Thank you very much.
I hope the weather is as nice in Hamilton as it is in Kingston. And now I can go get some sun!
Dear John,
You can use aggregate(), also described in my suggestion to Srinivas:
> aggregate(Data[, 4:6], Data[1:3], sum)
  Prog Sub.Program Job V1 V2 V3
1    1       Alpha   A  3  4  5
2    2       Alpha   A  3  4  1
3    2       Alpha   B  2  3  1
4    2       Gamma   B  3  5  6
I hope this helps,
John
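For anyone who wants to reproduce the call above, here is a minimal sketch. It assumes the data frame is rebuilt from the six rows quoted in the question below and is named Data, as in the reply; the names totals and totals2 and the formula variant are only illustrative and were not part of the original exchange.

```r
# Rebuild the example data from the six rows quoted in the question
# (assumption: the data frame is called Data, as in the reply above).
Data <- data.frame(
  Prog        = c(1, 2, 2, 2, 2, 1),
  Sub.Program = c("Alpha", "Alpha", "Gamma", "Alpha", "Gamma", "Alpha"),
  Job         = c("A", "B", "B", "A", "B", "A"),
  V1          = c(1, 2, 1, 3, 2, 2),
  V2          = c(2, 3, 3, 4, 2, 2),
  V3          = c(3, 1, 3, 1, 3, 2)
)

# Sum V1..V3 within each Prog / Sub.Program / Job combination.
# Data[1:3] is a list of the three grouping columns.
totals <- aggregate(Data[, 4:6], Data[1:3], sum)
print(totals)

# The formula interface is another way to write the same grouping.
totals2 <- aggregate(cbind(V1, V2, V3) ~ Prog + Sub.Program + Job,
                     data = Data, FUN = sum)
print(totals2)
```

Only combinations that actually occur in the data appear in the result, so there are no empty rows to clean up afterwards.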
> Dear Dr. Fox,
> Your reply to Srinivas Iyyar was most helpful to me. I am
> trying to collapse some categories of a data.frame in a similar way.
> I have a data frame in the form below:
>
> Prog Sub.Program Job V1 V2 V3
> 1    Alpha       A   1  2  3
> 2    Alpha       B   2  3  1
> 2    Gamma       B   1  3  3
> 2    Alpha       A   3  4  1
> 2    Gamma       B   2  2  3
> 1    Alpha       A   2  2  2
>
> What I want is to sum the values of V1, V2 and V3 and end up
> with a new data.frame that would look like
>
> Prog Subprog Job Sum(V1) Sum(V2) Sum(V3)
> 1    Alpha   A   3       4       5
> 2    Alpha   A   3       4       1
> 2    Gamma   B   3       5       6
>
> I thought that I could use by() to create a vector for each
> of V1:V3 but I cannot see any way to capture the values.
> temp1 <- by(Data[,4] simply gives me the complete output.
>
> An example of what I have done is
> -------------------------------------------------------------
>
> Prog <- c(1, 2, 2, 2, 2, 1)
> Sub.Program <- c("Alpha", "Alpha", "Gamma", "Alpha",
>                  "Gamma", "Alpha")
> Job <- c("A", "B", "B", "A", "B", "A")
> V1 <- c(1, 2, 1, 3, 2, 2)
> V2 <- c(2, 3, 3, 4, 2, 2)
> V3 <- c(3, 1, 3, 1, 3, 2)
> # Call data.frame() directly: cbind() would coerce V1..V3 to character
> MyData <- data.frame(Prog, Sub.Program, Job, V1, V2, V3)
>
> by(MyData[, 4], list(Sub.Program = Sub.Program, Job = Job), sum)
> ----------------------------------------------------------------
>
> I also get the expected <NA> for cells that do not exist. Is
> there any way to set them to "0" in the operation?
>
>
>
> Any help would be greatly appreciated.
> Thanks
> John
>
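The question about empty cells is not taken up in the reply, so here is a hedged sketch of two possible approaches. It assumes the six-row example data from the question, stored as MyData; the names cell_sums and cell_table are illustrative only. The first approach uses tapply(), which like by() returns NA for combinations that do not occur, and then replaces those NAs; the second uses xtabs(), which fills empty cells with 0 on its own.

```r
# Example data from the question (assumed to be stored as MyData).
MyData <- data.frame(
  Prog        = c(1, 2, 2, 2, 2, 1),
  Sub.Program = c("Alpha", "Alpha", "Gamma", "Alpha", "Gamma", "Alpha"),
  Job         = c("A", "B", "B", "A", "B", "A"),
  V1          = c(1, 2, 1, 3, 2, 2),
  V2          = c(2, 3, 3, 4, 2, 2),
  V3          = c(3, 1, 3, 1, 3, 2)
)

# Approach 1: tapply() gives a Sub.Program x Job matrix of sums,
# with NA where a combination never occurs (here Gamma / A).
cell_sums <- tapply(MyData$V1,
                    list(Sub.Program = MyData$Sub.Program, Job = MyData$Job),
                    sum)
cell_sums[is.na(cell_sums)] <- 0   # overwrite the empty cells with 0
print(cell_sums)

# Approach 2: xtabs() sums the left-hand side within each cell and
# reports 0 for cells with no observations.
cell_table <- xtabs(V1 ~ Sub.Program + Job, data = MyData)
print(cell_table)
```

One caveat with the first approach: if V1 itself contained missing values, a cell whose sum is genuinely NA would also be overwritten with 0, so it only suits data without NAs in the summed column.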
> Dear Srinivas,
>
> Your data are likely in a data frame rather than a matrix
> (since the columns are heterogeneous), and name is a
> variable, not the row names of the data frame.
>
> There are several ways to do what you want; one simple way,
> assuming that the data are in a data frame named Data, is
>
> by(Data[,2:5], Data$name, mean)
>
> If you want the result in the form of a matrix, then you could do
>
> aggregate(Data[,2:5], list(Data$name), mean)
>
> I hope this helps,
> John
>
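A note for readers running this today: in recent versions of R, mean() applied to a whole data frame returns NA with a warning, so the by() call above may no longer give per-column means. The sketch below is one possible adaptation, not the original author's code. It assumes only the five rows shown in the sample below (the rows elided by "..." are not reconstructed), and the names means_df and means_by are illustrative.

```r
# The five rows shown in the sample data (elided rows are left out).
Data <- data.frame(
  name = c("cat", "dog", "cat", "cat", "dog"),
  v1   = c(10, 3, 9, 5, 12),
  v2   = c(11, 12, 12, 12, 113),
  v3   = c(12, 10, 12, 10, 123),
  v4   = c(15, 14, 15, 11, 31)
)

# aggregate() applies mean() to each column separately within each group,
# so this form works unchanged and returns a data frame.
means_df <- aggregate(Data[, 2:5], list(name = Data$name), mean)
print(means_df)

# by() hands the function a data-frame subset per group, so colMeans()
# is used here in place of mean() to get one mean per column.
means_by <- by(Data[, 2:5], Data$name, colMeans)
print(means_by)
```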
> > Dear group,
> >
> > I have a sample matrix
> > name v1 v2  v3  v4
> > cat  10 11  12  15
> > dog  3  12  10  14
> > cat  9  12  12  15
> > cat  5  12  10  11
> > dog  12 113 123 31
> > ...
> >
> >
> > Since cat is repeated 3 times, I want a mean value for it.
> > Likewise for every element of the name column.
> > cat v1 = mean(c(10,9,5))
> > cat v3 = mean(c(12,12,10))
> > ..etc.
> >
> > name v1  v2   v3   v4
> > cat  8   11.6 11.3 13.6
> > dog  7.5 62.5 66.5 22.5
> >
> > Could anyone help me solve this mystery? Thank you.
> >
