[R] matching identical row names
John Kane
jrkrideau at yahoo.ca
Sun Apr 16 17:29:03 CEST 2006
Dear Dr. Fox
Your reply to Sirinivas Iyyar was most helpful to me. I am trying to collapse some categories of a data.frame in a similar way.
I have a data frame in the form below
Prog  Sub.Program  Job  V1  V2  V3
1     Alpha        A    1   2   3
2     Alpha        B    2   3   1
2     Gamma        B    1   3   3
2     Alpha        A    3   4   1
2     Gamma        B    2   2   3
1     Alpha        A    2   2   2
What I want is to sum the values of V1, V2 and V3 within each combination of Prog, Sub.Program and Job, and end up with a new data.frame that would look like
Prog  Subprog  Job  Sum(V1)  Sum(V2)  Sum(V3)
1     Alpha    A    3        4        5
2     Alpha    A    3        4        1
2     Gamma    B    3        5        6
I thought that I could use by() to create a vector for each of V1:V3 but I cannot see any way to capture the values.
A call such as temp1 <- by(Data[,4], ...) simply gives me the complete printed output.
An example of what I have done is
-------------------------------------------------------------
Prog <- c(1, 2, 2, 2, 2, 1)
Sub.Program <- c("Alpha", "Alpha", "Gamma", "Alpha", "Gamma", "Alpha")
Job <- c("A", "B", "B", "A", "B", "A")
V1 <- c(1, 2, 1, 3, 2, 2)
V2 <- c(2, 3, 3, 4, 2, 2)
V3 <- c(3, 1, 3, 1, 3, 2)
# data.frame() directly rather than cbind(), so that V1:V3 stay numeric
MyData <- data.frame(Prog, Sub.Program, Job, V1, V2, V3)
by(MyData[,4], list(Sub.Program=Sub.Program, Job=Job), sum)
----------------------------------------------------------------
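One possible way to get the sums back as a data frame, rather than as printed by() output, is aggregate() with a formula. This is only a sketch under the assumption that the data are built as in the example above; the name sums is made up for illustration. Note that the sample data also contain a Prog 2 / Alpha / B combination, so the result has four rows rather than the three listed in the desired output.

```r
# Sample data as in the example above
MyData <- data.frame(
  Prog        = c(1, 2, 2, 2, 2, 1),
  Sub.Program = c("Alpha", "Alpha", "Gamma", "Alpha", "Gamma", "Alpha"),
  Job         = c("A", "B", "B", "A", "B", "A"),
  V1          = c(1, 2, 1, 3, 2, 2),
  V2          = c(2, 3, 3, 4, 2, 2),
  V3          = c(3, 1, 3, 1, 3, 2)
)

# Sum V1, V2 and V3 within each observed Prog / Sub.Program / Job combination.
# Only combinations that occur in the data appear in the result.
sums <- aggregate(cbind(V1, V2, V3) ~ Prog + Sub.Program + Job,
                  data = MyData, FUN = sum)
print(sums)
```

Because aggregate() returns an ordinary data frame, the columns can be renamed or used in further calculations directly.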
I also get the expected <NA> for cells that do not exist. Is there any way to set them to "0" in the operation?
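A minimal sketch of two ways the empty cells might be turned into zeros, assuming the same sample data. It uses tapply() instead of by() because tapply() returns a plain matrix that can be indexed; the names tab and xt are illustrative only.

```r
MyData <- data.frame(
  Prog        = c(1, 2, 2, 2, 2, 1),
  Sub.Program = c("Alpha", "Alpha", "Gamma", "Alpha", "Gamma", "Alpha"),
  Job         = c("A", "B", "B", "A", "B", "A"),
  V1          = c(1, 2, 1, 3, 2, 2)
)

# Option 1: cross-tabulated sums; combinations with no rows come back as NA
tab <- tapply(MyData$V1,
              list(Sub.Program = MyData$Sub.Program, Job = MyData$Job),
              sum)
tab[is.na(tab)] <- 0   # replace the empty cells with 0 afterwards

# Option 2: xtabs() sums the left-hand side and fills empty cells with 0
xt <- xtabs(V1 ~ Sub.Program + Job, data = MyData)

print(tab)
print(xt)
```

The first option keeps the by()-style workflow and patches the result; the second avoids the NA values altogether.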
Any help would be greatly appreciated.
Thanks
John
Dear Srinivas,
Your data are likely in a data frame rather than a matrix (since the columns
are heterogeneous), and name is a variable, not the row names of the data
frame.
There are several ways to do what you want; one simple way, assuming that
the data are in a data frame named Data, is
by(Data[,2:5], Data$name, colMeans)
If you want the result in tabular form (a data frame with one row per name), then you could do
aggregate(Data[,2:5], list(Data$name), mean)
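
As a sketch, both calls can be tried on the five sample rows from the question (any rows elided there are not included, and the names means and by_means are made up here). colMeans is used with by() because, in current versions of R, mean() does not operate column-wise on a data frame.

```r
# The five rows shown in the question
Data <- data.frame(
  name = c("cat", "dog", "cat", "cat", "dog"),
  v1   = c(10, 3, 9, 5, 12),
  v2   = c(11, 12, 12, 12, 113),
  v3   = c(12, 10, 12, 10, 123),
  v4   = c(15, 14, 15, 11, 31)
)

# One row of column means per name, returned as a data frame
means <- aggregate(Data[, 2:5], list(name = Data$name), mean)
print(means)

# The same means as a "by" object: one named vector per name
by_means <- by(Data[, 2:5], Data$name, colMeans)
print(by_means)
```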
I hope this helps,
John
>
> dear group,
>
> i have a sample matrix
> name  v1  v2   v3   v4
> cat   10  11   12   15
> dog   3   12   10   14
> cat   9   12   12   15
> cat   5   12   10   11
> dog   12  113  123  31
> ...
>
>
> since cat is repeated 3 times, I want a mean value for it.
> Likewise for every element of the name column.
> cat v1 = mean(c(10,9,5))
> cat v2 = mean(c(11,12,12))
> ..etc.
>
> name  v1   v2    v3    v4
> cat   8    11.6  11.3  13.6
> dog   7.5  62.5  66.5  22.5
>
> could any one help me in solving this mystery. thank you.
>
