[R] matrix manipulations

Mon Jan 17 23:37:42 CET 2011

Hi,

I've received two very good solutions; thank you very much. One, from Henrique Dallazuanna, uses the reshape library and a single line of code, although it will take me quite some time to understand it. Here is what he sent:

library(reshape)
xtabs(rowSums(cbind(value.x, value.y), na.rm = TRUE) ~ X1 + X2, merge(melt(m1), melt(m2), by = c('X1', 'X2'), all = TRUE), exclude = FALSE)
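
For readers who want to follow the one-liner step by step, here is a sketch of the same idea in base R only. It is not Henrique's code: as.data.frame(as.table()) stands in for melt(), so the columns are named Var1, Var2 and Freq rather than X1, X2 and value, and the intermediate names (long1, long2, both, total) are made up for illustration. m1 and m2 are the example matrices from the original question.

```r
# Example matrices from the original question
m1 <- matrix(c(32, 35, 36, 12, 15, 16), nrow = 2, byrow = TRUE,
             dimnames = list(c("land", "water"), c("cat2", "cat5", "cat6")))
m2 <- matrix(c(45, 46, 47, 48, 21, 22, 23, 24), nrow = 2, byrow = TRUE,
             dimnames = list(c("land", "water"),
                             c("cat1", "cat2", "cat3", "cat4")))

# Step 1: long format, one row per (class, category) cell
long1 <- as.data.frame(as.table(m1), stringsAsFactors = FALSE)
long2 <- as.data.frame(as.table(m2), stringsAsFactors = FALSE)

# Step 2: full outer join on the two label columns; a cell missing
# from one matrix gets NA in that matrix's value column
both <- merge(long1, long2, by = c("Var1", "Var2"), all = TRUE)

# Step 3: add the two value columns, treating NA as 0
both$total <- rowSums(both[, c("Freq.x", "Freq.y")], na.rm = TRUE)

# Step 4: back to a class-by-category table
m3 <- xtabs(total ~ Var1 + Var2, data = both)
m3
```

The merge with all = TRUE is what brings in the categories present in only one matrix; rowSums with na.rm = TRUE then sums the overlapping category (cat2) and passes the others through unchanged.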

The other, from Phil Spector (code below), I can understand quite easily, although to my shame I have never really used factor levels and their properties, and I don't know their uses and possibilities. Until now I have tried to avoid factors by converting them to something else (such as character strings).
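
As a small illustration of the difference between character strings and factors when tabulating, here is a minimal sketch. The vector x and the names f and counts are invented for this example; only factor(), levels(), nlevels() and table() from base R are used.

```r
# A character vector that happens to contain only three of six categories
x <- c("cat2", "cat5", "cat2", "cat6")

# Tabulating the strings reports only the categories that occur
table(x)

# A factor carries the full set of allowed levels, present or not
f <- factor(x, levels = paste0("cat", 1:6))
levels(f)    # all six levels
nlevels(f)   # 6

# Tabulating the factor reports every level, with 0 for the absent ones
counts <- table(f)
counts

# Converting back to character strings recovers the original values
as.character(f)
```

Because every part tabulated this way has the same six columns, the resulting tables have the same shape and can be added directly.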

Again, thanks for all your help,
Monica

----------------------------------------
> Date: Mon, 17 Jan 2011 12:13:09 -0800
> From: spector at stat.berkeley.edu
> To: pisicandru at hotmail.com
> CC: r-help at r-project.org
> Subject: Re: [R] matrix manipulations
>
> Monica -
> Perhaps this small example can demonstrate how factors can
> solve your problem:
>
> > d1 = data.frame(cat=sample(c('cat2','cat5','cat6'),100,replace=TRUE),group=sample(c('land','water'),100,replace=TRUE))
> > d2 = data.frame(cat=sample(c('cat1','cat3','cat4'),100,replace=TRUE),group=sample(c('land','water'),100,replace=TRUE))
> > d1$cat = factor(d1$cat,levels=c('cat1','cat2','cat3','cat4','cat5','cat6'))
> > d2$cat = factor(d2$cat,levels=c('cat1','cat2','cat3','cat4','cat5','cat6'))
> > table(d1$group,d1$cat) + table(d2$group,d2$cat)
>
>         cat1 cat2 cat3 cat4 cat5 cat6
>   land    14   17   18   22   19   23
>   water   19   15   16   11   10   16
>
> This works because when you include all possible levels in a factor, R will
> automatically put zeroes in the right places when you use table():
>
> > table(d1$group,d1$cat)
>         cat1 cat2 cat3 cat4 cat5 cat6
>   land     0   17    0    0   19   23
>   water    0   15    0    0   10   16
> > table(d2$group,d2$cat)
>         cat1 cat2 cat3 cat4 cat5 cat6
>   land    14    0   18   22    0    0
>   water   19    0   16   11    0    0
>
> Hope this helps.
> - Phil Spector
> Statistical Computing Facility
> Department of Statistics
> UC Berkeley
> spector at stat.berkeley.edu
>
>
>
> On Mon, 17 Jan 2011, Monica Pisica wrote:
>
> >
> > Hi,
> >
> > I am having some difficulties with matrix operations. It is a little hard to explain, so please bear with me. I have a very large data set, large enough that it needs to be split into parts in order to deal with it. I can work on these "parts", but the problem lies in adding the parts together for the final answer.
> >
> > That being said, let's say that I split the data into 2 parts, 1 and 2. The data belong to 6 different categories, and each category has 2 different classes, these classes being the same for each category. The classes are called "land" and "water", and the categories are labeled "cat1" to "cat6". I am using the function table() to tabulate each class for each category, but since I split the data into 2 parts, one part has only some of the 6 categories and the other part has some others (the two sets are not necessarily mutually exclusive).
> >
> > So let's build some results like the ones I get after using the table function.
> >
> > m1 <- matrix(c(32, 35, 36, 12, 15, 16), nrow = 2, ncol = 3, byrow = TRUE, dimnames = list(c("land", "water"), c("cat2", "cat5", "cat6")))
> >
> > > m1
> >       cat2 cat5 cat6
> > land    32   35   36
> > water   12   15   16
> >
> > m2 <- matrix(c(45, 46, 47, 48, 21, 22, 23, 24), nrow = 2, ncol = 4, byrow = TRUE, dimnames = list(c("land", "water"), c("cat1", "cat2", "cat3", "cat4")))
> >
> > > m2
> >       cat1 cat2 cat3 cat4
> > land    45   46   47   48
> > water   21   22   23   24
> >
> > My desired end result is a matrix (or a data frame) that has 6 columns called cat1 to cat6 and 2 rows labeled land and water; for a category that appears in both m1 and m2, the end result is the sum of the two values.
> >
> > The result will be m3:
> >
> >       cat1 cat2 cat3 cat4 cat5 cat6
> > land    45   78   47   48   35   36
> > water   21   34   23   24   15   16
> >
> > To do this I thought of making an empty matrix for each of m1 and m2 (called m01 and m02 respectively) with 6 columns and 2 rows, and then writing a long if/else statement in which I match the name of the first column in m1 with the name of the first column in m01; if they match, take the data from m1, if not, leave it 0, and so on. The same goes for m2 and m02. This is long and extremely clunky, but afterwards I can add m01 and m02 and get my desired result m3. Is there any way I can do this more elegantly? My real data is split into 4 parts, but the problem is the same.
> >
> > Thanks for all your input, and sorry for this long email, but I didn't know how else I could explain what I wanted to do.
> >
> > Monica
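
The manual approach described in the original question (a zero-filled 2 x 6 matrix per part, filled by matching column names) does not need an if/else chain: indexing by column name does the matching in one assignment. The following is a minimal sketch of that idea, not code from the thread. The helper pad_to() and the names all_cats and parts are hypothetical, and it assumes every part has the same rows in the same order.

```r
# Example matrices from the original question
m1 <- matrix(c(32, 35, 36, 12, 15, 16), nrow = 2, byrow = TRUE,
             dimnames = list(c("land", "water"), c("cat2", "cat5", "cat6")))
m2 <- matrix(c(45, 46, 47, 48, 21, 22, 23, 24), nrow = 2, byrow = TRUE,
             dimnames = list(c("land", "water"),
                             c("cat1", "cat2", "cat3", "cat4")))

# Full set of categories the final result should have
all_cats <- paste0("cat", 1:6)

# Hypothetical helper: copy m into a zero matrix with the full column set.
# Assumes all columns of m are among cols.
pad_to <- function(m, cols) {
  out <- matrix(0, nrow = nrow(m), ncol = length(cols),
                dimnames = list(rownames(m), cols))
  out[, colnames(m)] <- m   # name-based indexing places each column
  out
}

# Works for any number of parts: pad each one, then add them all up
parts <- list(m1, m2)
m3 <- Reduce(`+`, lapply(parts, pad_to, cols = all_cats))
m3
```

With the data split into 4 parts, only the parts list changes; Reduce() adds the padded matrices pairwise.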