[R] populating matrix with binary variable after matching data from data frame

William Dunlap wdunlap at tibco.com
Thu Aug 14 00:02:10 CEST 2014


Another solution is to use table to generate your x matrix, instead of
trying to make one and adding to it.  If you want the table to have
the same dimnames on both sides, make factors out of the columns of x1
with the same factor levels in both (below, the sorted union of the
values in the two columns, which works whether those columns are
factors or character vectors).  E.g., using a *small* example:

> X1 <- data.frame(V1=c("A","A","B"), V2=c("C","C","A"))
> X <- table(lapply(X1, factor, levels=sort(union(X1[[1]], X1[[2]]))))
> X
   V2
V1  A B C
  A 0 0 2
  B 1 0 0
  C 0 0 0

If you want just TRUE for presence and FALSE for absence instead of
counts, use X>0.  If you want 1 for presence and 0 for absence, you
can use pmin(X, 1).
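A minimal sketch of the same table() idea applied to an existing matrix, assuming (as in this thread) that its row and column names are the complete set of names of interest. Taking the factor levels from rownames(x) and colnames(x) turns any other name into NA, and table() drops NA entries by default, so unmatched pairs are ignored instead of causing an error. The toy data are made up for illustration.

```r
# Toy matrix: only its dimnames are used, as the names of interest (made-up data)
x <- matrix(0, 3, 3, dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
# Pairs to mark; "Z" is not a name in x
x1 <- data.frame(V1 = c("A", "A", "B", "Z"), V2 = c("C", "C", "A", "A"))

# Levels taken from x: "Z" becomes NA and table() leaves that pair out
counts <- table(V1 = factor(x1$V1, levels = rownames(x)),
                V2 = factor(x1$V2, levels = colnames(x)))

present <- counts > 0      # TRUE for presence, FALSE for absence
binary <- pmin(counts, 1)  # 1 for presence, 0 for absence
binary
```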

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Wed, Aug 13, 2014 at 2:51 PM, William Dunlap <wdunlap at tibco.com> wrote:
> I may have missed something, but I didn't see the result you want for
> your example.  Also, none of the entries in the x1 you showed are row
> or column names in x, which makes it hard to show what you want to happen.
>
> Here is a function that gives you the choice of
>     *error: stop if any row of x1 is 'bad'
>     *omitRows: ignore the rows of x1 that are 'bad'
>     *expandX: expand the x matrix to include all row and column names used in x1
> (Row i of x1 is 'bad' if x1[i,1] is not a row name of x or x1[i,2]
> is not a column name of x.)
>
> f <- function(x, x1, badEntryAction = c("error", "omitRows", "expandX"))
> {
>     badEntryAction <- match.arg(badEntryAction)
>     # 2-column character matrix of (row name, column name) pairs
>     i <- as.matrix(x1[, c("V1", "V2")])
>     if (badEntryAction == "omitRows") {
>         # keep only the pairs whose names both exist in x
>         i <- i[is.element(i[, 1], dimnames(x)[[1]]) &
>                is.element(i[, 2], dimnames(x)[[2]]), , drop = FALSE]
>     }
>     else if (badEntryAction == "expandX") {
>         # names in x1 that are missing from each dimension of x
>         extraDimnames <- lapply(1:2, function(k) setdiff(i[, k], dimnames(x)[[k]]))
>         # if you want the same dimnames on both axes, take the union of
>         # the 2 extraDimnames
>         if ((n <- length(extraDimnames[[1]])) > 0) {
>             x <- rbind(x, array(0, c(n, ncol(x)),
>                                 dimnames = list(extraDimnames[[1]], NULL)))
>         }
>         if ((n <- length(extraDimnames[[2]])) > 0) {
>             x <- cbind(x, array(0, c(nrow(x), n),
>                                 dimnames = list(NULL, extraDimnames[[2]])))
>         }
>     }
>     # with "error", any unknown name makes this fail: "subscript out of bounds"
>     x[i] <- 1
>     x
> }
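The comment in f() about taking the union of the two extraDimnames can be sketched as follows, for the case in this thread where x is square with identical row and column names. expandSymmetric is a hypothetical name, not part of the function above: every name found in either column of x1 is added as both a row and a column, so the result stays square.

```r
# Hypothetical helper: like the "expandX" choice of f(), but each new name is
# added as both a row and a column (assumes rownames(x) equal colnames(x))
expandSymmetric <- function(x, x1) {
  i <- as.matrix(x1[, c("V1", "V2")])  # 2-column (row, column) name matrix
  extra <- setdiff(union(i[, 1], i[, 2]), rownames(x))
  if (length(extra) > 0) {
    allNames <- c(rownames(x), extra)
    y <- matrix(0, length(allNames), length(allNames),
                dimnames = list(allNames, allNames))
    y[rownames(x), colnames(x)] <- x   # copy the existing entries by name
    x <- y
  }
  x[i] <- 1
  x
}

# Made-up example: "c" is not yet a name of x0
x0 <- matrix(0, 2, 2, dimnames = list(c("a", "b"), c("a", "b")))
pairs <- data.frame(V1 = c("a", "c"), V2 = c("b", "a"))
r <- expandSymmetric(x0, pairs)
r
```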
>
>
> On Wed, Aug 13, 2014 at 2:33 PM, Adrian Johnson
> <oriolebaltimore at gmail.com> wrote:
>> Hello again, and sorry to ask again.
>>
>> Maybe I was not clear in my earlier question.
>>
>> I don't want to remove rows from the matrix, since the row names and
>> column names of the matrix are identical.
>>
>>
>> I tried your suggestion and here is what I get:
>>
>>> fx <- function(x,x1){
>> + i <- as.matrix(x1[,c("V1","V2")])
>> + x[i]<-1
>> + x
>> + }
>>> fx(x, x1)
>>
>> Error in `[<-`(`*tmp*`, i, value = 1) : subscript out of bounds
>>
>>
>>
>>
>>> x[1:4,1:4]
>>        ABCA10 ABCA12 ABCA13 ABCA4
>> ABCA10      0      0      0     0
>> ABCA12      0      0      0     0
>> ABCA13      0      0      0     0
>> ABCA4       0      0      0     0
>>
>>
>>> x1[1:10,]
>>       V1       V2
>> 1   AKT3    TCL1A
>> 2  AKTIP    VPS41
>> 3  AKTIP    PDPK1
>> 4  AKTIP   GTF3C1
>> 5  AKTIP    HOOK2
>> 6  AKTIP    POLA2
>> 7  AKTIP KIAA1377
>> 8  AKTIP FAM160A2
>> 9  AKTIP    VPS16
>> 10 AKTIP    VPS18
>>
>>
>> For instance, I would loop over x1: at the first row, I get V1 and
>> check whether x has a row named by that V1 value, then check whether
>> V2 exists in the column names. If both match, I assign 1; if not, I
>> go on to row 2.
>>
>> In some rows, only the element in V2 may exist in the row names; since
>> the element in V1 does not exist in matrix x, I leave that cell at 0.
>> (Since matrix x has identical row and column names, I feel it is not
>> necessary to check an element against the column names once it has
>> been checked against the row names.)
>>
>>
>>
>> For instance, if I see ABCA10 in x1$V1 and ABCA10 in x1$V2 in the
>> same row of x1, then row 1, column 1 of matrix x should get 1.
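The row-by-row procedure described above can be sketched as an explicit loop (markPairsLoop is a hypothetical name). It assumes, as in the dput data below, that x has row and column names and that x1 has columns V1 and V2; pairs whose names are not both in x are skipped, leaving those cells at 0. The example pairs are made up.

```r
markPairsLoop <- function(x, x1) {
  # as.character() so that factor columns index by name, not by integer code
  v1 <- as.character(x1$V1)
  v2 <- as.character(x1$V2)
  for (k in seq_len(nrow(x1))) {  # visit every row of x1
    # mark the cell only if both names exist in x; otherwise skip the row
    if (v1[k] %in% rownames(x) && v2[k] %in% colnames(x)) {
      x[v1[k], v2[k]] <- 1
    }
  }
  x
}

# Made-up pairs using names like those in this thread
genes <- c("ABCA10", "ABCA12", "ABCA13", "ABCA4")
x <- matrix(0, 4, 4, dimnames = list(genes, genes))
x1 <- data.frame(V1 = c("ABCA10", "AKTIP"), V2 = c("ABCA10", "VPS41"))
r <- markPairsLoop(x, x1)
r
```

The vectorized matrix-subscript versions in this thread do the same work without an R-level loop and are usually faster on a long x1; the loop is shown only because it mirrors the steps described above.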
>>
>> dput output follows:
>>
>> x <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(4L,
>> 4L), .Dimnames = list(c("ABCA10", "ABCA12", "ABCA13", "ABCA4"
>> ), c("ABCA10", "ABCA12", "ABCA13", "ABCA4")))
>>
>>
>> x1 <- structure(list(V1 = c("AKT3", "AKTIP", "AKTIP", "AKTIP", "AKTIP",
>> "AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP"), V2 = c("TCL1A",
>> "VPS41", "PDPK1", "GTF3C1", "HOOK2", "POLA2", "KIAA1377", "FAM160A2",
>> "VPS16", "VPS18")), .Names = c("V1", "V2"), row.names = c(NA,
>> 10L), class = "data.frame")
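A quick check of the dput data above (a sketch, not from the thread) shows why x[i] <- 1 failed: no name in x1 occurs in the dimnames of x, so every row of the subscript matrix is out of bounds. The data are rebuilt compactly here and match the dput output.

```r
# Same x and x1 as in the dput output above
genes <- c("ABCA10", "ABCA12", "ABCA13", "ABCA4")
x <- matrix(0, 4, 4, dimnames = list(genes, genes))
x1 <- data.frame(V1 = c("AKT3", rep("AKTIP", 9)),
                 V2 = c("TCL1A", "VPS41", "PDPK1", "GTF3C1", "HOOK2",
                        "POLA2", "KIAA1377", "FAM160A2", "VPS16", "VPS18"))

i <- as.matrix(x1[, c("V1", "V2")])
# TRUE for rows whose V1 is not a row name or whose V2 is not a column name
bad <- !(i[, 1] %in% rownames(x)) | !(i[, 2] %in% colnames(x))
sum(bad)  # number of x1 rows that cannot be placed in x
```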
>>
>>
>>
>> Thanks for your time.
>>
>>
>>
>>
>> On Wed, Aug 13, 2014 at 12:51 PM, William Dunlap <wdunlap at tibco.com> wrote:
>>> You can replace the loop
>>>> for (i in seq_len(nrow(x1))) {
>>>>    x[x1$V1[i], x1$V2[i]] <- 1
>>>> }
>>> by
>>> f <- function(x, x1) {
>>>   i <- as.matrix(x1[, c("V1","V2")]) # 2-column matrix to use as a subscript
>>>   x[ i ] <- 1
>>>   x
>>> }
>>> f(x, x1)
>>>
>>> You will get an error if any string in the first column of the subscript
>>> matrix is not a row name of x, or any string in the second column is not
>>> a column name of x.  What do you want to happen in this case?  You can
>>> choose to first omit the bad rows of the subscript matrix
>>>     goodRows <- is.element(i[,1], dimnames(x)[[1]]) &
>>>                 is.element(i[,2], dimnames(x)[[2]])
>>>     i <- i[goodRows, , drop=FALSE]
>>>     x[ i ] <- 1
>>> or you can choose to expand x to include all the names found in x1.
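A runnable version of the omit-bad-rows idea on toy data in the style of the example below, with a deliberately bad name "R9" added (made-up data). Note that the subscript must be dimnames(x)[[1]], the character vector of row names; dimnames(x)[1] is a one-element list, and is.element() against it matches nothing here.

```r
# Toy data; "R9" is not a row name of x
x <- array(0, c(3, 3), list(Row = paste0("R", 1:3), Col = paste0("C", 1:3)))
x1 <- data.frame(V1 = c("R1", "R3", "R9"), V2 = c("C2", "C1", "C3"))

i <- as.matrix(x1[, c("V1", "V2")])
# [[k]] extracts the vector of names for dimension k
goodRows <- is.element(i[, 1], dimnames(x)[[1]]) &
            is.element(i[, 2], dimnames(x)[[2]])
# [k] gives a list instead, so nothing matches
wrongRows <- is.element(i[, 1], dimnames(x)[1])
i <- i[goodRows, , drop = FALSE]
x[i] <- 1
x
```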
>>>
>>> It would be good if you included some toy data to better illustrate
>>> what you want to do.
>>> E.g., with
>>>   x <- array(0, c(3,3), list(Row=paste0("R",1:3),Col=paste0("C",1:3)))
>>>   x1 <- data.frame(V1=c("R1","R3"), V2=c("C2","C1"))
>>> the above f() gives
>>>> f(x, x1)
>>>     Col
>>> Row  C1 C2 C3
>>>   R1  0  1  0
>>>   R2  0  0  0
>>>   R3  1  0  0
>>> Is that what you are looking for?


