[R] populating matrix with binary variable after matching data from data frame

Thu Aug 14 17:15:04 CEST 2014

Hi Bill,
Sorry for the trouble. Neither solution worked:
Error in `[<-`(`*tmp*`, i, value = 1) : subscript out of bounds

My x matrix may not have all the items that x1 has.

Say x only has A, B, C and D, whereas x1 has K, L, M, A and D. x1 does
not contain any relationship between B and C, so B-C will be zero
anyway.

x1:

K  L
D  A
K  M
M  A

Although M is associated with A, M is not present in x, so we will not
mark this association with 1. Since A and D are both present in x, we
will assign 1:

   A  B  C  D
A  0  0  0  0
B  0  0  0  0
C  0  0  0  0
D  1  0  0  0
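
For reference, here is a minimal sketch of one way the matrix above could be produced from the toy data. It assumes x is a numeric matrix with identical row and column names and that x1 stores the pairs as character columns V1 and V2; the helper names keep and idx are only illustrative.

```r
# Toy data matching the example in this message.
x <- matrix(0, nrow = 4, ncol = 4,
            dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D")))
x1 <- data.frame(V1 = c("K", "D", "K", "M"),
                 V2 = c("L", "A", "M", "A"),
                 stringsAsFactors = FALSE)

# Keep only the pairs whose two names both occur in x.
keep <- x1$V1 %in% rownames(x) & x1$V2 %in% colnames(x)
idx <- as.matrix(x1[keep, c("V1", "V2")])

# A two-column character matrix indexes x by (row name, column name).
x[idx] <- 1
x
```

Only the D-A pair survives the filter, so only x["D", "A"] is set; pairs involving K, L or M are skipped rather than causing an error.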

I tried this simple for loop but I get the same subscript error:

for (k in seq_len(nrow(x1))) {
  # mark the pair in both directions
  x[x1[k, ]$V1, x1[k, ]$V2] <- 1
  x[x1[k, ]$V2, x1[k, ]$V1] <- 1
}

Error in `[<-`(`*tmp*`, hprd[x, ]$V1, hprd[x, ]$V2, value = 1) :
  subscript out of bounds
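
A possible explanation, sketched on the toy data: assigning into a matrix with a row or column name that is not in its dimnames raises "subscript out of bounds", and for (k in nrow(x1)) visits only the last row because nrow(x1) is a single number. The guarded loop below is only an illustration under those assumptions (the variable names r and cl are made up here), and it marks one direction only, as in the expected matrix.

```r
x <- matrix(0, nrow = 4, ncol = 4,
            dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D")))
x1 <- data.frame(V1 = c("K", "D", "K", "M"),
                 V2 = c("L", "A", "M", "A"),
                 stringsAsFactors = FALSE)

# seq_len(nrow(x1)) yields one index per row of x1.
for (k in seq_len(nrow(x1))) {
  r  <- x1$V1[k]
  cl <- x1$V2[k]
  # Skip pairs that mention a name x does not know about.
  if (r %in% rownames(x) && cl %in% colnames(x)) {
    x[r, cl] <- 1
  }
}
x
```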

Thanks again.

On Wed, Aug 13, 2014 at 6:02 PM, William Dunlap <wdunlap at tibco.com> wrote:
> Another solution is to use table to generate your x matrix, instead of
> trying to make one and adding to it.  If you want the table to have
> the same dimnames on both sides, make factors out of the columns of x1
> with the same factor levels in both.  E.g., using a *small* example:
>
>> X1 <- data.frame(V1=c("A","A","B"), V2=c("C","C","A"))
>> X <- table(lapply(X1, factor, levels=union(levels(X1[[1]]), levels(X1[[2]]))))
>> X
>    V2
> V1  A B C
>   A 0 0 2
>   B 1 0 0
>   C 0 0 0
>
> If you don't want counts, but just a TRUE for presence and FALSE for
> absence, use X>0.  If you want 1 for presence and 0 for absence you
> can use pmin(X, 1).
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Wed, Aug 13, 2014 at 2:51 PM, William Dunlap <wdunlap at tibco.com> wrote:
>> I may have missed something, but I didn't see the result you want for
>> your example.  Also,
>> none of the entries in the x1 you showed are row or column names in x,
>> making it hard to show what you want to happen.
>>
>> Here is a function that gives you the choice of
>>     *error: stop if any row of x1 is 'bad'
>>     *omitRows: ignore rows of x1 that are 'bad'
>>     *expandX: expand the x matrix to include all rows or columns named in x1
>> (Row i of x1 is 'bad' if x1[i,1] is not a row name of x or x1[i,2]
>> is not a column name of x).
>>
>> f
>> function (x, x1, badEntryAction = c("error", "omitRows", "expandX"))
>> {
>>     badEntryAction <- match.arg(badEntryAction)
>>     i <- as.matrix(x1[, c("V1", "V2")])
>>     if (badEntryAction == "omitRows") {
>>         i <- i[is.element(i[, 1], dimnames(x)[[1]]) & is.element(i[,
>>             2], dimnames(x)[[2]]), , drop = FALSE]
>>     }
>>     else if (badEntryAction == "expandX") {
>>         extraDimnames <- lapply(1:2, function(k) setdiff(i[,
>>             k], dimnames(x)[[k]]))
>>         # if you want the same dimnames on both axes, take the union
>>         # of the 2 extraDimnames
>>         if ((n <- length(extraDimnames[[1]])) > 0) {
>>             x <- rbind(x, array(0, c(n, ncol(x)),
>>                 dimnames = list(extraDimnames[[1]], NULL)))
>>         }
>>         if ((n <- length(extraDimnames[[2]])) > 0) {
>>             x <- cbind(x, array(0, c(nrow(x), n), dimnames = list(NULL,
>>                 extraDimnames[[2]])))
>>         }
>>     }
>>     x[i] <- 1
>>     x
>> }
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>>
>> On Wed, Aug 13, 2014 at 2:33 PM, Adrian Johnson
>> <oriolebaltimore at gmail.com> wrote:
>>> Hello again, and sorry for asking again.
>>>
>>> Maybe I was not clear before.
>>>
>>> I don't want to remove rows from the matrix, since its row names and
>>> column names are identical.
>>>
>>>
>>> I tried your suggestion and here is what I get:
>>>
>>>> fx <- function(x,x1){
>>> + i <- as.matrix(x1[,c("V1","V2")])
>>> + x[i]<-1
>>> + x
>>> + }
>>>> fx(x, x1)
>>>
>>> Error in `[<-`(`*tmp*`, i, value = 1) : subscript out of bounds
>>>
>>>
>>>
>>>
>>>> x[1:4,1:4]
>>>        ABCA10 ABCA12 ABCA13 ABCA4
>>> ABCA10      0      0      0     0
>>> ABCA12      0      0      0     0
>>> ABCA13      0      0      0     0
>>> ABCA4       0      0      0     0
>>>
>>>
>>>> x1[1:10,]
>>>       V1       V2
>>> 1   AKT3    TCL1A
>>> 2  AKTIP    VPS41
>>> 3  AKTIP    PDPK1
>>> 4  AKTIP   GTF3C1
>>> 5  AKTIP    HOOK2
>>> 6  AKTIP    POLA2
>>> 7  AKTIP KIAA1377
>>> 8  AKTIP FAM160A2
>>> 9  AKTIP    VPS16
>>> 10 AKTIP    VPS18
>>>
>>>
>>> For instance, I loop over x1. At the first row, I take V1 and check
>>> whether x has a row with that name, then check whether V2 exists in
>>> the column names. If both match, I assign 1. If not, I go to row 2.
>>>
>>> In some rows, it is possible that only the element in V2 exists in
>>> the row names; since the element in V1 does not exist in matrix x, I
>>> will give 0. (Since matrix x has identical row and column names, I
>>> feel it does not matter whether we check an element against the
>>> column names after checking the row names.)
>>>
>>>
>>>
>>> For instance, if I see ABCA10 in x1$V1 and ABCA10 in x1$V2, then
>>> row 1, column 1 of matrix x should get 1.
>>>
>>> dput output follows:
>>>
>>> x <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(4L,
>>> 4L), .Dimnames = list(c("ABCA10", "ABCA12", "ABCA13", "ABCA4"
>>> ), c("ABCA10", "ABCA12", "ABCA13", "ABCA4")))
>>>
>>>
>>> x1 <- structure(list(V1 = c("AKT3", "AKTIP", "AKTIP", "AKTIP", "AKTIP",
>>> "AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP"), V2 = c("TCL1A",
>>> "VPS41", "PDPK1", "GTF3C1", "HOOK2", "POLA2", "KIAA1377", "FAM160A2",
>>> "VPS16", "VPS18")), .Names = c("V1", "V2"), row.names = c(NA,
>>> 10L), class = "data.frame")
>>>
>>>
>>>
>>> Thanks for your time.
>>>
>>>
>>>
>>>
>>> On Wed, Aug 13, 2014 at 12:51 PM, William Dunlap <wdunlap at tibco.com> wrote:
>>>> You can replace the loop
>>>>> for (i in nrow(x1)) {
>>>>>    x[x1$V1[i], x1$V2[i]] <- 1;
>>>>> }
>>>> by
>>>> f <- function(x, x1) {
>>>>   i <- as.matrix(x1[, c("V1","V2")]) # 2-column matrix to use as a subscript
>>>>   x[ i ] <- 1
>>>>   x
>>>> }
>>>> f(x, x1)
>>>>
>>>> You will get an error if not all the strings in the subscript matrix
>>>> are in the row or column names of x.  What do you want to happen in
>>>> this case?  You can choose to first omit the bad rows in the
>>>> subscript matrix
>>>>     goodRows <- is.element(i[,1], dimnames(x)[[1]]) &
>>>>         is.element(i[,2], dimnames(x)[[2]])
>>>>     i <- i[goodRows, , drop=FALSE]
>>>>     x[ i ] <- 1
>>>> or you can choose to expand x to include all the names found in x1.
>>>>
>>>> It would be good if you included some toy data to better illustrate
>>>> what you want to do.
>>>> E.g., with
>>>>   x <- array(0, c(3,3), list(Row=paste0("R",1:3),Col=paste0("C",1:3)))
>>>>   x1 <- data.frame(V1=c("R1","R3"), V2=c("C2","C1"))
>>>> the above f() gives
>>>>> f(x, x1)
>>>>     Col
>>>> Row  C1 C2 C3
>>>>   R1  0  1  0
>>>>   R2  0  0  0
>>>>   R3  1  0  0
>>>> Is that what you are looking for?