# [R] integrating 2 lists and a data frame in R

David Winsemius dwinsemius at comcast.net
Tue Jun 6 16:44:34 CEST 2017

```
> On Jun 6, 2017, at 4:01 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
>
> Hi Bogdan,
> Kinda messy, but:
>
> N <- data.frame(N=c("n1","n2","n3","n4"))
> M <- data.frame(M=c("m1","m2","m3","m4","m5"))
> C <- data.frame(n=c("n1","n2","n3"), m=c("m1","m1","m3"), I=c(100,300,400))
> MN<-as.data.frame(matrix(NA,nrow=length(N[,1]),ncol=length(M[,1])))
> names(MN)<-M[,1]
> rownames(MN)<-N[,1]
> C[,1]<-as.character(C[,1])
> C[,2]<-as.character(C[,2])
> for(row in 1:dim(C)[1]) MN[C[row,1],C[row,2]]<-C[row,3]

`xtabs` offers another route:

C$m <- factor(C$m, levels=M$M)
C$n <- factor(C$n, levels=N$N)

Option 1:  Zeroes in the empty positions:
> (X <- xtabs(I ~ m+n , C, addNA=TRUE))
    n
m     n1  n2  n3  n4
  m1 100 300   0   0
  m2   0   0   0   0
  m3   0   0 400   0
  m4   0   0   0   0
  m5   0   0   0   0

Option 2: Sparse matrix
> (X <- xtabs(I ~ m+n , C, sparse=TRUE))
5 x 4 sparse Matrix of class "dgCMatrix"
    n
m     n1  n2  n3 n4
  m1 100 300   .  .
  m2   .   .   .  .
  m3   .   . 400  .
  m4   .   .   .  .
  m5   .   .   .  .

I wasn't sure whether the sparse result of xtabs would distinguish between 0 and NA, but happily it does:

> C <- data.frame(n=c("n1","n2","n3", "n3", "n4"), m=c("m1","m1","m3", "m4", "m5"), I=c(100,300,400, NA, 0))
> C
   n  m   I
1 n1 m1 100
2 n2 m1 300
3 n3 m3 400
4 n3 m4  NA
5 n4 m5   0
> (X <- xtabs(I ~ m+n , C, sparse=TRUE))
4 x 4 sparse Matrix of class "dgCMatrix"
    n
m     n1  n2  n3 n4
  m1 100 300   .  .
  m3   .   . 400  .
  m4   .   .   .  .
  m5   .   .   .  0

(In this last example I forgot to repeat the lines that augmented the factor levels, so m2 is not seen.)
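
# A sketch, not part of the original session: re-applying the factor levels
# to the redefined C should bring the m2 row back. The levels are written
# out literally here (an assumption, standing in for M$M and N$N above) so
# that the snippet runs on its own.
C <- data.frame(n=c("n1","n2","n3","n3","n4"),
                m=c("m1","m1","m3","m4","m5"),
                I=c(100,300,400,NA,0))
C$m <- factor(C$m, levels=c("m1","m2","m3","m4","m5"))
C$n <- factor(C$n, levels=c("n1","n2","n3","n4"))
# sparse=TRUE needs the Matrix package to be installed
X <- xtabs(I ~ m+n, C, sparse=TRUE)
dim(X)        # one row per level of m, one column per level of n
rownames(X)   # m2 is present again because unused levels are kept by default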

--
David
>
>
> Jim
>
> On Tue, Jun 6, 2017 at 3:51 PM, Bogdan Tanasa <tanasa at gmail.com> wrote:
>> Dear Bert,
>>
>> thank you for your response. here it is the piece of R code : given 3 data
>> frames below ---
>>
>> N <- data.frame(N=c("n1","n2","n3","n4"))
>>
>> M <- data.frame(M=c("m1","m2","m3","m4","m5"))
>>
>> C <- data.frame(n=c("n1","n2","n3"), m=c("m1","m1","m3"), I=c(100,300,400))
>>
>> how shall I integrate N, and M, and C in such a way that at the end we have
>> a data frame with :
>>
>>
>>   - list N as the columns names
>>   - list M as the rows names
>>   - the values in the cells of N * M, corresponding to the numerical
>>   values in the data frame C.
>>
>> more precisely, the result shall be :
>>
>>     n1  n2  n3 n4
>> m1  100  200   -   -
>> m2   -   -   -   -
>> m3   -   -   300   -
>> m4   -   -   -   -
>> m5   -   -   -   -
>>
>> thank you !
>>
>>
>> On Mon, Jun 5, 2017 at 6:57 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>
>>> Reproducible example, please. -- In particular, what exactly does C look
>>> like?
>>>
>>> (You should know this by now).
>>>
>>> -- Bert
>>> Bert Gunter
>>>
>>> "The trouble with having an open mind is that people keep coming along
>>> and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>
>>>
>>> On Mon, Jun 5, 2017 at 6:45 PM, Bogdan Tanasa <tanasa at gmail.com> wrote:
>>>> Dear all,
>>>>
>>>> please could you advise on the R code I could use in order to do the
>>>> following operation :
>>>>
>>>> a. -- I have 2 lists of "genome coordinates" : a list is composed by
>>>> numbers that represent genome coordinates;
>>>>
>>>> let's say list N :
>>>>
>>>> n1
>>>>
>>>> n2
>>>>
>>>> n3
>>>>
>>>> n4
>>>>
>>>> and a list M:
>>>>
>>>> m1
>>>>
>>>> m2
>>>>
>>>> m3
>>>>
>>>> m4
>>>>
>>>> m5
>>>>
>>>> 2 -- and a data frame C, where for some pairs of coordinates (n,m) from
>>> the
>>>> lists above, we have a numerical intensity;
>>>>
>>>> for example :
>>>>
>>>> n1; m1; 100
>>>>
>>>> n1; m2; 300
>>>>
>>>> The question would be : what is the most efficient R code I could use in
>>>> order to integrate the list N, the list M, and the data frame C, in order
>>>> to obtain a DATA FRAME,
>>>>
>>>> -- list N as the columns names
>>>> -- list M as the rows names
>>>> -- the values in the cells of N * M, corresponding to the numerical
>>> values
>>>> in the data frame C.
>>>>
>>>> A little example would be :
>>>>
>>>>      n1  n2  n3 n4
>>>>
>>>>      m1  100  -   -   -
>>>>
>>>>      m2  300  -   -   -
>>>>
>>>>      m3   -   -   -   -
>>>>
>>>>      m4   -   -   -   -
>>>>
>>>>      m5   -   -   -   -
>>>> I wrote a script in perl, although i would like to do this in R
>>>> Many thanks ;)
>>>> -- bogdan
>>>>
>>>>        [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>