[R] integrating 2 lists and a data frame in R

Bert Gunter bgunter.4567 at gmail.com
Tue Jun 6 17:19:05 CEST 2017


Simple matrix indexing suffices without any fancier functionality.

## First convert M and N to character vectors -- which they should
## have been in the first place!

M <- sort(as.character(M[,1]))
N <-  sort(as.character(N[,1]))

## This could be a one-liner, but I'll split it up for clarity.

res <- matrix(NA, length(M), length(N), dimnames = list(M, N))

res[as.matrix(C[, 2:1])] <- C$I ## matrix indexing: each (m, n) pair is matched against the dimnames

res
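
For anyone who wants to paste and run this without the earlier messages, here is a minimal self-contained sketch of the same matrix-indexing idea, using the example data posted in the thread. The names rows, cols, idx and res_df are mine, not from the thread, and stringsAsFactors = FALSE is an assumption added so that the columns are plain character vectors regardless of the R version's default. The final as.data.frame() step is only there because the original question asked for a data frame.

```r
## Example data from the thread, kept as character columns.
N <- data.frame(N = c("n1", "n2", "n3", "n4"), stringsAsFactors = FALSE)
M <- data.frame(M = c("m1", "m2", "m3", "m4", "m5"), stringsAsFactors = FALSE)
C <- data.frame(n = c("n1", "n2", "n3"),
                m = c("m1", "m1", "m3"),
                I = c(100, 300, 400),
                stringsAsFactors = FALSE)

rows <- sort(M$M)   # row names of the result
cols <- sort(N$N)   # column names of the result

## All-NA numeric matrix with M as row names and N as column names.
res <- matrix(NA_real_, nrow = length(rows), ncol = length(cols),
              dimnames = list(rows, cols))

## A two-column character matrix used as an index is matched against
## the dimnames: column 1 picks the row, column 2 picks the column.
idx <- cbind(C$m, C$n)
res[idx] <- C$I

## Convert to a data frame if that is the form needed downstream.
res_df <- as.data.frame(res)
res_df
```

Cells with no entry in C stay NA, which keeps "no measurement" distinct from a measured intensity of 0.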

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Jun 6, 2017 at 7:46 AM, Bogdan Tanasa <tanasa at gmail.com> wrote:
> Thank you David. Using xtabs operation simplifies the code very much, many
> thanks ;)
>
> On Tue, Jun 6, 2017 at 7:44 AM, David Winsemius <dwinsemius at comcast.net>
> wrote:
>
>>
>> > On Jun 6, 2017, at 4:01 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
>> >
>> > Hi Bogdan,
>> > Kinda messy, but:
>> >
>> > N <- data.frame(N=c("n1","n2","n3","n4"))
>> > M <- data.frame(M=c("m1","m2","m3","m4","m5"))
>> > C <- data.frame(n=c("n1","n2","n3"), m=c("m1","m1","m3"),
>> >                 I=c(100,300,400))
>> > MN<-as.data.frame(matrix(NA,nrow=length(N[,1]),ncol=length(M[,1])))
>> > names(MN)<-M[,1]
>> > rownames(MN)<-N[,1]
>> > C[,1]<-as.character(C[,1])
>> > C[,2]<-as.character(C[,2])
>> > for(row in 1:dim(C)[1]) MN[C[row,1],C[row,2]]<-C[row,3]
>>
>> `xtabs` offers another route:
>>
>> C$m <- factor(C$m, levels=M$M)
>> C$n <- factor(C$n, levels=N$N)
>>
>> Option 1:  Zeroes in the empty positions:
>> > (X <- xtabs(I ~ m+n , C, addNA=TRUE))
>>     n
>> m     n1  n2  n3  n4
>>   m1 100 300   0   0
>>   m2   0   0   0   0
>>   m3   0   0 400   0
>>   m4   0   0   0   0
>>   m5   0   0   0   0
>>
>> Option 2: Sparse matrix:
>> > (X <- xtabs(I ~ m+n , C, sparse=TRUE))
>> 5 x 4 sparse Matrix of class "dgCMatrix"
>>     n
>> m     n1  n2  n3 n4
>>   m1 100 300   .  .
>>   m2   .   .   .  .
>>   m3   .   . 400  .
>>   m4   .   .   .  .
>>   m5   .   .   .  .
>>
>> I wasn't sure if the sparse results of xtabs would make a distinction
>> between 0 and NA, but happily they do:
>>
>> > C <- data.frame(n=c("n1","n2","n3", "n3", "n4"),
>>                   m=c("m1","m1","m3", "m4", "m5"), I=c(100,300,400, NA, 0))
>> > C
>>    n  m   I
>> 1 n1 m1 100
>> 2 n2 m1 300
>> 3 n3 m3 400
>> 4 n3 m4  NA
>> 5 n4 m5   0
>> > (X <- xtabs(I ~ m+n , C, sparse=TRUE))
>> 4 x 4 sparse Matrix of class "dgCMatrix"
>>     n
>> m     n1  n2  n3 n4
>>   m1 100 300   .  .
>>   m3   .   . 400  .
>>   m4   .   .   .  .
>>   m5   .   .   .  0
>>
>> (In the example I forgot to repeat the lines that augmented the factor
>> levels, so m2 is not seen.)
>>
>> --
>> David
>> >
>> >
>> > Jim
>> >
>> > On Tue, Jun 6, 2017 at 3:51 PM, Bogdan Tanasa <tanasa at gmail.com> wrote:
>> >> Dear Bert,
>> >>
>> >> thank you for your response. Here is the piece of R code, given the 3
>> >> data frames below:
>> >>
>> >> N <- data.frame(N=c("n1","n2","n3","n4"))
>> >>
>> >> M <- data.frame(M=c("m1","m2","m3","m4","m5"))
>> >>
>> >> C <- data.frame(n=c("n1","n2","n3"), m=c("m1","m1","m3"),
>> >>                 I=c(100,300,400))
>> >>
>> >> how shall I integrate N, M, and C in such a way that at the end we have
>> >> a data frame with:
>> >>
>> >>
>> >>   - list N as the columns names
>> >>   - list M as the rows names
>> >>   - the values in the cells of N * M, corresponding to the numerical
>> >>   values in the data frame C.
>> >>
>> >> more precisely, the result shall be :
>> >>
>> >>       n1   n2   n3   n4
>> >> m1   100  300    -    -
>> >> m2     -    -    -    -
>> >> m3     -    -  400    -
>> >> m4     -    -    -    -
>> >> m5     -    -    -    -
>> >>
>> >> thank you !
>> >>
>> >>
>> >> On Mon, Jun 5, 2017 at 6:57 PM, Bert Gunter <bgunter.4567 at gmail.com>
>> wrote:
>> >>
>> >>> Reproducible example, please. -- In particular, what exactly does C
>> >>> look like?
>> >>>
>> >>> (You should know this by now).
>> >>>
>> >>> -- Bert
>> >>>
>> >>>
>> >>> On Mon, Jun 5, 2017 at 6:45 PM, Bogdan Tanasa <tanasa at gmail.com>
>> wrote:
>> >>>> Dear all,
>> >>>>
>> >>>> please could you advise on the R code I could use in order to do the
>> >>>> following operation :
>> >>>>
>> >>>> 1 -- I have 2 lists of "genome coordinates": each list is composed of
>> >>>> numbers that represent genome coordinates;
>> >>>>
>> >>>> let's say list N :
>> >>>>
>> >>>> n1
>> >>>>
>> >>>> n2
>> >>>>
>> >>>> n3
>> >>>>
>> >>>> n4
>> >>>>
>> >>>> and a list M:
>> >>>>
>> >>>> m1
>> >>>>
>> >>>> m2
>> >>>>
>> >>>> m3
>> >>>>
>> >>>> m4
>> >>>>
>> >>>> m5
>> >>>>
>> >>>> 2 -- and a data frame C, where for some pairs of coordinates (n,m) from
>> >>>> the lists above, we have a numerical intensity;
>> >>>>
>> >>>> for example :
>> >>>>
>> >>>> n1; m1; 100
>> >>>>
>> >>>> n1; m2; 300
>> >>>>
>> >>>> The question would be: what is the most efficient R code I could use in
>> >>>> order to integrate the list N, the list M, and the data frame C, in order
>> >>>> to obtain a DATA FRAME with:
>> >>>>
>> >>>> -- list N as the column names
>> >>>> -- list M as the row names
>> >>>> -- the values in the cells of N * M, corresponding to the numerical
>> >>>> values in the data frame C.
>> >>>>
>> >>>> A little example would be :
>> >>>>
>> >>>>       n1   n2   n3   n4
>> >>>> m1   100    -    -    -
>> >>>> m2   300    -    -    -
>> >>>> m3     -    -    -    -
>> >>>> m4     -    -    -    -
>> >>>> m5     -    -    -    -
>> >>>>
>> >>>> I wrote a script in Perl, although I would like to do this in R.
>> >>>> Many thanks ;)
>> >>>> -- bogdan
>> >>>>
>> >>>>        [[alternative HTML version deleted]]
>> >>>>
>> >>>> ______________________________________________
>> >>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >>>> and provide commented, minimal, self-contained, reproducible code.
>> >>>
>> >>
>> >
>>
>> David Winsemius
>> Alameda, CA, USA
>>
>>
>