[R] integrating 2 lists and a data frame in R
Bogdan Tanasa
tanasa at gmail.com
Tue Jun 6 16:46:42 CEST 2017
Thank you David. Using xtabs simplifies the code a great deal, many
thanks ;)
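
For the record, a minimal sketch of the complete xtabs route, ending in the data frame asked for in the original question. It reuses N, M and C from the thread; the name `result` and the call to as.data.frame.matrix() are my own additions, not something David posted. Note that empty cells come out as 0 rather than NA.

```r
# Data from the thread
N <- data.frame(N = c("n1", "n2", "n3", "n4"))
M <- data.frame(M = c("m1", "m2", "m3", "m4", "m5"))
C <- data.frame(n = c("n1", "n2", "n3"),
                m = c("m1", "m1", "m3"),
                I = c(100, 300, 400))

# Give both factors the full set of levels so empty rows/columns are kept
C$m <- factor(C$m, levels = as.character(M$M))
C$n <- factor(C$n, levels = as.character(N$N))

# Cross-tabulate: rows follow m, columns follow n, empty cells are 0
X <- xtabs(I ~ m + n, data = C)

# as.data.frame() on a table returns long format, so use the matrix method
# to keep the wide m-by-n layout ('result' is a name chosen for this sketch)
result <- as.data.frame.matrix(X)
print(result)
```

If NA is preferred in the empty cells, Jim's pre-filled NA matrix approach further down is the one to adapt.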
On Tue, Jun 6, 2017 at 7:44 AM, David Winsemius <dwinsemius at comcast.net>
wrote:
>
> > On Jun 6, 2017, at 4:01 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
> >
> > Hi Bogdan,
> > Kinda messy, but:
> >
> > N <- data.frame(N=c("n1","n2","n3","n4"))
> > M <- data.frame(M=c("m1","m2","m3","m4","m5"))
> > C <- data.frame(n=c("n1","n2","n3"), m=c("m1","m1","m3"), I=c(100,300,400))
> > # rows follow M and columns follow N, as in the requested layout
> > MN<-as.data.frame(matrix(NA,nrow=length(M[,1]),ncol=length(N[,1])))
> > names(MN)<-N[,1]
> > rownames(MN)<-M[,1]
> > C[,1]<-as.character(C[,1])
> > C[,2]<-as.character(C[,2])
> > for(row in 1:dim(C)[1]) MN[C[row,2],C[row,1]]<-C[row,3]
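
The cell-by-cell loop above can also be written as a single assignment, since a two-column character matrix indexes a matrix by (row name, column name) pairs. This is only a sketch under some assumptions: N and M are held as plain character vectors rather than one-column data frames, C is built with stringsAsFactors = FALSE, and rows follow M with columns following N as in the original request.

```r
# Assumption: plain character vectors instead of one-column data frames
N <- c("n1", "n2", "n3", "n4")
M <- c("m1", "m2", "m3", "m4", "m5")
C <- data.frame(n = c("n1", "n2", "n3"),
                m = c("m1", "m1", "m3"),
                I = c(100, 300, 400),
                stringsAsFactors = FALSE)

# Empty M-by-N matrix, with NA marking "no intensity recorded"
MN <- matrix(NA_real_, nrow = length(M), ncol = length(N),
             dimnames = list(M, N))

# Each row of the index matrix is one (row name, column name) pair
MN[cbind(C$m, C$n)] <- C$I

MN <- as.data.frame(MN)
print(MN)
```

Unlike the xtabs result, the untouched cells stay NA here instead of becoming 0.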
>
> `xtabs` offers another route:
>
> C$m <- factor(C$m, levels=M$M)
> C$n <- factor(C$n, levels=N$N)
>
> Option 1: Zeroes in the empty positions:
> > (X <- xtabs(I ~ m+n , C, addNA=TRUE))
>     n
> m     n1  n2  n3  n4
>   m1 100 300   0   0
>   m2   0   0   0   0
>   m3   0   0 400   0
>   m4   0   0   0   0
>   m5   0   0   0   0
>
> Option 2: Sparse matrix:
> > (X <- xtabs(I ~ m+n , C, sparse=TRUE))
> 5 x 4 sparse Matrix of class "dgCMatrix"
>     n
> m     n1  n2  n3 n4
>   m1 100 300   .  .
>   m2   .   .   .  .
>   m3   .   . 400  .
>   m4   .   .   .  .
>   m5   .   .   .  .
>
> I wasn't sure whether the sparse result of xtabs would distinguish
> between 0 and NA, but happily it does:
>
> > C <- data.frame(n=c("n1","n2","n3", "n3", "n4"), m=c("m1","m1","m3", "m4", "m5"), I=c(100,300,400, NA, 0))
> > C
>    n  m   I
> 1 n1 m1 100
> 2 n2 m1 300
> 3 n3 m3 400
> 4 n3 m4  NA
> 5 n4 m5   0
> > (X <- xtabs(I ~ m+n , C, sparse=TRUE))
> 4 x 4 sparse Matrix of class "dgCMatrix"
>     n
> m     n1  n2  n3 n4
>   m1 100 300   .  .
>   m3   .   . 400  .
>   m4   .   .   .  .
>   m5   .   .   .  0
>
> (In this example I forgot to repeat the lines that augmented the factor
> levels, so m2 is not seen.)
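
A sketch of the same check with the factor levels restored, so that the m2 row appears again. It assumes the Matrix package is installed (xtabs with sparse=TRUE depends on it), and the level vectors are typed out here instead of being taken from M and N.

```r
library(Matrix)  # recommended package; xtabs(sparse = TRUE) relies on it

# Same data as above: one NA intensity and one explicit zero
C <- data.frame(n = c("n1", "n2", "n3", "n3", "n4"),
                m = c("m1", "m1", "m3", "m4", "m5"),
                I = c(100, 300, 400, NA, 0))

# Restore the full level sets (typed out by hand for this sketch)
C$m <- factor(C$m, levels = c("m1", "m2", "m3", "m4", "m5"))
C$n <- factor(C$n, levels = c("n1", "n2", "n3", "n4"))

# Sparse cross-tabulation; the result now has one row per level of m
X <- xtabs(I ~ m + n, data = C, sparse = TRUE)
print(X)
```

Going by the output above, the NA row should show up as an empty cell while the explicit zero is printed as 0, but the printout is worth checking on your own R version.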
>
> --
> David
> >
> >
> > Jim
> >
> > On Tue, Jun 6, 2017 at 3:51 PM, Bogdan Tanasa <tanasa at gmail.com> wrote:
> >> Dear Bert,
> >>
> >> thank you for your response. Here is the piece of R code, given the 3
> >> data frames below:
> >>
> >> N <- data.frame(N=c("n1","n2","n3","n4"))
> >>
> >> M <- data.frame(M=c("m1","m2","m3","m4","m5"))
> >>
> >> C <- data.frame(n=c("n1","n2","n3"), m=c("m1","m1","m3"), I=c(100,300,400))
> >>
> >> how shall I integrate N, and M, and C in such a way that at the end we have
> >> a data frame with :
> >>
> >>
> >> - list N as the columns names
> >> - list M as the rows names
> >> - the values in the cells of N * M, corresponding to the numerical
> >> values in the data frame C.
> >>
> >> more precisely, the result shall be :
> >>
> >>      n1  n2  n3  n4
> >> m1  100 300   -   -
> >> m2    -   -   -   -
> >> m3    -   - 400   -
> >> m4    -   -   -   -
> >> m5    -   -   -   -
> >>
> >> thank you !
> >>
> >>
> >> On Mon, Jun 5, 2017 at 6:57 PM, Bert Gunter <bgunter.4567 at gmail.com>
> wrote:
> >>
> >>> Reproducible example, please. -- In particular, what exactly does C
> >>> look like?
> >>>
> >>> (You should know this by now).
> >>>
> >>> -- Bert
> >>> Bert Gunter
> >>>
> >>> "The trouble with having an open mind is that people keep coming along
> >>> and sticking things into it."
> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>>
> >>>
> >>> On Mon, Jun 5, 2017 at 6:45 PM, Bogdan Tanasa <tanasa at gmail.com>
> wrote:
> >>>> Dear all,
> >>>>
> >>>> please could you advise on the R code I could use in order to do the
> >>>> following operation :
> >>>>
> >>>> 1 -- I have 2 lists of "genome coordinates" : each list is composed of
> >>>> numbers that represent genome coordinates;
> >>>>
> >>>> let's say list N :
> >>>>
> >>>> n1
> >>>>
> >>>> n2
> >>>>
> >>>> n3
> >>>>
> >>>> n4
> >>>>
> >>>> and a list M:
> >>>>
> >>>> m1
> >>>>
> >>>> m2
> >>>>
> >>>> m3
> >>>>
> >>>> m4
> >>>>
> >>>> m5
> >>>>
> >>>> 2 -- and a data frame C, where for some pairs of coordinates (n,m) from
> >>>> the lists above, we have a numerical intensity;
> >>>>
> >>>> for example :
> >>>>
> >>>> n1; m1; 100
> >>>>
> >>>> n1; m2; 300
> >>>>
> >>>> The question would be : what is the most efficient R code I could use in
> >>>> order to integrate the list N, the list M, and the data frame C, in order
> >>>> to obtain a DATA FRAME,
> >>>>
> >>>> -- list N as the columns names
> >>>> -- list M as the rows names
> >>>> -- the values in the cells of N * M, corresponding to the numerical values
> >>>> in the data frame C.
> >>>>
> >>>> A little example would be :
> >>>>
> >>>>      n1  n2  n3  n4
> >>>> m1  100   -   -   -
> >>>> m2  300   -   -   -
> >>>> m3    -   -   -   -
> >>>> m4    -   -   -   -
> >>>> m5    -   -   -   -
> >>>> I wrote a script in Perl, although I would like to do this in R.
> >>>> Many thanks ;)
> >>>> -- bogdan
> >>>>
> >>>> [[alternative HTML version deleted]]
> >>>>
> >>>> ______________________________________________
> >>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>
> >
>
> David Winsemius
> Alameda, CA, USA
>
>