[R] Efficient way to use data frame of indices to initialize matrix
Cutler, Gene
gcutler at amgen.com
Wed Dec 8 18:51:48 CET 2010
Thanks for the three great answers! For those who are curious, I timed the three approaches:
nr <- 15812
nc <- 64636
mymat <- matrix(nrow=nr, ncol=nc)
mymat[1,1] <- 1 # see note below
# mydf is created elsewhere
dim(mydf)
# 10910263 3
colnames(mydf)
# "x" "y" "a"
# approach 1 (linear index; R is column-major, so the stride is nr):
# mymat[ mydf$x + (mydf$y-1) * nr ] <- mydf$a
# approach 2:
# mymat[ as.matrix(mydf[,c("x","y")]) ] <- mydf$a
# approach 3:
# mymat[ cbind(mydf$x, mydf$y) ] <- mydf$a
system.time( for (i in 1:10) mymat[ mydf$x + (mydf$y-1) * nr ] <- mydf$a )
system.time( for (i in 1:10) mymat[ as.matrix(mydf[,c("x","y")]) ] <- mydf$a )
system.time( for (i in 1:10) mymat[ cbind(mydf$x, mydf$y) ] <- mydf$a )
#     user  system elapsed
#   10.478   3.837  14.317   <- #1
#    9.064   1.711  10.777   <- #2
#   10.747   2.702  13.450   <- #3
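So approach #2 was the fastest in this run, although all three finish within a few seconds of one another over 10 iterations.
Note that the first assignment into the new matrix takes about 8 elapsed seconds all on its own, which is why the initialization line above sits outside the timed loops. Presumably this is because matrix(nrow=nr, ncol=nc) creates a logical NA matrix, so the first numeric assignment has to coerce the whole thing to double.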
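For anyone who wants to check that the three forms address the same cells, here is a minimal sketch on toy data (the values are made up, not the real mydf). It assumes x is the row and y is the column, so the linear index uses nr as its stride because R stores matrices column by column. fill() is just a hypothetical helper for this example.

```r
# Toy stand-in for mydf (hypothetical values): x = row, y = column, a = value
mydf <- data.frame(x = c(1, 2, 3, 2), y = c(2, 3, 1, 4), a = c(10, 20, 30, 40))
nr <- 3
nc <- 4

# matrix() with no data is a logical NA matrix; NA_real_ allocates double up front
typeof(matrix(nrow = nr, ncol = nc))            # "logical"
typeof(matrix(NA_real_, nrow = nr, ncol = nc))  # "double"

# Hypothetical helper: start from a fresh NA matrix and assign a at the given index
fill <- function(idx) {
  m <- matrix(NA_real_, nrow = nr, ncol = nc)
  m[idx] <- mydf$a
  m
}

m1 <- fill(mydf$x + (mydf$y - 1) * nr)      # approach 1: linear index
m2 <- fill(as.matrix(mydf[, c("x", "y")]))  # approach 2: two-column index matrix
m3 <- fill(cbind(mydf$x, mydf$y))           # approach 3: same index matrix via cbind

# All three should produce the same matrix
stopifnot(identical(m1, m2), identical(m2, m3))
```

If the stride were nc instead of nr, the linear index would land on different cells (or fall outside the matrix) whenever the matrix is not square.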
--
Gene
> -----Original Message-----
> From: David Winsemius [mailto:dwinsemius at comcast.net]
> Sent: Tuesday, December 07, 2010 11:00 AM
> To: Greg Snow
> Cc: Gene; r-help at r-project.org
> Subject: Re: [R] Efficient way to use data frame of indices to
> initialize matrix
>
>
> On Dec 7, 2010, at 1:49 PM, Greg Snow wrote:
>
> > tmpdf <- data.frame( x = c(1,2,3), y=c(2,3,1), a=c(10,20,30) )
> > mymat <- matrix(0, ncol=3, nrow=3)
> > mymat[ as.matrix(tmpdf[,c('x','y')]) ] <- tmpdf$a
>
> cbind is also useful for assembly of arguments to the matrix-`[<-`
> function:
>
> tmpdf <- data.frame( x = c(1,2,3), y=c(2,3,1), a=c(10,20,30) )
> mymat <- matrix(NA, ncol=max(tmpdf$y), nrow=max(tmpdf$x))
> mymat[ cbind(tmpdf$x,tmpdf$y) ] <- tmpdf$a
>
> mymat
>      [,1] [,2] [,3]
> [1,]   NA   10   NA
> [2,]   NA   NA   20
> [3,]   30   NA   NA
>
>
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.snow at imail.org
> > 801.408.8111
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Gene
> >> Sent: Tuesday, December 07, 2010 11:31 AM
> >> To: r-help at r-project.org
> >> Subject: [R] Efficient way to use data frame of indices to initialize matrix
> >>
> >> I have a data frame with three columns, x, y, and a. I want to create
> >> a matrix from these values such that for matrix m:
> >> m[x,y] == a
> >>
> >> Obviously, I can go row by row through the data frame and insert the
> >> value a at the correct x,y location in the matrix. I can make that
> >> slightly more efficient (perhaps), by doing something like this:
> >> > for (each.x in unique(df$x)) m[each.x, df$y[df$x == each.x]] <- df$a[df$x == each.x]
> >>
> >> But I feel that there must be a more efficient, or at least more
> >> elegant way to do this.
> >>
> >> --
> >> Gene
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
>
> David Winsemius, MD
> West Hartford, CT