[R] Fastest Way to Divide Elements of Row With Its RowSum
William Dunlap
wdunlap at tibco.com
Thu Sep 17 19:02:36 CEST 2009
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Thomas Lumley
> Sent: Thursday, September 17, 2009 6:59 AM
> To: William Revelle
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Fastest Way to Divide Elements of Row With Its RowSum
>
> On Thu, 17 Sep 2009, William Revelle wrote:
>
> > At 2:40 PM +0900 9/17/09, Gundala Viswanath wrote:
> >> I have a data frame (dat). What I want to do is for each row,
> >> divide each row with the sum of its row.
> >>
> >> The number of row can be large > 1million.
> >> Is there a faster way than doing it this way?
> >>
> >> datnorm;
> >> for (rw in 1:length(dat)) {
> >> tmp <- dat[rw,]/sum(dat[rw,])
> >> datnorm <- rbind(datnorm, tmp);
> >> }
> >>
> >>
> >> - G.V.
> >
> >
> > datnorm <- dat/rowSums(dat)
> >
> > this will be faster if dat is a matrix rather than a data.frame.
> >
>
> Even if it's a data frame and he needs a data frame answer it
> might be faster to do
> mat<-as.matrix(dat)
> matnorm<-mat/rowSums(mat)
> datnorm<-as.data.frame(matnorm)
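
For anyone wondering why dividing by rowSums() gives the right answer at all: a matrix is stored column by column, so a vector with one entry per row is recycled down each column in turn, and entry [i, j] ends up divided by the i-th row sum. The toy matrix below is made up for illustration, and sweep() is shown only as a more explicit way to spell the same operation, not as something anyone in the thread proposed.

```r
# Toy 2 x 3 matrix (made-up values); its columns are (1,2), (3,4), (5,6)
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

rs <- rowSums(m)   # one sum per row

# rs has length nrow(m), so it is recycled down each column:
# m[i, j] is divided by rs[i]
norm <- m / rs

# sweep() states the same thing explicitly: divide along margin 1 (rows)
norm_sweep <- sweep(m, 1, rs, "/")

# every row of the result should now sum to 1
row_totals <- rowSums(norm)
```

Note that the recycling only lines up this way because the vector has exactly one entry per row; a vector with one entry per column would be applied incorrectly by plain "/".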

If the data.frame has many more rows than columns and the
number of rows is large (e.g., dimensions 10^6 x 20), you may
run out of memory converting it to a matrix. You can use much
less memory by looping over the columns, both to compute the
row sums and to do the division. E.g., the following should
require only 1 (maybe 2) columns' worth of scratch space:

f2 <- function(x){
    stopifnot(is.data.frame(x), ncol(x)>=1)
    rowsum <- x[[1]]
    if(ncol(x)>1) for(i in 2:ncol(x))
        rowsum <- rowsum + x[[i]]
    for(i in 1:ncol(x))
        x[[i]] <- x[[i]] / rowsum
    x
}

For a 10^6 by 20 all-numeric data.frame this runs in 13 seconds
on my machine, but things like x/rowSums(x) run out of memory.

When working with data.frames it generally pays to think a column
at a time instead of a row at a time.
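
A quick sanity check that the column-at-a-time version agrees with the matrix version can be sketched as below. f2 is repeated so the snippet runs on its own; the 5 x 3 data frame of random numbers is made up for the check and is far too small to say anything about speed or memory.

```r
# f2 as given above, repeated so this snippet is self-contained
f2 <- function(x){
    stopifnot(is.data.frame(x), ncol(x) >= 1)
    rowsum <- x[[1]]
    if(ncol(x) > 1) for(i in 2:ncol(x))
        rowsum <- rowsum + x[[i]]
    for(i in 1:ncol(x))
        x[[i]] <- x[[i]] / rowsum
    x
}

# Small made-up example: 5 rows, 3 numeric columns
set.seed(1)
dat <- as.data.frame(matrix(runif(5 * 3), nrow = 5))

via_columns <- f2(dat)                        # column-at-a-time
mat <- as.matrix(dat)
via_matrix <- as.data.frame(mat / rowSums(mat))  # convert, divide, convert back

# TRUE if the two results match up to numerical tolerance
same <- isTRUE(all.equal(via_columns, via_matrix))
```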
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
>
> The other advantage, apart from speed, of doing it with
> dat/rowSums(dat) rather than the loop is he gets the right
> answer. The loop goes from 1 to the number of columns if dat
> is a data frame and 1 to the number of entries if dat is a
> matrix, not from 1 to the number of rows.
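
The point about the loop bounds is easy to see on a small example. The 4 x 2 data frame below is made up for illustration; nrow() and seq_len() are the usual way to get an index that runs over rows.

```r
# Made-up data frame: 4 rows, 2 columns
dat <- data.frame(a = c(1, 2, 3, 4), b = c(5, 6, 7, 8))
m <- as.matrix(dat)

n_df  <- length(dat)   # 2: a data frame is a list of columns
n_mat <- length(m)     # 8: a matrix is a vector of all its entries
n_row <- nrow(dat)     # 4: the number of rows, which the loop wanted

# An index that really runs over the rows
row_index <- seq_len(nrow(dat))
```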
>
> -thomas
>
> Thomas Lumley Assoc. Professor, Biostatistics
> tlumley at u.washington.edu University of Washington, Seattle
>