[R] Sparse (dgCMatrix) Matrix row-wise normalization

Thu May 4 20:13:52 CEST 2017

Hi all ---

I have a large sparse matrix, call it P:
```
 > str(P)
 Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
   ..@ i       : int [1:7868093] 4221 6098 8780 10313 11102 14243 20570 22145 24468 24977 ...
   ..@ p       : int [1:7357] 0 0 269 388 692 2434 3662 4179 4205 4256 ...
   ..@ Dim     : int [1:2] 1303967 7356
   ..@ Dimnames:List of 2
   .. ..$ : NULL
   .. ..$ : NULL
   ..@ x       : num [1:7868093] 1 1 1 1 1 1 1 1 1 1 ...
   ..@ factors : list()
```

I'd like to row-normalize it (say, with the L2 norm). The straightforward
approach would be something like:
```
> row_normalized_P <- P / sqrt(rowSums(P^2))
```

But this causes a memory allocation error, since it appears the `rowSums`
result is being recycled (appropriately) into a _dense_ matrix with
dimensions equal to `dim(P)`.
Given that P is known to be sparse (or at the very least is stored in
sparse format), does anyone know of a non-iterative approach to achieve the
desired `row_normalized_P` shown above?
(That is, the resulting matrix should be exactly as sparse as P itself, and
I'd like to avoid allocating any dense matrix during the normalization
steps; the dense `rowSums` vector is fine.)
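
For illustration, one way this kind of row scaling is often written is as a left-multiplication by a sparse diagonal matrix built with `Diagonal()` from the Matrix package. The following is only a minimal sketch on a small stand-in matrix: `P_demo`, `row_norms`, `row_scale` and `row_normalized_demo` are hypothetical names, and giving empty rows a scaling factor of 0 is an assumption about how such rows should be treated.
```r
library(Matrix)

# Small stand-in for P (hypothetical data): 5 x 4, with row 3 left empty.
P_demo <- sparseMatrix(i = c(1, 1, 2, 4, 5, 5),
                       j = c(1, 3, 2, 4, 1, 2),
                       x = c(3, 4, 2, 5, 1, 1),
                       dims = c(5, 4))

# Row-wise L2 norms; rowSums() on a sparse Matrix returns a plain numeric vector.
row_norms <- sqrt(rowSums(P_demo^2))

# Scaling factors; empty rows get factor 0 (an assumption) so that
# no division by zero takes place.
row_scale <- ifelse(row_norms > 0, 1 / row_norms, 0)

# Left-multiplying by a sparse diagonal matrix rescales each row,
# and the product is again a sparse matrix.
row_normalized_demo <- Diagonal(x = row_scale) %*% P_demo
```
Applied to the real P, the only dense objects in this sketch would be the two length-`nrow(P)` vectors.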

The only semi-efficient workaround I've found is to `apply` across rows
(more accurately, across blocks of rows coerced into dense sub-matrices of
P), but I'd like to remove the looping logic from my codebase if I can. Is
there perhaps a built-in in the Matrix package (that I'm just not aware of)
that helps with this particular type of computation?
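
As a loop-free alternative to block-wise `apply`, the stored values of a `dgCMatrix` can also be rescaled directly through its slots, since `@i` holds the (0-based) row index of each entry in `@x`. This is a rough sketch rather than a built-in: the names `P_demo`, `row_norms` and `row_normalized_demo` are hypothetical, and it assumes the matrix stores no explicit zeros in all-zero rows.
```r
library(Matrix)

# Small stand-in for P (hypothetical data): 5 x 4, with row 3 left empty.
P_demo <- sparseMatrix(i = c(1, 1, 2, 4, 5, 5),
                       j = c(1, 3, 2, 4, 1, 2),
                       x = c(3, 4, 2, 5, 1, 1),
                       dims = c(5, 4))

# Row-wise L2 norms as a plain numeric vector of length nrow(P_demo).
row_norms <- sqrt(rowSums(P_demo^2))

# Copy the matrix and overwrite only its stored values.
# @i is 0-based, so add 1 to index the R vector of norms.
# Assumption: every stored entry sits in a row with a nonzero norm.
row_normalized_demo <- P_demo
row_normalized_demo@x <- P_demo@x / row_norms[P_demo@i + 1L]
```
The sparsity pattern (`@i`, `@p`) is untouched here, so the result has exactly as many stored entries as the input.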

Cheers and thanks for any help!

-murat
