[R] bitwise addition

Marc Schwartz MSchwartz at mn.rr.com
Sun May 14 21:43:17 CEST 2006


On Sat, 2006-05-13 at 23:53 -0400, jim holtman wrote: 
> Using the algorithm in the reference
> www.everything2.com/index.pl?node_id=449702 and the 'bitops' package,
> here is some code that will select the numeric values with a specified
> number of bits.  It takes less than 12 seconds to select from 21-bit
> numbers. If I had more memory, I would expect that for 25-bit numbers
> it would take about 200 seconds to compute the indices.
>  
> > mask1 <- 0x55555555
> > mask2 <- 0x33333333
> > mask4 <- 0x0f0f0f0f
> > mask8 <- 0x00ff00ff
> > mask16 <- 0x0000ffff
> > require(bitops)
> [1] TRUE
> > x <- 0:(2^21 - 1)   # test all number with 21 bits 
> > system.time({
> + x <- bitAnd(x, mask1) + bitAnd(bitShiftR(x, 1), mask1)
> + x <- bitAnd(x, mask2) + bitAnd(bitShiftR(x, 2), mask2)
> + x <- bitAnd(x, mask4) + bitAnd(bitShiftR(x, 4), mask4)
> + x <- bitAnd(x, mask8) + bitAnd(bitShiftR(x, 8), mask8) 
> + x <- bitAnd(x, mask16) + bitAnd(bitShiftR(x, 16), mask16)
> + })
> [1] 10.40  0.87 11.82    NA    NA
> > # print the first 10 numbers
> > which(x == 9)[1:10] - 1  # account for sequence starting at zero 
>  [1]  511  767  895  959  991 1007 1015 1019 1021 1022

Very nice Jim.  I was not aware of the bitops package.

BTW, FWIW:

x <- 0:(2^25 - 1)   

> system.time({
+  x <- bitAnd(x, mask1) + bitAnd(bitShiftR(x, 1), mask1)
+  x <- bitAnd(x, mask2) + bitAnd(bitShiftR(x, 2), mask2)
+  x <- bitAnd(x, mask4) + bitAnd(bitShiftR(x, 4), mask4)
+  x <- bitAnd(x, mask8) + bitAnd(bitShiftR(x, 8), mask8)
+  x <- bitAnd(x, mask16) + bitAnd(bitShiftR(x, 16), mask16)
+  })
[1] 52.803  9.016 67.885  0.000  0.000


> str(x)
num [1:33554432] 0 1 1 2 1 2 2 3 1 2 ...

> object.size(x)
[1] 268435484



That compares to:

  [1] 4.880 0.700 5.706 0.000 0.000

for x <- 0:(2^21 - 1) on my system. So the timing seems less than linear
as the size of 'x' increases.


Now, if Nameeta indeed requires the binary representations of the values
as character vectors as she noted, she will likely have to be cautious
in approaching that process from a RAM and time efficiency perspective.

I tried doing this on my system in one step and quickly ran out of RAM
(2 Gb) and swap (1 Gb) as I watched both go to 0% free on gkrellm. So
this will likely need to be done stepwise to avoid multiple internal
copying of large objects.

Here is one approach.

For example:

# Using  x <- 0:(2^25 - 1)  
> length(which(x == 9))
[1] 2042975


So, first converting the above values to binary, again using
digitsBase() from sfsmisc:

> system.time(z <- digitsBase(which(x == 9) - 1, ndigits = 25))
[1] 22.125  5.189 29.080  0.000  0.000

> str(z)
basedInt [1:25, 1:2042975] 0 0 0 0 0 0 0 0 0 0 ...
- attr(*, "class")= chr "basedInt"
- attr(*, "base")= num 2

# z is 400 Mb
> object.size(z)
[1] 408595348


# Now paste() the results into single character vectors
> system.time(mat <- matrix(apply(z, 2, paste, collapse = ""), 
              ncol = 1))
[1] 349.762   4.196 391.827   0.000   0.000


> str(mat)
chr [1:2042975, 1] "0000000000000000111111111" ...


# mat is 128 Mb
> object.size(mat)
[1] 130750524


> head(mat, 10)
      [,1]                       
[1,] "0000000000000000111111111"
[2,] "0000000000000001011111111"
[3,] "0000000000000001101111111"
[4,] "0000000000000001110111111"
[5,] "0000000000000001111011111"
[6,] "0000000000000001111101111"
[7,] "0000000000000001111110111"
[8,] "0000000000000001111111011"
[9,] "0000000000000001111111101"
[10,] "0000000000000001111111110"


Note however, that even after forcing gc() I get:

> gc()
           used  (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells  2271012  60.7    7706096  205.8   9405098  251.2
Vcells 93910018 716.5  193159857 1473.7 170025834 1297.2


Before gc(), gkrellm was showing 12% RAM free, after it shows 38% free.

So even with 2 Gb of RAM, the above is pushing the envelope of
practicality I suspect.

HTH,

Marc Schwartz




More information about the R-help mailing list