[R] expand.grid game

baptiste auguie baptiste.auguie at googlemail.com
Sat Dec 19 20:41:23 CET 2009


This problem does yield some interesting and unexpected distributions.
Here is another, the number of positive cases as a function of number
of digits (8 in the original question) and of test value (17).

maxi <- 9
N <- 5
test <- 17

foo <- function(N=2, test=1){
sum(rowSums(do.call(expand.grid, c(list(1:maxi), rep(list(0:maxi),
N-1)))) == test)
}

library(plyr)
N <- seq(1, 6)
maxi <- 9*N
params <- data.frame(from=N, to=maxi+1)
params2 <- mlply(params, seq)

NN <- lapply(seq_along(params2), function(x) rep(x, length(params2[[x]])))

params3 <- data.frame(N=do.call(c, NN), test=do.call(c, params2))

results <- mdply(params3, foo)

library(ggplot2)
p <-
qplot(test, V1, data=results, geom="path")+
 facet_wrap(~N, scales="free") +
 scale_x_continuous(expand=c(0, 0))+
 xlab("test") + ylab("number of match")

p


Best,

baptiste

[David: sorry for the duplicate, i initially sent an attachment that
was too large for the list]

> 2009/12/19 David Winsemius <dwinsemius at comcast.net>:
>>
>> On Dec 19, 2009, at 2:10 PM, David Winsemius wrote:
>>
>>>
>>> On Dec 19, 2009, at 1:36 PM, baptiste auguie wrote:
>>>
>>>> 2009/12/19 David Winsemius <dwinsemius at comcast.net>:
>>>>>
>>>>> On Dec 19, 2009, at 9:06 AM, baptiste auguie wrote:
>>>>>
>>>>>> Dear list,
>>>>>>
>>>>>> In a little numbers game, I've hit a performance snag and I'm not sure
>>>>>> how to code this in C.
>>>>>>
>>>>>> The game is the following: how many 8-digit numbers have the sum of
>>>>>> their digits equal to 17?
>>>>>
>>>>> And are you considering the number "00000089" to be in the acceptable
>>>>> set?
>>>>> Or is the range of possible numbers in 10000079:98000000 ?
>>>>>
>>>>
>>>> The latter, the first digit should not be 0. But if you have an
>>>> interesting solution for the other case, let me know anyway.
>>>>
>>>> I should also stress that this is only for entertainment and curiosity's
>>>> sake.
>>>
>>> The sequence of numbers whose digit sum is 13 is one of the ATT integer
>>> sequences:
>>>
>>> http://www.research.att.com/~njas/sequences/A143164
>>>
>>> No R implementations there, only Maple and Mathematica. Rather than doing
>>> strsplit()'s, I thought that a mathematical approach would be faster for a
>>> sumdigits function:
>>>
>>> sumdigits <- function(x) { s=0; for (i in 1:(1+floor(log(x, base=10))) ){
>>>                                        s<-s+x%%10; x<-x%/%10}
>>>                           return(s) }
>>>
>>> # what follows would be expected to take roughly 1/100th  and 1/50th of
>>> the time of a complete enumeration but is useful for estimating the size of
>>> the result and the time of an exhaustive search:
>>>
>>> > system.time( for (i in 10000079:11000079) if (sumdigits(i)==17)
>>> > {idx<-c(idx,i)})
>>>  user  system elapsed
>>> 30.997   3.516  34.403
>>>
>>> > system.time( for (i in 10000079:12000079) if (sumdigits(i)==17)
>>> > {idx<-c(idx,i)})
>>>  user  system elapsed
>>> 55.975   2.433  58.218
>>>
>>> > head(idx)
>>> [1] 10000079 10000088 10000097 10000169 10000178 10000187
>>>
>>> > length(idx)
>>> [1] 31572
>>>
>>> So it looks as though an exhaustive strategy would take a bit under an
>>> hour on my machine (a 2 year-old device) and be a vector around 1578600
>>> elements in length. (Takes very little memory, and would undoubtedly be
>>> faster if I could use more than one core.)
>>
>> I was assuming that it would be a relatively constant density of numbers in
>> that range which can be seen to be false by examining a density plot of
>> roughly the first 5% of that range.
>>
>>> system.time(  for (i in 10000079:14000079) if (sumdigits(i)==17)
>>> {idx<-c(idx,i)})
>>   user  system elapsed
>> 113.074   5.764 118.391
>>> plot(density(idx))
>>
>> Plot attached:
>>
>>
>>>
>>>>
>>>> baptiste
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>
>>
>>
>




More information about the R-help mailing list