[Rd] Rounding multinomial proportions
arnima at hafro.is
Thu Feb 11 11:26:40 CET 2010
Ugh, I made a typo at the very heart of my message:
"when I preprocess each line in R as p<-a/sum(a), occasionally a line will
sum to 0.999, 1.002, or the like"
should read
"when I preprocess each line in R as p<-round(a/sum(a),3), occasionally a
line will sum to 0.999, 1.002, or the like"
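
To make the corrected statement concrete, here is a minimal sketch in R. The counts are hypothetical and chosen only so that rounding to three decimals pushes the row sum away from 1 in both directions.

```r
# Hypothetical counts; the effect depends on how the rounding errors add up.
a3 <- c(1, 1, 1)
p3 <- round(a3 / sum(a3), 3)
sum(p3)   # 0.999, because each 1/3 is rounded down to 0.333

a6 <- rep(1, 6)
p6 <- round(a6 / sum(a6), 3)
sum(p6)   # 1.002, because each 1/6 is rounded up to 0.167
```

Without the round() call, a/sum(a) sums to 1 up to floating-point error, which is why the rounding step is the heart of the problem.
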
Also, the first paragraph should end with "where the other multinomial
functions reside".
On Thu, 11 Feb 2010, Arni Magnusson wrote:
> I present you with a function that solves a problem that has bugged me
> for many years. I think the problem may be general enough to at least
> consider adding this function, or a revamped version of it, to the
> 'stats' package, with the other multinomial functions reside.
> I'm using R to export data to text files, which are input data for an
> external model written in C++. Parts of the data are age distributions,
> in the form of relative frequency in each year:
> Year  Age1   Age2   ...  Age10
> 1980  0.123  0.234  ...  0.001
> ...   ...    ...    ...  ...
> Each row should sum to exactly 1. The problem is that when I preprocess
> each line in R as p<-a/sum(a), occasionally a line will sum to 0.999,
> 1.002, or the like. This could either crash the external model or lead
> to wrong conclusions.
> I believe similar partitioning is commonly used in a wide variety of
> models, making this a general problem for many modellers.
> In the past, I have checked every line manually, and then arbitrarily
> tweaked one or two values up or down to make the row sum to exactly one,
> but two people would tweak differently. Another semi-solution is to
> write the values to the text file in a very long format, but this would
> (1) make it harder to visually check the numbers and (2) mean that the
> numbers in the article or report no longer match the data files exactly,
> so other scientists could not repeat the analysis and get the same results.
> Once I implemented a quick and dirty solution, simply setting the last
> proportion (Age10 above) as 1 minus the sum of ages 1-9. I quickly
> stopped using that approach when I started seeing negative values.
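
The failure of the "last proportion as remainder" shortcut described above can be reproduced with a small sketch. The counts below are hypothetical, picked so that ages 1-9 all round upward while the last age class is nearly empty.

```r
# Hypothetical counts for ages 1-10 (they sum to 10000).
a <- c(rep(1116, 8), 1066, 6)
p <- a / sum(a)

# Round ages 1-9 to three decimals, then set age 10 to whatever is left.
head9 <- round(p[1:9], 3)
last  <- 1 - sum(head9)

sum(head9)   # already exceeds 1, since every value was rounded up
last         # negative, about -0.003
```

All of the accumulated rounding error lands on a single cell, so a small last proportion can easily be pushed below zero.
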
> After this introduction, the attached round_multinom.html should make
> sense. The algorithm I ended up choosing comes from allocating seats in
> elections, so I was tempted to provide that application as well,
> although it makes the interface and documentation slightly more
> The working title of this function was a short and catchy vote(), but I
> changed it to round_multinom(), even though it's not matrix-oriented
> like the other *multinom functions. That would probably be
> straightforward to do, but I'll keep it as a vector function during the
> initial discussion.
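
The seat-allocation idea mentioned above could look roughly like the sketch below. The message does not say which electoral method the attached round_multinom() uses, so this assumes the largest remainder (Hamilton) method purely for illustration. The name round_largest_remainder() and its digits argument are hypothetical and are not the author's interface.

```r
# Hypothetical helper, NOT the attached round_multinom():
# round proportions to 'digits' decimals so that they still sum to 1,
# assuming the largest remainder (Hamilton) method of seat allocation.
round_largest_remainder <- function(a, digits = 3) {
  stopifnot(is.numeric(a), all(a >= 0), sum(a) > 0)
  n <- 10^digits                      # number of "seats", e.g. 1000 units of 0.001
  quota <- a * n / sum(a)             # exact, unrounded share of the seats
  seats <- floor(quota)               # every category gets its whole seats
  leftover <- round(n - sum(seats))   # seats still to be handed out
  if (leftover > 0) {
    # Give one extra seat to the categories with the largest fractional parts;
    # order() breaks ties by position in the vector.
    winners <- order(quota - seats, decreasing = TRUE)[seq_len(leftover)]
    seats[winners] <- seats[winners] + 1
  }
  seats / n
}

p <- round_largest_remainder(c(1, 1, 1))
p        # one entry is 0.334, the other two are 0.333
sum(p)   # 1, up to floating-point error
```

Because each value is either rounded down or rounded up by one unit in the last decimal place, no value can become negative, and two people applying the same rule get the same row.
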
> I'm curious to hear your impressions and ideas. In the worst case, this
> is a not-so-great solution to a marginal problem. In the best case, this
> might be worth a short note in the Journal of Statistical Software.
> Thanks for your time,
> P.S. In case the mailing list doesn't handle attachments, I've placed
> the same files on http://www.hafro.is/~arnima/ for your convenience.