[R] optimization challenge

Albyn Jones jones at reed.edu
Wed Jan 13 19:18:33 CET 2010


The key idea is that you are building a matrix that contains the
solutions to smaller problems which are sub-problems of the big
problem.  The first row of the matrix SSQ contains the solution for no
splits, ie SSQ[1,j] is just the sum of squares about the overall mean
for reading chapters1 through j in one day.  The iteration then uses
row m-1 to construct row m, since if SSQ[m-1,j] (optimal reading of j
chapters in m-1 days) is part of the overall optimal solution, you
have already computed it, and so don't ever need to recompute it.

       TS = SSQ[m-1,j]+(SSQ1[j+1])

computes the vector of possible solutions for SSQ[m,n] (n chapters in n days) 
breaking it into two pieces: chapters 1 to j in m-1 days, and chapters j+1 to
n in 1 day.  j is a vector in the function, and min(TS) is the minimum
over choices of j, ie SSQ[m,n].

At the end, SSQ[128,239] is the optimal value for reading all 239
chapters in 128 days.  That's just the objective function, so the rest
involves constructing the list of optimal cuts, ie which chapters are
grouped together for each day's reading.  That code uses the same
idea... constructing a list of lists of cutpoints.

statisticians should study a bit of data structures and algorithms!

albyn

On Wed, Jan 13, 2010 at 10:45:11AM -0700, Greg Snow wrote:
> WOW, your results give about half the variance of my best optim run (possibly due to my suboptimal use of optim).
> 
> Can you describe a little what the algorithm is doing?
> 
> -- 
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
> 
> 
> > -----Original Message-----
> > From: Albyn Jones [mailto:jones at reed.edu]
> > Sent: Tuesday, January 12, 2010 5:31 PM
> > To: Greg Snow
> > Cc: r-help at r-project.org
> > Subject: Re: [R] optimization challenge
> > 
> > Greg
> > 
> > Nice problem: I wasted my whole day on it :-)
> > 
> > I was explaining my plan for a solution to a colleague who is a
> > computer scientist, he pointed out that I was trying to re-invent the
> > wheel known as dynamic programming.  here is my code, apparently it is
> > called "bottom up dynamic programming".  It runs pretty quickly, and
> > returns (what I hope is :-) the optimal sum of squares and the
> > cut-points.
> > 
> > function(X=bom3$Verses,days=128){
> > # find optimal BOM reading schedule for Greg Snow
> > # minimize variance of quantity to read per day over 128 days
> > #
> > N = length(X)
> > Nm1 = N-1
> > SSQ<- matrix(NA,nrow=days,ncol=N)
> > Cuts <- list()
> > #
> > #  SSQ[i,j]: the ssqs about the overall mean for the optimal partition
> > #           for i days on the chapters 1 to j
> > #
> > M = sum(X)/days
> > CS = cumsum(X)
> > SSQ[1,]= (CS-M)^2
> > Cuts[[1]]= as.list(1:N)
> > #
> > for(m in 2:days){
> > Cuts[[m]]=list()
> > #for(i in 1:(m-1)) Cuts[[m]][[i]] = Cuts[[m-1]][[i]]
> > for(n in m:N){
> >       	    	  CS = cumsum(X[n:1])[n:1]
> >             	  SSQ1 = (CS-M)^2
> > 	    	  j = (m-1):(n-1)
> >             	  TS = SSQ[m-1,j]+(SSQ1[j+1])
> > 		  SSQ[m,n] = min(TS)
> >                   k = min(which((min(TS)== TS)))+m-1
> >                   Cuts[[m]][[n]] = c(Cuts[[m-1]][[k-1]],n)
> > }
> > }
> > list(SSQ=SSQ[days,N],Cuts=Cuts[[days]][[N]])
> > }
> > 
> > $SSQ
> > [1] 11241.05
> > 
> > $Cuts
> >   [1]   2   4   7   9  11  13  15  16  17  19  21  23  25  27  30  31
> > 34  37
> >  [19]  39  41  44  46  48  50  53  56  59  60  62  64  66  68  70  73
> > 75  77
> >  [37]  78  80  82  84  86  88  89  91  92  94  95  96  97  99 100 103
> > 105 106
> >  [55] 108 110 112 113 115 117 119 121 124 125 126 127 129 131 132 135
> > 137 138
> >  [73] 140 141 142 144 145 146 148 150 151 152 154 156 157 160 162 163
> > 164 166
> >  [91] 167 169 171 173 175 177 179 181 183 185 186 188 190 192 193 194
> > 196 199
> > [109] 201 204 205 207 209 211 213 214 215 217 220 222 223 225 226 228
> > 234 236
> > [127] 238 239
> > 
> > 
> > 
> > 
> > On Tue, Jan 12, 2010 at 11:33:36AM -0700, Greg Snow wrote:
> > > I have a challenge that I want to share with the group.
> > >
> > > This is not homework (but I may assign it as such if I teach the
> > appropriate class again) and I have found one solution, so don't need
> > anything urgent.  This is more for fun to see if others can find a
> > better solution than I did.
> > >
> > > The challenge:
> > >
> > > I want to read a book in a given number of days.  I want to read an
> > integer number of chapters each day (there are more chapters than
> > days), no stopping part way through a chapter, and at least 1 chapter
> > each day.  The chapters are very non uniform in length (some very
> > short, a few very long, many in between) so I would like to come up
> > with a reading schedule that minimizes the variance of the length of
> > the days readings (read multiple short chapters on the same day, long
> > chapters are the only one read that day).  I also want to read through
> > the book in order (no skipping ahead to combine short chapters that are
> > not naturally next to each other.
> > >
> > > My thought was that the optim function with method="SANN" would be an
> > appropriate approach, but my first couple of tries did not give very
> > good results.  I have since come up with an optim with SANN solution
> > that gives what I consider good results (but I accept that better is
> > possible).
> > >
> > > Below is a data frame with the lengths of the chapters for the book
> > that originally sparked the challenge for me (but the general idea
> > should work for any book).  Each row represents a chapter (in order)
> > with 3 different measures of the length of the chapter.
> > >
> > > For this challenge I want to read the book in 128 days (there are 239
> > chapters).
> > >
> > > I will post my solutions in a few days, but I want to wait so that my
> > direction does not influence people from trying other approaches (if
> > there is something better than optim, that is fine).
> > >
> > > Good luck for anyone interested in the challenge,
> > >
> > > The data frame:
> > >
> > > bom3 <- structure(list(Chapter = structure(1:239, .Label = c("1 Nephi
> > 1",
> > > "1 Nephi 2", "1 Nephi 3", "1 Nephi 4", "1 Nephi 5", "1 Nephi 6",
> > > "1 Nephi 7", "1 Nephi 8", "1 Nephi 9", "1 Nephi 10", "1 Nephi 11",
> > > "1 Nephi 12", "1 Nephi 13", "1 Nephi 14", "1 Nephi 15", "1 Nephi 16",
> > > "1 Nephi 17", "1 Nephi 18", "1 Nephi 19", "1 Nephi 20", "1 Nephi 21",
> > > "1 Nephi 22", "2 Nephi 1", "2 Nephi 2", "2 Nephi 3", "2 Nephi 4",
> > > "2 Nephi 5", "2 Nephi 6", "2 Nephi 7", "2 Nephi 8", "2 Nephi 9",
> > > "2 Nephi 10", "2 Nephi 11", "2 Nephi 12", "2 Nephi 13", "2 Nephi 14",
> > > "2 Nephi 15", "2 Nephi 16", "2 Nephi 17", "2 Nephi 18", "2 Nephi 19",
> > > "2 Nephi 20", "2 Nephi 21", "2 Nephi 22", "2 Nephi 23", "2 Nephi 24",
> > > "2 Nephi 25", "2 Nephi 26", "2 Nephi 27", "2 Nephi 28", "2 Nephi 29",
> > > "2 Nephi 30", "2 Nephi 31", "2 Nephi 32", "2 Nephi 33", "Jacob 1",
> > > "Jacob 2", "Jacob 3", "Jacob 4", "Jacob 5", "Jacob 6", "Jacob 7",
> > > "Enos 1", "Jarom 1", "Omni 1", "Words of Mormon 1", "Mosiah 1",
> > > "Mosiah 2", "Mosiah 3", "Mosiah 4", "Mosiah 5", "Mosiah 6", "Mosiah
> > 7",
> > > "Mosiah 8", "Mosiah 9", "Mosiah 10", "Mosiah 11", "Mosiah 12",
> > > "Mosiah 13", "Mosiah 14", "Mosiah 15", "Mosiah 16", "Mosiah 17",
> > > "Mosiah 18", "Mosiah 19", "Mosiah 20", "Mosiah 21", "Mosiah 22",
> > > "Mosiah 23", "Mosiah 24", "Mosiah 25", "Mosiah 26", "Mosiah 27",
> > > "Mosiah 28", "Mosiah 29", "Alma 1", "Alma 2", "Alma 3", "Alma 4",
> > > "Alma 5", "Alma 6", "Alma 7", "Alma 8", "Alma 9", "Alma 10",
> > > "Alma 11", "Alma 12", "Alma 13", "Alma 14", "Alma 15", "Alma 16",
> > > "Alma 17", "Alma 18", "Alma 19", "Alma 20", "Alma 21", "Alma 22",
> > > "Alma 23", "Alma 24", "Alma 25", "Alma 26", "Alma 27", "Alma 28",
> > > "Alma 29", "Alma 30", "Alma 31", "Alma 32", "Alma 33", "Alma 34",
> > > "Alma 35", "Alma 36", "Alma 37", "Alma 38", "Alma 39", "Alma 40",
> > > "Alma 41", "Alma 42", "Alma 43", "Alma 44", "Alma 45", "Alma 46",
> > > "Alma 47", "Alma 48", "Alma 49", "Alma 50", "Alma 51", "Alma 52",
> > > "Alma 53", "Alma 54", "Alma 55", "Alma 56", "Alma 57", "Alma 58",
> > > "Alma 59", "Alma 60", "Alma 61", "Alma 62", "Alma 63", "Helaman 1",
> > > "Helaman 2", "Helaman 3", "Helaman 4", "Helaman 5", "Helaman 6",
> > > "Helaman 7", "Helaman 8", "Helaman 9", "Helaman 10", "Helaman 11",
> > > "Helaman 12", "Helaman 13", "Helaman 14", "Helaman 15", "Helaman 16",
> > > "3 Nephi 1", "3 Nephi 2", "3 Nephi 3", "3 Nephi 4", "3 Nephi 5",
> > > "3 Nephi 6", "3 Nephi 7", "3 Nephi 8", "3 Nephi 9", "3 Nephi 10",
> > > "3 Nephi 11", "3 Nephi 12", "3 Nephi 13", "3 Nephi 14", "3 Nephi 15",
> > > "3 Nephi 16", "3 Nephi 17", "3 Nephi 18", "3 Nephi 19", "3 Nephi 20",
> > > "3 Nephi 21", "3 Nephi 22", "3 Nephi 23", "3 Nephi 24", "3 Nephi 25",
> > > "3 Nephi 26", "3 Nephi 27", "3 Nephi 28", "3 Nephi 29", "3 Nephi 30",
> > > "4 Nephi 1", "Mormon 1", "Mormon 2", "Mormon 3", "Mormon 4",
> > > "Mormon 5", "Mormon 6", "Mormon 7", "Mormon 8", "Mormon 9", "Ether
> > 1",
> > > "Ether 2", "Ether 3", "Ether 4", "Ether 5", "Ether 6", "Ether 7",
> > > "Ether 8", "Ether 9", "Ether 10", "Ether 11", "Ether 12", "Ether 13",
> > > "Ether 14", "Ether 15", "Moroni 1", "Moroni 2", "Moroni 3", "Moroni
> > 4",
> > > "Moroni 5", "Moroni 6", "Moroni 7", "Moroni 8", "Moroni 9", "Moroni
> > 10"
> > > ), class = "factor"), Words = c(908L, 879L, 1067L, 1262L, 761L,
> > > 202L, 992L, 1221L, 259L, 924L, 1315L, 860L, 1899L, 1284L, 1488L,
> > > 1618L, 2523L, 1217L, 1292L, 698L, 945L, 1506L, 1543L, 1460L,
> > > 1170L, 1300L, 1169L, 895L, 405L, 812L, 2388L, 966L, 338L, 647L,
> > > 587L, 203L, 857L, 370L, 687L, 570L, 587L, 928L, 520L, 134L, 587L,
> > > 891L, 1699L, 1483L, 1461L, 1240L, 804L, 708L, 988L, 426L, 647L,
> > > 719L, 1365L, 619L, 929L, 3758L, 511L, 1242L, 1160L, 734L, 1398L,
> > > 857L, 966L, 2112L, 1117L, 1605L, 740L, 309L, 1555L, 938L, 864L,
> > > 957L, 1271L, 1291L, 1073L, 391L, 1056L, 560L, 650L, 1382L, 978L,
> > > 963L, 1328L, 620L, 1186L, 954L, 837L, 1200L, 1601L, 812L, 1879L,
> > > 1496L, 1433L, 1016L, 1035L, 2787L, 390L, 1443L, 1231L, 1511L,
> > > 1442L, 1466L, 1858L, 1347L, 1526L, 711L, 1010L, 1848L, 1623L,
> > > 1701L, 1279L, 955L, 1871L, 764L, 1509L, 752L, 1665L, 1260L, 575L,
> > > 708L, 2666L, 1478L, 1837L, 855L, 1576L, 699L, 1229L, 2027L, 651L,
> > > 793L, 1153L, 701L, 1229L, 2221L, 1243L, 911L, 1809L, 1572L, 1073L,
> > > 1345L, 1734L, 1682L, 1757L, 1085L, 988L, 1218L, 2177L, 1517L,
> > > 1748L, 489L, 1777L, 854L, 2220L, 593L, 1418L, 599L, 1527L, 1076L,
> > > 2084L, 1797L, 1142L, 1210L, 1480L, 720L, 1561L, 873L, 1855L,
> > > 1241L, 824L, 1025L, 1340L, 769L, 1353L, 1518L, 931L, 1200L, 1132L,
> > > 871L, 965L, 807L, 1434L, 1294L, 834L, 628L, 731L, 913L, 879L,
> > > 1344L, 1233L, 1538L, 1155L, 507L, 415L, 620L, 183L, 793L, 1269L,
> > > 1457L, 394L, 130L, 1949L, 706L, 1262L, 939L, 822L, 1067L, 914L,
> > > 454L, 1710L, 1510L, 901L, 1370L, 1253L, 892L, 226L, 1048L, 899L,
> > > 1231L, 1424L, 1415L, 762L, 1536L, 1219L, 1132L, 1314L, 147L,
> > > 119L, 120L, 153L, 91L, 335L, 1929L, 1064L, 1012L, 1149L), Letters =
> > c(3746L,
> > > 3640L, 4295L, 4968L, 3202L, 777L, 3941L, 4769L, 1075L, 3829L,
> > > 5159L, 3527L, 7874L, 5236L, 6149L, 6555L, 10180L, 4960L, 5399L,
> > > 2767L, 3758L, 6275L, 6391L, 6151L, 4685L, 5267L, 4767L, 3780L,
> > > 1574L, 3261L, 9985L, 4106L, 1332L, 2557L, 2454L, 810L, 3540L,
> > > 1406L, 2668L, 2304L, 2474L, 3715L, 2090L, 533L, 2442L, 3587L,
> > > 7294L, 6363L, 5809L, 4997L, 3197L, 2941L, 4065L, 1743L, 2539L,
> > > 3149L, 5815L, 2765L, 4027L, 15366L, 2091L, 4988L, 4730L, 3149L,
> > > 5906L, 3704L, 4188L, 8675L, 4771L, 6586L, 2965L, 1313L, 6326L,
> > > 3948L, 3489L, 3942L, 5205L, 5309L, 4300L, 1574L, 4559L, 2393L,
> > > 2655L, 5755L, 4070L, 4027L, 5649L, 2651L, 4978L, 3970L, 3610L,
> > > 5112L, 6733L, 3513L, 7990L, 6356L, 6179L, 4340L, 4432L, 11260L,
> > > 1657L, 5913L, 5033L, 6274L, 5835L, 5865L, 7802L, 5776L, 6295L,
> > > 3034L, 4383L, 7842L, 6659L, 6905L, 5104L, 4162L, 7733L, 3321L,
> > > 6241L, 3212L, 6840L, 5294L, 2471L, 2748L, 10762L, 6269L, 7479L,
> > > 3416L, 6477L, 3043L, 4719L, 8764L, 2494L, 3168L, 4790L, 2983L,
> > > 5129L, 9795L, 5043L, 3913L, 7635L, 6693L, 4669L, 5968L, 7508L,
> > > 7271L, 7543L, 4730L, 4095L, 5152L, 9129L, 6395L, 7357L, 2166L,
> > > 7512L, 3544L, 9554L, 2505L, 6171L, 2568L, 6542L, 4781L, 8655L,
> > > 7721L, 4751L, 5134L, 6032L, 3013L, 6536L, 3504L, 7527L, 5106L,
> > > 3607L, 4247L, 5524L, 3404L, 5947L, 6506L, 3891L, 5256L, 4944L,
> > > 3700L, 4006L, 3392L, 5732L, 5185L, 3402L, 2451L, 2962L, 3679L,
> > > 3550L, 5429L, 5083L, 6354L, 4659L, 2155L, 1754L, 2496L, 721L,
> > > 3326L, 4917L, 5856L, 1589L, 564L, 8300L, 2987L, 5317L, 3820L,
> > > 3595L, 4483L, 3664L, 1805L, 6931L, 6225L, 3508L, 5481L, 5007L,
> > > 3623L, 891L, 4265L, 3746L, 5041L, 5823L, 5726L, 3238L, 6443L,
> > > 5053L, 4664L, 5370L, 614L, 459L, 491L, 629L, 351L, 1457L, 7812L,
> > > 4677L, 4305L, 4603L), Verses = c(20, 24, 31, 38, 22, 6, 22, 38,
> > > 6, 22, 36, 23, 42, 30, 36, 39, 55, 25, 24, 22, 26, 31, 32, 30,
> > > 25, 35, 34, 18, 11, 25, 54, 25, 8, 22, 26, 6, 30, 13, 25, 22,
> > > 21, 34, 16, 6, 22, 32, 30, 33, 35, 32, 14, 18, 21, 9, 15, 19,
> > > 35, 14, 18, 77, 13, 27, 27, 15, 30, 18, 18, 41, 27, 30, 15, 7,
> > > 33, 21, 19, 22, 29, 37, 35, 12, 31, 15, 20, 35, 29, 26, 36, 16,
> > > 39, 25, 24, 39, 37, 20, 47, 33, 38, 27, 20, 62, 8, 27, 32, 34,
> > > 32, 46, 37, 31, 29, 19, 21, 39, 43, 36, 30, 23, 35, 18, 30, 17,
> > > 37, 30, 14, 17, 60, 38, 43, 23, 41, 16, 30, 47, 15, 19, 26, 15,
> > > 31, 54, 24, 24, 41, 36, 25, 30, 40, 37, 40, 23, 24, 35, 57, 36,
> > > 41, 13, 36, 21, 52, 17, 34, 14, 37, 26, 52, 41, 29, 28, 41, 19,
> > > 38, 26, 39, 31, 17, 25, 30, 19, 26, 33, 26, 30, 26, 25, 22, 19,
> > > 41, 48, 34, 27, 24, 20, 25, 39, 36, 46, 29, 17, 14, 18, 6, 21,
> > > 33, 39, 9, 2, 49, 19, 29, 22, 23, 24, 22, 10, 41, 37, 43, 25,
> > > 28, 19, 6, 30, 27, 26, 35, 34, 23, 41, 31, 31, 34, 4, 3, 4, 3,
> > > 2, 9, 48, 30, 26, 34)), .Names = c("Chapter", "Words", "Letters",
> > > "Verses"), row.names = c(NA, -239L), class = "data.frame")
> > >
> > >
> > > --
> > > Gregory (Greg) L. Snow Ph.D.
> > > Statistical Data Center
> > > Intermountain Healthcare
> > > greg.snow at imail.org
> > > 801.408.8111
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
>



More information about the R-help mailing list