[R] brewing stats
paul sorenson (sosman)
sourceforge at metrak.com
Sun Oct 23 13:52:21 CEST 2005
I guess this isn't so much of a help request as a show-and-tell from a
non-statistician homebrewer who has been fumbling around with R. If
nothing else it provides yet another data set. I hope it is not out of
line.
Anyway, the plots I have produced are at
http://brewiki.org/BatchSparge#poll
The polling method is somewhat simple, its just one of those multiple
choice style polls you can create on various web forums.
The poll was prompted by the ongoing claim from fly spargers that
"their" method is more efficient, but I had never seen data to support
that. I thought maybe it was a bit of snobbery.
Maybe they are right. However if I conveniently ignore that annoying
bump on the left of the batch sparge histogram then the two groups start
to look very similar.
I was going to go out on a limb and say I learn heaps from reading the
posts here so please don't ruin my delusion too much if my output
violates all principles of good statistics. OTOH if you can suggest
other cool looking graphs please feel free. The more difficult to
pronounce the names are, the better :-)
The data set is (efficiency is the low end of its bin):
method efficiency count source
fly 95 0 bb
fly 90 0 bb
fly 85 2 bb
fly 80 8 bb
fly 75 13 bb
fly 70 8 bb
fly 65 3 bb
fly 60 0 bb
fly 55 0 bb
batch 95 0 bb
batch 90 0 bb
batch 85 4 bb
batch 80 3 bb
batch 75 15 bb
batch 70 10 bb
batch 65 6 bb
batch 60 7 bb
batch 55 1 bb
And the R code:
# Crunch some stuff with brewboard (and similar polls).
# Data is already tabulated.
x = read.csv("SpargeEff.csv")
# Shift value to centre of bin
x$efficiency = x$efficiency + 2.5
# Ignore rows with no votes (NA), zeros are ok though
y = x[which(!is.na(x$count)),]
r = rep(row.names(y), y$count)
z = y[r,]
z$count = 1
par(mfrow=c(2,2))
barplot(table(z$method), main="number of responses")
barplot(table(z$method, z$efficiency), beside=T, legend=T, main="Mash
efficiency by method", sub="paul sorenson 2005 brewiki.org")
boxplot(z$efficiency ~ z$method, main="Mash efficiency")
z.h = hist(z$efficiency, prob=T, main="Efficiencies,\n all methods
combined", xlab="efficiency")
z.md = max(z.h$density)
lines(density(z$efficiency, bw=3.0), col='blue')
#qqnorm(x$efficiency)
t.test(efficiency ~ method, data=z)
#by(z, z$method, summary)
zs = split(z, z$method)
summary(zs$batch)
summary(zs$fly)
# fit a normal distribution
require(MASS)
z.fit = fitdistr(z$efficiency, 'normal')
q = 55:95 + 2.5
lines(q, dnorm(q, z.fit$estimate['mean'], z.fit$estimate['sd']), col='red')
legend('topleft', legend=c('density', 'fitted'), col=c('blue', 'red'),
lwd=1, inset=0.05)
# Factor out lowball values.
z.f = z[which(z$efficiency >= 65),]
by(z.f, z.f$method, summary)
More information about the R-help
mailing list