[R] Olympics: 200m Men Final

Fri Aug 10 08:48:31 CEST 2012

Continuing on with fun, if silly, analyses: a little voice in my head
suggests a time series model and, rather than putting any thought
into it, I'll use some R-goodness.

Setting up the data as Rui provided, we need to add some NA's to
account for WWII:

library(zoo)
golddata.ts <- as.ts(zoo(golddata[,6], order.by = golddata[,1]))
# Here we let Gabor and Achim think about how to get the NAs in there smoothly
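
For anyone curious what that NA step amounts to, here is a minimal base-R sketch of the same idea without zoo: expand to every Olympic year, merge, and build a ts with a four-year step. The data frame `gold`, its column names, and its values are made-up placeholders, not the table Rui scraped.

```r
# Placeholder winning times in seconds; NOT the scraped data from this thread.
gold <- data.frame(Year   = c(1932, 1936, 1948, 1952),
                   Result = c(21.2, 20.7, 21.1, 20.7))

# Every four-year slot between the first and last Games, held or not.
all_years <- data.frame(Year = seq(min(gold$Year), max(gold$Year), by = 4))

# Left join: years without a result (here 1940 and 1944) get NA.
full <- merge(all_years, gold, by = "Year", all.x = TRUE)

# Regular series with one observation every four years.
gold.ts <- ts(full$Result, start = min(full$Year), deltat = 4)
print(gold.ts)
```

The zoo route above should give the same shape of object in one line; this version just makes the gap-filling explicit.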

library(forecast)
golddata.model <- auto.arima(golddata.ts)

# Prof Hyndman has forgotten more about time series than I will ever know

summary(golddata.model)
# ARIMA(2,1,0)+drift seems a bit heavy handed, but that's what it gives
forecast(golddata.model, 1)

     Point Forecast    Lo 80    Hi 80    Lo 95  Hi 95
2012       19.66507 19.23618 20.09396 19.00915 20.321

# But looking at a graph, this seems to have an odd jump up
plot(forecast(golddata.model))

# Maybe we overfit -- let's kill the drift

golddata.model2 <- auto.arima(golddata.ts, allowdrift = FALSE)

summary(golddata.model2) # ARIMA(1,1,0) seems better
plot(forecast(golddata.model2)) # I like the graph more too
forecast(golddata.model2, 1)

     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
2012       19.56139 19.04134 20.08145 18.76604 20.35674

Not so very good at all, but a little bit of R fun nevertheless ;-)

And in the category of "how good is your prediction when you already
know the answer and don't care at all about statistical rigor", it
seems that "regress on year" might still be winning. Anyone want to
take some splines out for a spin?
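
If someone does want to take splines for a spin, one possible starting point is smooth.spline() from the stats package. This is only a sketch on simulated data: the `year` and `time_s` vectors are invented stand-ins, not the real results, so the printed number says nothing about the actual race.

```r
set.seed(1)

# Simulated stand-in for the gold-medal series: a slow downward trend plus noise.
year   <- seq(1900, 2008, by = 4)
time_s <- 22 - 0.02 * (year - 1900) + rnorm(length(year), sd = 0.15)

# Smoothing spline; the amount of smoothing is chosen automatically.
fit <- smooth.spline(year, time_s)

# x = 2012 lies beyond the data, so this is an extrapolation, not a fit.
pred <- predict(fit, x = 2012)$y
print(pred)
```

With the real series one would drop the NA years before fitting, since smooth.spline() does not accept missing values.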

Cheers,
Michael

On Thu, Aug 9, 2012 at 11:31 PM, Mark Leeds <markleeds2 at gmail.com> wrote:
> Hi Rui: I hate to sound like a pessimist/cynic, and I should also state that
> I didn't look at any of the analysis by you or the other person. But my
> question (for anyone who wants to chime in) is: given that all these Olympic
> 100-200 meter runners post times that are generally within 0.1-0.3 seconds
> of each other, or even less, doesn't it stand to reason that a model, given
> the historical times, is going to predict well? I don't know what the
> statistical term is for this, but intuitively, if there's extremely little
> variation in the responses, then there's going to be extremely little
> variation in the predictions, and the result is that you won't ever be too
> far off as long as your predictors (weight, past performances, height,
> whatever) are not too strange.
>
> Anyone can feel free to chime in and tell me I'm wrong, but if you're going
> to do that, I'd appreciate statistical reasoning, even though I don't have
> any. Thanks.
>
>
> mark
>
> On Thu, Aug 9, 2012 at 4:23 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>
>> Hello,
>>
>> Have you seen the log-linear prediction of the 100m winning time in R
>> mailed to the list yesterday by David Smith, subject  Revolutions Blog:
>> July roundup?
>>
>> "A log-linear regression in R predicted the gold-winning Olympic 100m
>> sprint time to be 9.68 seconds (it was actually 9.63 seconds):
>> http://bit.ly/QfChUh"
>>
>> The original by Markus Gesmann can be found at
>> http://lamages.blogspot.pt/2012/07/london-olympics-and-prediction-for-100m.html
>>
>> I did the same, just changing the address to the 200m historical data,
>> and the predicted time was 19.27. Usain Bolt has just run 19.32. If you
>> want to check it, the address and the 'which' argument are:
>>
>> url <- "http://www.databasesports.com/olympics/sport/sportevent.htm?sp=ATH&enum=120"
>>
>> Plus a change in the plotting functions' y-axis arguments so that times
>> around double the 100m values can be plotted and seen.
>>
>> #
>> # Original by Markus Gesmann:
>> # http://lamages.blogspot.pt/2012/07/london-olympics-and-prediction-for-100m.html
>> library(XML)
>> library(drc)
>> url <- "http://www.databasesports.com/olympics/sport/sportevent.htm?sp=ATH&enum=120"
>> data <- readHTMLTable(readLines(url), which=3, header=TRUE)
>> golddata <- subset(data, Medal %in% "GOLD")
>> golddata$Year <- as.numeric(as.character(golddata$Year))
>> golddata$Result <- as.numeric(as.character(golddata$Result))
>> tail(golddata,10)
>> logistic <- drm(Result~Year, data=subset(golddata, Year>=1900), fct =
>> L.4())
>> log.linear <- lm(log(Result)~Year, data=subset(golddata, Year>=1900))
>> years <- seq(1896,2012, 4)
>> predictions <- exp(predict(log.linear, newdata=data.frame(Year=years)))
>> plot(logistic,  xlim=c(1896,2012),
>>      ylim=range(golddata$Result) + c(-0.5, 0.5),
>>      xlab="Year", main="Olympic 200 metre",
>>      ylab="Winning time for the 200m men's final (s)")
>> points(golddata$Year, golddata$Result)
>> lines(years, predictions, col="red")
>> points(2012, predictions[length(years)], pch=19, col="red")
>> text(2012 - 0.5, predictions[length(years)] - 0.5,
>>      round(predictions[length(years)], 2))
>>
>> Rui Barradas
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>