[R] Olympics: 200m Men Final

Fri Aug 10 10:23:51 CEST 2012

Hello,

The main critique, I think, is that we assume a certain type of model
in which the times can keep decreasing down to zero, and that they can
do so linearly. I believe that records can always be beaten, but 40-50
years ago times were measured in tenths of a second; now we see a gain
in the hundredths as extraordinary. So the assumption doesn't seem to be
completely reasonable.
As for your assumption that little variation in the responses results in
little variation in the predictions, I would add that this is true, but
only for a given model. The predictions can and do vary from model to
model (obviously). See the logistic model in the same Gesmann work, or
Michael's ARIMA in a response to my post. That gives three different
predicted values, varying from model to model in the tenths of a second.
The values are, resp., 19.61 (Gesmann) and 19.67 and 19.56 (Weylandt).
Maybe the linear model performs well because, as you say, the
sprinters post times very close to each other and a straight line is
not far from what a more complex model would do. I'm not betting on the
marathon times.
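
To make both points a bit more concrete, here is a minimal sketch. It
uses a made-up toy series (not the real Olympic results), so the numbers
themselves mean nothing; the names toy, lin, loglin and zero_year are
just mine for illustration. It fits a straight line and a log-linear
model to the same responses and compares what they say for 2012 and far
beyond the data.

```r
# Made-up toy series for illustration only (NOT real Olympic results):
# a slow downward trend plus a small deterministic wiggle.
years <- seq(1900, 2008, by = 4)
toy <- data.frame(
  Year   = years,
  Result = 22.5 - 0.028 * (years - 1900) + 0.15 * sin(years)
)

# Two candidate models fitted to the same responses
lin    <- lm(Result ~ Year, data = toy)       # straight line in seconds
loglin <- lm(log(Result) ~ Year, data = toy)  # straight line on the log scale

# Same data, different model, different 2012 prediction
new <- data.frame(Year = 2012)
pred_lin    <- unname(predict(lin, newdata = new))
pred_loglin <- unname(exp(predict(loglin, newdata = new)))
print(c(linear = pred_lin, log.linear = pred_loglin))

# Far outside the data the two models disagree completely:
# the straight line crosses zero, the log-linear curve only approaches it.
zero_year <- unname(-coef(lin)[1] / coef(lin)[2])
far <- data.frame(Year = c(2012, 2500, 3000))
far$linear     <- predict(lin, newdata = far)
far$log.linear <- exp(predict(loglin, newdata = far))
print(zero_year)
print(far)
```

Close to the data the two fits should be hard to tell apart, which is
Mark's point. The further out you go, the more the choice of model
matters, which is mine.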

Rui Barradas

On 10-08-2012 05:31, Mark Leeds wrote:
> Hi Rui: I hate to sound like a pessimist/cynic, and I should also state that
> I didn't look at any of the analysis by you or the other person. But my
> question (for anyone who wants to chime in) is: given that all these Olympic
> 100-200 meter runners post times that are generally within 0.1-0.3 seconds
> of each other or even less, doesn't it stand to reason that a model, given
> the historical times, is going to predict well? I don't know what the
> statistical term is for this, but intuitively, if there's extremely little
> variation in the responses, then there's going to be extremely little
> variation in the predictions, and the result is that you won't ever be too
> far off as long as your predictors (weight, past performances, height,
> whatever) are not too strange.
>
> Anyone can feel free to chime in and tell me I'm wrong, but if you're
> going to do that, I'd appreciate statistical reasoning, even though I don't
> have any. Thanks.
>
>
> mark
>
> On Thu, Aug 9, 2012 at 4:23 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>
>> Hello,
>>
>> Have you seen the log-linear prediction of the 100m winning time in R,
>> mailed to the list yesterday by David Smith under the subject
>> "Revolutions Blog: July roundup"?
>>
>> "A log-linear regression in R predicted the gold-winning Olympic 100m
>> sprint time to be 9.68 seconds (it was actually 9.63 seconds):
>> http://bit.ly/QfChUh"
>>
>> The original by Markus Gesmann can be found at
>> http://lamages.blogspot.pt/2012/07/london-olympics-and-prediction-for-100m.html
>>
>> I've done the same, just changing the address to the 200m historical data,
>> and the predicted time was 19.27. Usain Bolt has just run 19.32. If you
>> want to check it, the address and the 'which' argument are:
>>
>> url <- "http://www.databasesports.com/olympics/sport/sportevent.htm?sp=ATH&enum=120"
>>
>> Plus a change in the graphic functions' y axis arguments, so that times
>> about twice as large can be plotted and seen.
>>
>> #
>> # Original by Markus Gesmann:
>> # http://lamages.blogspot.pt/2012/07/london-olympics-and-prediction-for-100m.html
>> library(XML)  # readHTMLTable
>> library(drc)  # drm, L.4
>> url <- "http://www.databasesports.com/olympics/sport/sportevent.htm?sp=ATH&enum=120"
>> # Read the third table on the page and keep only the gold medal rows
>> data <- readHTMLTable(readLines(url), which=3, header=TRUE)
>> golddata <- subset(data, Medal %in% "GOLD")
>> # The columns may come back as factors, so convert via character
>> golddata$Year <- as.numeric(as.character(golddata$Year))
>> golddata$Result <- as.numeric(as.character(golddata$Result))
>> tail(golddata,10)
>> # Four-parameter logistic fit and log-linear fit, from 1900 onwards
>> logistic <- drm(Result~Year, data=subset(golddata, Year>=1900), fct = L.4())
>> log.linear <- lm(log(Result)~Year, data=subset(golddata, Year>=1900))
>> years <- seq(1896,2012, 4)
>> # Back-transform the log-scale predictions to seconds
>> predictions <- exp(predict(log.linear, newdata=data.frame(Year=years)))
>> # Labels changed from 100m to 200m to match the data being plotted
>> plot(logistic,  xlim=c(1896,2012),
>>       ylim=range(golddata$Result) + c(-0.5, 0.5),
>>       xlab="Year", main="Olympic 200 metre",
>>       ylab="Winning time for the 200m men's final (s)")
>> points(golddata$Year, golddata$Result)
>> lines(years, predictions, col="red")
>> # Mark and label the 2012 prediction
>> points(2012, predictions[length(years)], pch=19, col="red")
>> text(2012 - 0.5, predictions[length(years)] - 0.5,
>>      round(predictions[length(years)],2))
>>
>> Rui Barradas
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>