[R] Trying something for fun...

Charles C. Berry cberry at tajo.ucsd.edu
Sun Aug 23 19:43:58 CEST 2009


On Sun, 23 Aug 2009, Charles C. Berry wrote:

> On Sat, 22 Aug 2009, Noah Silverman wrote:
>
>>  And, of course that leads me to another question...
>>
>>  With svm {e1071} I can ask the predict function to give me probabilities
>>  with lrm {Design} I can ask the predict function to give me probabilities
>>
>>  I can't see how to do this with clogit.
>>
>>  Would someone be kind enough to explain the output options?  (I can't see
>>  one that is a probability option.)
>
>
> Well, in the absence of bugs in predict.coxph you could do something like
>
> 	 fit <- clogit( winner ~ strata( heat ) + x )
> 	 new.preds <- predict( fit ,newdata=newdat, type = 'expected')
>
> but this fails for survival_2.35-4. (IIRC, the maintainer knows this and 
> there was recent correspondence here or on R-devel about this bug)
>
> So you will have to work around this.
>
> Something like
>
> 	 clogit.response <- function(x) Surv( I( rep(1, length(x)) ), x )
> 	 fit <- coxph( clogit.response( winner ) ~ strata( heat ) + x )
> 	 new.preds <- ave(
> 			 predict( fit ,newdata=newdat, type = 'risk'),
> 			 newdat$heat, FUN=prop.table )
>
> Ought to do it.

Oops.

Forgot to mention that you need a dummy placeholder for 'winner' in the 
newdat data.frame. Something like

 	newdat$winner <- newdat$heat

should fix it.
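
For what it's worth, the ave()/prop.table() step can be tried on its own,
without a fitted model. A minimal sketch, assuming a single covariate x; the
coefficient beta and the values in newdat are made up for illustration and
merely stand in for coef(fit) and real race data:

```r
# Hypothetical coefficient, standing in for coef(fit) from a fitted model.
beta <- 0.8

# Hypothetical new races: heat 1 has three runners, heat 2 has four.
newdat <- data.frame(
  heat = c(1, 1, 1, 2, 2, 2, 2),
  x    = c(0.5, -0.2, 1.1, 0.0, 0.3, -1.0, 0.7)
)

# Dummy placeholder for the response, as described above.
newdat$winner <- newdat$heat

# Relative risk score for each runner (assumed form: exp of the linear predictor).
risk <- exp(newdat$x * beta)

# Rescale the risk scores within each heat so that they sum to one.
newdat$prob <- ave(risk, newdat$heat, FUN = prop.table)

# Check: one total per heat.
print(tapply(newdat$prob, newdat$heat, sum))
```

Within each heat the result is exp(x * beta) divided by the sum of
exp(x * beta) over the runners in that heat, so any constant factor in the
risk score (such as the centering that predict() may apply) cancels out.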

>
> HTH,
>
> Chuck
>
>
>>
>>  Thanks!!
>>
>>  -N
>> 
>>
>>  On 8/22/09 10:57 AM, Charles C. Berry wrote:
>> >   On Fri, 21 Aug 2009, Noah Silverman wrote:
>> > 
>> > >   Hi,
>> > > 
>> > >   For fun, I'm trying to throw some horse racing data into either an 
>> > >   svm or lrm model.  Curious to see what comes out as there are so many 
>> > >   published papers on this.
>> > > 
>> > >   One thing I don't know how to do is to standardize the probabilities 
>> > >   by race.
>> > 
>> > 
>> >   This sounds closer to the conditional logit model.
>> > 
>> >   However, if I recall correctly, the models-of-choice literature
>> >   makes an assumption stated as something like 'independence of
>> >   alternatives that are unavailable'.  That assumption might not hold
>> >   in a horse race, where the speed at which a horse runs may depend on
>> >   which horses she is running against.
>> > 
>> >   See
>> > 
>> >       ?survival:::clogit
>> > 
>> >   and
>> > 
>> > @article{mcfadden1974conditional,
>> >     title={{Conditional logit analysis of qualitative choice behavior}},
>> >     author={McFadden, D.},
>> >     journal={Frontiers in econometrics},
>> >     volume={8},
>> >     pages={105--142},
>> >     year={1974}
>> > }
>> > 
>> > 
>> >   BTW, Professor McFadden has a quintessentially American biography:
>> > 
>> >    http://nobelprize.org/nobel_prizes/economics/laureates/2000/mcfadden-autobio.html 
>> > 
>> > 
>> >   He mentions his personal background in farming and awards won for his
>> >   'sheep and geese', but alas does not mention horses or racing.
>> > 
>> >   HTH,
>> > 
>> >   Chuck
>> > 
>> > > 
>> > >   For example, if I train an LRM on a bunch of variables I get a model. 
>> > >   I can then get probability predictions from the model.  That works.
>> > > 
>> > >   It seems to me that for a given race (8-12 horses) the probabilities 
>> > >   of my predictions should sum to one.
>> > > 
>> > >   1) Is there some way to train the LRM to evaluate and then model the 
>> > >   subsequent data "per race"?  (Perhaps indicate some kind of grouping 
>> > >   variable?)
>> > > 
>> > >   2) Alternately, if I just run my data through a "standard" LRM, is 
>> > >   there some way to then "normalize" the probabilities in a correct way 
>> > >   for each upcoming race?
>> > > 
>> > >   I've done some extensive research in this area and would be willing 
>> > >   to discuss more details offline with someone if they could contribute 
>> > >   to the process.
>> > > 
>> > >   Thanks!!
>> > > 
>> > >   -N
>> > > 
>> > >   ______________________________________________
>> > >   R-help at r-project.org mailing list
>> > >   https://stat.ethz.ch/mailman/listinfo/r-help
>> > >   PLEASE do read the posting guide 
>> > >   http://www.R-project.org/posting-guide.html
>> > >   and provide commented, minimal, self-contained, reproducible code.
>> > > 
>> > > 
>> > 
>> >   Charles C. Berry                            (858) 534-2098
>> >                                               Dept of Family/Preventive Medicine
>> >   E mailto:cberry at tajo.ucsd.edu             UC San Diego
>> >   http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>> > 
>> > 
>> 
>> 

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



