[R] Conditional logistic regression for "events/trials" format

Charles C. Berry cberry at tajo.ucsd.edu
Fri Jun 1 20:42:01 CEST 2007


On Thu, 31 May 2007, Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR) wrote:

> Thanks for your reply Charles. I do indeed have other variables. I
> apologize for being vague, here is my study in more detail:
>
> I have a cohort of births. My outcome is a dichotomous variable for
> presence/absence of a birth defect. For each cohort member I estimate
> the date of conception, and assign a pollution level during the relevant
> period of gestation. All cohort members conceived on the same day are
> assigned the same pollution level. These cohort members also have a
> covariate, t, which indicates the day of follow-up. For example, if the
> first day of my study is Jan 1, 1987, the data would look like:
>
> Date          t    Conceptions  Cases  Pollution  Stratum
> Jan 1, 1987   1    100          1      10         1
> Jan 2, 1987   2    105          0      8          2
> Jan 3, 1987   3    101          1      11         3
> .
> .
> Jan 1, 1988   366  109          1      13         1
> Jan 2, 1988   367  111          2      19         2
> Jan 3, 1988   368  103          0      14         3
> .
> .
> .
>
> I make matched pairs of days (Strata) to control for the influence of
> season. I also want to account for long-term trends, e.g., increasing
> birth defect ascertainment and decreasing pollution levels over time, so
> I want to fit a cubic spline using the variable t.
>

Rather than matching, you might control for season by fitting a periodic 
spline of your 'Stratum' variable. If you do that, then a generalized 
additive logistic regression model could be used.


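Something like:

library(mgcv)  # provides gam() and te()

# cyclic smooth for season, shrinkage cubic spline for the long-term trend
fit <- gam( cbind( Cases, Conceptions - Cases ) ~ te( Stratum, bs="cc" ) +
            te( t, bs="cs" ) + Pollution, data=your.data.frame,
            family=binomial )

See ?gam and ?te.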
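
As a minimal sketch of the suggestion above, the following simulates a small events/trials data set and fits the additive logistic model to it. The simulated data, the effect sizes, and the reading of Stratum as a day-of-year index (1 to 365) are assumptions made purely for illustration. s() is swapped in for te() because each smooth has a single covariate, and the knots argument only sets the points where the cyclic smooth wraps around.

```r
library(mgcv)

set.seed(1)
n_days <- 730                                   # two years of daily rows (assumed)
dat <- data.frame(t = seq_len(n_days))
dat$Stratum <- (dat$t - 1) %% 365 + 1           # assumed: day of year, 1..365
dat$Conceptions <- rpois(n_days, 100)           # trials per day
dat$Pollution <- 12 + 4 * sin(2 * pi * dat$Stratum / 365) + rnorm(n_days)

# simulated log-odds: pollution effect plus a seasonal cycle
eta <- -4.5 + 0.02 * dat$Pollution + 0.3 * cos(2 * pi * dat$Stratum / 365)
dat$Cases <- rbinom(n_days, size = dat$Conceptions, prob = plogis(eta))

# events/trials response: cbind(successes, failures)
fit <- gam(cbind(Cases, Conceptions - Cases) ~
             s(Stratum, bs = "cc") +            # cyclic smooth for season
             s(t, bs = "cs") +                  # shrinkage cubic spline for trend
             Pollution,
           data = dat, family = binomial,
           knots = list(Stratum = c(0.5, 365.5)))

coef(fit)["Pollution"]                          # log odds ratio per unit of pollution
```

Each row stays one day of aggregated counts, so the individual-level records never need to be read in.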



> I have already analyzed this data as a time series (I don't use the
> Stratum variable in the time-series analyses), but now I am exploring
> some alternatives. My full dataset has 3,115 strata.
>
> So my final model would look like: clogit(Cases/Conceptions ~ Pollution
> + f(t) + strata(Stratum)).
>
> So, just to reiterate, my goal is to make this model without having to
> bring in the individual-level data. I would be just as happy to do a
> conditional Poisson as I would be to do a conditional logistic
> regression - either would seem to be appropriate here - if that opens up
> some other options.
>
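
On the conditional Poisson option: for Poisson counts, a model with a separate intercept for every stratum gives the same estimates for the remaining coefficients as the likelihood conditioned on the stratum totals, so collapsed data can be fitted directly with glm(). The sketch below assumes simulated data with two days per stratum and made-up effect sizes; only the column names follow the table above. Strata with no cases carry no information in the conditional likelihood and are dropped before fitting.

```r
set.seed(2)
n_strata <- 200                                  # small, for illustration only
dat <- data.frame(Stratum = rep(seq_len(n_strata), each = 2))
dat$Conceptions <- rpois(nrow(dat), 100)
dat$Pollution <- rnorm(nrow(dat), mean = 12, sd = 3)

# simulated stratum-specific baselines and a log-linear pollution effect
alpha <- rnorm(n_strata, mean = -4.5, sd = 0.3)
rate <- exp(alpha[dat$Stratum] + 0.03 * dat$Pollution)
dat$Cases <- rpois(nrow(dat), dat$Conceptions * rate)

# drop strata whose total case count is zero (uninformative)
dat <- dat[ave(dat$Cases, dat$Stratum, FUN = sum) > 0, ]

# one intercept per stratum; log(Conceptions) as offset turns counts into rates
fit <- glm(Cases ~ Pollution + factor(Stratum) + offset(log(Conceptions)),
           family = poisson, data = dat)

coef(fit)["Pollution"]                           # log rate ratio per unit of pollution
```

With several thousand strata the dummy-variable design matrix becomes large, which is why this is only practical when the number of strata is moderate.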
> Thanks very much for your time and interest,
> Matt Strickland
> Epidemiologist
> Birth Defects Branch
> U.S. Centers for Disease Control and Prevention
>
>
>
> -----Original Message-----
> From: Charles C. Berry [mailto:cberry at tajo.ucsd.edu]
> Sent: Thursday, May 31, 2007 1:12 PM
> To: Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR)
> Cc: r-help at stat.math.ethz.ch; tlumley at u.washington.edu
> Subject: Re: [R] Conditional logistic regression for "events/trials" format
>
> On Thu, 31 May 2007, Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR) wrote:
>
>> Dear R users,
>>
>> I have a large individual-level dataset (~700,000 records) which I am
>> performing a conditional logistic regression on. Key variables include
>> the dichotomous outcome, dichotomous exposure, and the stratum to
>> which each person belongs.
>>
>> Using this individual-level dataset I can successfully use clogit to
>> create the model I want. However, reading this large .csv file into R
>> and running the models takes a fair amount of time.
>>
>> Alternatively, I could choose to "collapse" the dataset so that each
>> row has the number of events, number of individuals, and the exposure
>> and stratum. In SAS they call this the "events/trials" format. This
>> would make my dataset much smaller and presumably speed things up.
>>
>
> I think you have described the data for forming a 2 by 2 by K table of
> counts.
>
> In which case, loglin(), loglm(), mantelhaen.test(), and - if K is not
> too large - glm(... , family=poisson)  would be suitable.
>
> But you say 'models' above suggesting that there are some other
> variables. If so, you need to be a bit more specific in describing your
> setup.
>
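
For the pure 2 by 2 by K case described above, a minimal sketch with mantelhaen.test() follows. The table is simulated; the number of strata, the group sizes, and the odds ratio are assumptions chosen only to make the example run.

```r
set.seed(3)
K <- 50                                          # number of strata (assumed)
tab <- array(0, dim = c(2, 2, K),
             dimnames = list(Exposure = c("no", "yes"),
                             Outcome  = c("no", "yes"),
                             Stratum  = as.character(seq_len(K))))

n <- c(200, 200)                                 # unexposed, exposed per stratum
p <- plogis(-2 + c(0, 0.5))                      # outcome risk by exposure
for (k in seq_len(K)) {
  cases <- rbinom(2, size = n, prob = p)
  tab[, "yes", k] <- cases                       # events
  tab[, "no",  k] <- n - cases                   # trials minus events
}

res <- mantelhaen.test(tab)
res$estimate                                     # Mantel-Haenszel common odds ratio
```

Each slice of the array is one stratum's events/trials row pair, so the collapsed data can be used as is.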
>
>> So my question is: can I use clogit (or possibly another function) to
>> perform a conditional logistic regression when the data is in this
>> "events/trials" format? I am using R version 2.5.0.
>>
>> Thank you very much,
>> Matt Strickland
>> Birth Defects Branch
>> U.S. Centers for Disease Control
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> Charles C. Berry                        (858) 534-2098
>                                          Dept of Family/Preventive Medicine
> E mailto:cberry at tajo.ucsd.edu	         UC San Diego
> http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0901
>
>

Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0901


