[R] survival::clogit - how to construct data and use sampling weights

Valerio Leone Sciabolazza @c|@bo|@zz@ @end|ng |rom gm@||@com
Sun Jan 19 18:19:18 CET 2020


Dear list users,
I need some guidance on running a conditional logistic regression
with fixed effects.

Let me give you some background.

I have a survey where each respondent is asked the same question once
a year for three consecutive years. I have no a priori assumption about
the extent to which the choice at time t is affected by the choice at
t-1.

There are four possible answers to the question, say a, b, c, d.
I want to know to what extent some time-varying characteristics of the
respondent, say x1 and x2, affect her decision to choose one answer
(e.g., a).

I want to account for unobserved heterogeneity that might affect the
choice of the respondent by including individual and time fixed
effects.

In addition, I want to take into account the sampling weight
associated with each respondent, which changes over time.

Below, I provide some code to show how I would do this with
survival::clogit.

My questions are: i) am I constructing the dataset properly and using
the weights argument correctly? ii) is it necessary to include the
variable "alt" in the formula?

Can anyone offer some guidance?
Regards,
Valerio Leone Sciabolazza, Ph.D.


library(survival)  # provides clogit() and strata()

set.seed(123)
# number of observations
n <- 99
# possible choices
possible_choice <- letters[1:4]
# number of years
years <- 3
# individual (time-varying) characteristics
x1 <- runif(n, 5.0, 70.5)
x2 <- sample(1:n^2, n, replace = FALSE)
# sampling (time-varying) weights
wgt <- runif(n, 0, 1)
# actual choice at time t
actual_choice_year_1 <- possible_choice[sample(1:4, n/3, replace = TRUE,
                                               prob = rep(1/4, 4))]
actual_choice_year_2 <- possible_choice[sample(1:4, n/3, replace = TRUE,
                                               prob = c(0.4, 0.3, 0.2, 0.1))]
actual_choice_year_3 <- possible_choice[sample(1:4, n/3, replace = TRUE,
                                               prob = c(0.2, 0.5, 0.2, 0.1))]
# create dataset
df <- data.frame(choice = c(actual_choice_year_1,
                            actual_choice_year_2,
                            actual_choice_year_3),
                 x1 = x1, x2 = x2, wgt = wgt,
                 individual_fixed_effect = as.character(rep(1:(n/3), years)),
                 time_fixed_effect = as.character(rep(1:years, each = n/3)),
                 stringsAsFactors = FALSE)
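# prepare data for clogit: one row per observation and alternative
n_obs <- nrow(df)
df <- df[rep(seq_len(n_obs), times = length(possible_choice)), ]
# label each block of n_obs rows with one alternative
# (explicit, rather than relying on vector recycling)
df$alt <- rep(possible_choice, each = n_obs)
# TRUE for the alternative the respondent actually chose
df$mode <- df$choice == df$alt
df <- df[order(as.numeric(df$time_fixed_effect), df$alt,
               df$individual_fixed_effect), ]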

# run regression
my.reg <- clogit(mode ~ alt + x1 + x2 + time_fixed_effect +
                   strata(individual_fixed_effect),
                 data = df, method = "approximate", weights = wgt)

summary(my.reg)
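
To make question i) concrete, here is a minimal sketch of the same
expansion on a toy panel, with a check on the resulting structure. The
data (2 respondents, 3 years) are made up, and the names wide, long,
rows_per_set, chosen_per_set and chosen_per_id are only illustrative.
It checks the layout of the data, not whether the model is specified
correctly.

```r
# Toy panel: 2 respondents x 3 years, one observed choice per respondent-year
# (all values are invented for illustration)
alts <- letters[1:4]
wide <- data.frame(
  id     = rep(c("1", "2"), times = 3),
  year   = rep(1:3, each = 2),
  choice = c("a", "c", "b", "b", "d", "a"),
  stringsAsFactors = FALSE
)

# Expand to one row per respondent-year-alternative
long <- wide[rep(seq_len(nrow(wide)), times = length(alts)), ]
long$alt  <- rep(alts, each = nrow(wide))
long$mode <- long$choice == long$alt

# Each respondent-year should have four rows and exactly one chosen alternative
rows_per_set   <- tapply(long$mode, list(long$id, long$year), length)
chosen_per_set <- tapply(long$mode, list(long$id, long$year), sum)
stopifnot(all(rows_per_set == length(alts)), all(chosen_per_set == 1))

# Number of chosen rows per respondent (what a stratum defined by id contains)
chosen_per_id <- tapply(long$mode, long$id, sum)
print(chosen_per_id)
```

With this layout, a stratum defined by the respondent alone pools the
rows of all three years, so it holds one chosen row per year rather
than one in total.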


