[R-sig-ME] how to model a household effect, when there is no variation in the dependent variable at the household level?

Sat Sep 17 22:31:51 CEST 2011

I am trying to estimate how individual labour-market transitions
affect poverty risks, and to what extent countries differ in this
regard (i.e. whether there are country-specific poverty risks per
labour-market transition).

In this context, poverty is defined as belonging to a household that
has an equivalised disposable income less than 60 per cent of the
median equivalised disposable household-income in the country. Even
with such a relative definition of poverty, there are substantial
country differences in the frequency and distribution of poverty.
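A minimal sketch of the poverty definition above, written in Python purely for illustration. The records, the function name poverty_flags and the choice to take the country median over persons (rather than over households) are assumptions, not something stated in this post.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (person_id, country, equivalised disposable household income).
persons = [
    ("p1", "SE", 100.0),
    ("p2", "SE", 200.0),
    ("p3", "SE", 300.0),
    ("p4", "DE", 90.0),
    ("p5", "DE", 400.0),
    ("p6", "DE", 500.0),
]


def poverty_flags(records, share=0.6):
    """Return {person_id: bool}; True when income < share * country median."""
    by_country = defaultdict(list)
    for _, country, income in records:
        by_country[country].append(income)
    # Assumption: the country median is taken over persons, not households.
    threshold = {c: share * median(v) for c, v in by_country.items()}
    return {pid: income < threshold[c] for pid, c, income in records}


flags = poverty_flags(persons)
```

Because the threshold is recomputed per country, the same income can fall on either side of the poverty line depending on the country.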

For single-person households, the poverty risk is of course much more
directly related to the individual labour-market status than for
persons in households with two or more adult (or income-generating)
members. (For simplicity, I will use the term labour-market status,
but the variable actually represents categories of labour-market
status over time.)

We have data on individual income of all members in the household. For
some countries sampling was done at the individual level, and for
those countries we do not have data on the labour-market status of the
other persons in the household. For most countries though, the
household was the sampling unit, and in these cases we have data on
the labour-market status of all members of the household.

My question is: how should I model the household effect? I think of
the effect as a source of distortion on the link between individual
labour-market position and risk of poverty - a link where the
individual income is an intermediate factor.

I think of the problem as two equations.

A: individual.income ~ gender + age + educational.level + (0 + 1 | labour.market.status:country) + error.term
B: poverty ~ individual.income + income.of.other.household.members + total.number.of.adult.household.members + total.number.of.non.adult.household.members

By definition, poverty is completely defined by the terms in B, i.e.
there will be no residuals in B.
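To illustrate why B has no residual, here is a sketch of B as a deterministic rule, in Python. The equivalence scale is an assumption: the post does not say which one is used, so the sketch takes the modified OECD scale (1.0 for the first adult, 0.5 for each further adult, 0.3 for each child) and treats "adult" loosely. The function names are hypothetical.

```python
def equivalised_income(own_income, others_income, n_adults, n_children):
    """Total household income divided by an equivalence scale.

    Assumption: modified OECD scale (1.0 first adult, 0.5 each further
    adult, 0.3 each child). n_adults counts the person as well.
    """
    scale = 1.0 + 0.5 * (n_adults - 1) + 0.3 * n_children
    return (own_income + others_income) / scale


def is_poor(own_income, others_income, n_adults, n_children, country_median):
    """Equation B as a rule: the inputs fix the outcome, no error term."""
    income = equivalised_income(own_income, others_income, n_adults, n_children)
    return income < 0.6 * country_median


# Same individual income, different households.
single = is_poor(100.0, 0.0, 1, 0, country_median=200.0)          # 100 / 1.0 vs 120
with_partner = is_poor(100.0, 300.0, 2, 0, country_median=200.0)  # 400 / 1.5 vs 120
```

The two calls differ only in the household terms, which is the sense in which the household distorts the link between individual income and poverty.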

In plain English, I would formulate the question like this:

Controlling for the income of other members of your household, and the
number of adult and non-adult members of your household, and
controlling for your gender, age and educational level, how does your
country and labour-market status interact in affecting your risk of
poverty?

I am a bit confused about whether I can include household.id as a
simple random term, since within households there is no variation in
the dependent variable poverty. I assume that the random effect of
each household would simply explain away all variation in poverty, e.g.:

Model 0
poverty ~ gender + age + educational.level + (1 | household.id) + (0 + 1 | labour.market.status:country)
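The premise behind this concern, that poverty never varies within a household, can be checked on the data before fitting anything. A sketch in Python, with hypothetical rows and column order:

```python
from collections import defaultdict

# Hypothetical rows: (household_id, person_id, poverty coded 0/1).
rows = [
    ("h1", "p1", 1), ("h1", "p2", 1),
    ("h2", "p3", 0), ("h2", "p4", 0), ("h2", "p5", 0),
    ("h3", "p6", 1),
]

# Collect the distinct poverty values seen in each household.
values_by_household = defaultdict(set)
for household_id, _, poverty in rows:
    values_by_household[household_id].add(poverty)

# Households whose members disagree on poverty status; expected to be
# empty when poverty is defined at the household level.
mixed = [h for h, vals in values_by_household.items() if len(vals) > 1]
```

An empty list means all variation in poverty lies between households, which is the situation described above.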

I now ask the list whether you think the random term in the following
model would properly capture the interaction effect of country and
labour-market status on poverty.

Model I
poverty ~ gender + age + educational.level + (0 + 1 | labour.market.status:country:household.id)

(I chose (0 + 1 | labour.market.status:country:household.id) rather
than (0 + labour.market.status | country | household.id) in order to
reduce the number of variance-covariance matrices, even if I don't
really understand the implications of this. The number of households
is about 100,000, the number of countries about 20 and the number of
labour.market.statuses about 15.)

Conditioning on household.id makes sense to me, but there must be
better ways to model the household effect, since I don't really
want to wait for 100,000 * 20 * 15 conditional modes to be computed.
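The product 100,000 * 20 * 15 is an upper bound. As far as I know, lme4 creates one level of a grouping factor per combination that actually occurs in the data, and since each household lies in a single country, the number of occurring combinations cannot exceed the number of individuals. A sketch in Python with hypothetical rows shows the difference between the two counts:

```python
# Hypothetical rows: (labour_market_status, country, household_id).
rows = [
    ("employed", "SE", "h1"),
    ("unemployed", "SE", "h1"),
    ("employed", "SE", "h2"),
    ("employed", "DE", "h3"),
    ("inactive", "DE", "h3"),
]

statuses = {s for s, _, _ in rows}
countries = {c for _, c, _ in rows}
households = {h for _, _, h in rows}

# Upper bound: every status in every country in every household.
full_cross = len(statuses) * len(countries) * len(households)

# Combinations that actually occur; bounded by the number of rows.
observed = len(set(rows))
```

Running the same count on the real data would show how many conditional modes Model I actually implies.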

I guess one could explicitly include the number of members in the
household, and the sum of the other household members' income(s) in the
model. The incomes of others in the household affect poverty risk
relative to country median, so this term would have to be scaled
against the country median. Model II includes these data about the
household as three fixed terms.

Model II
poverty ~ gender + age + educational.level + (0 + 1 | labour.market.status:country) + sum.of.other.household.members.income * median.income.in.country + total.number.of.adult.household.members + total.number.of.non.adult.household.members
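A sketch, in Python, of how the household covariate in Model II could be derived. It takes one possible reading of "scaled against the country median", namely the ratio of the others' income to the median; Model II as written uses an interaction with the median instead. The rows and the median_income lookup are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (person_id, household_id, country, individual income).
rows = [
    ("p1", "h1", "SE", 100.0),
    ("p2", "h1", "SE", 300.0),
    ("p3", "h2", "SE", 200.0),
]
# Hypothetical country medians, assumed to be computed elsewhere.
median_income = {"SE": 200.0}

# Total income per household.
household_total = defaultdict(float)
for _, household_id, _, income in rows:
    household_total[household_id] += income

# Income of the other members = household total minus own income,
# expressed as a multiple of the country median.
others_relative = {
    pid: (household_total[hh] - income) / median_income[country]
    for pid, hh, country, income in rows
}
```

A ratio of this kind is comparable across countries, which a raw sum in national currency is not.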

I would be most grateful for any suggestions on alternative model
specifications, or input on Model I and II above.

Intuitively, I feel a need to explicitly group the individuals into
their household - thus I am not quite satisfied with Model II, but
maybe that is just a logical error due to "level-thinking"?

-- 
Hans Ekbrand
Department of sociology, Gothenburg university