[R] Package or procedure recommendations for analysis of repeated cross-sections?
andrewH
ahoerner at rprogress.org
Tue Jul 26 03:26:22 CEST 2011
I have a survey data set of 6 years and about 1500 persons surveyed per year,
with roughly 200 questions per survey. The samples are drawn independently
without replacement and are intended to represent the nation (USA).
I would like to create something like a synthetic panel, dividing the
respondents up into groups and then seeing if year-to-year changes in the
mean value of my dependent variable for each group vary with the level
or the change in the group mean of my explanatory variable.
The grouping would be based on several factors the levels of which denote
demographic variables such as income, race, and birth cohort. Each group
would consist of all those respondents that are identical in their level of
all the selected factors, i.e., it would consist of all the respondents
in the sample who share an identical race, income level, birth cohort, etc.
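
As a rough illustration of the grouping described above, a base-R sketch might look like the following. The data are simulated, and every column name (year, race, income, cohort, ideology, news) and factor level is a placeholder assumed for the example, not a variable from the actual survey.

```r
# Hypothetical toy data: all names and levels are made up for illustration.
set.seed(1)
n <- 600
svy <- data.frame(
  year     = sample(2005:2010, n, replace = TRUE),
  race     = factor(sample(c("white", "black", "other"), n, replace = TRUE)),
  income   = factor(sample(c("low", "mid", "high"), n, replace = TRUE)),
  cohort   = factor(sample(c("pre1950", "1950-69", "1970+"), n, replace = TRUE)),
  ideology = rnorm(n),          # outcome, e.g. an ideology scale
  news     = rbinom(n, 1, 0.4)  # explanatory variable, e.g. uses a given news source
)

# One cell per observed combination of the classifying factors.
svy$cell <- interaction(svy$race, svy$income, svy$cohort, drop = TRUE)

# Cell-by-year means of the outcome and the explanatory variable.
cells <- aggregate(cbind(ideology, news) ~ cell + year, data = svy, FUN = mean)

# Cell-by-year sample sizes, kept for later weighting.
sizes <- aggregate(ideology ~ cell + year, data = svy, FUN = length)
names(sizes)[3] <- "n"

# Merge on the shared columns (cell, year).
cells <- merge(cells, sizes)
head(cells)
```

With several classifying factors the number of cells grows quickly, so with about 1500 respondents per year many cell-by-year combinations may be small or empty.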
After being imported from an SPSS data set, these variables are implemented
as R factors. My dependent variables are measures of ideology and party
affiliation; the variables that identify the groups are factors known to be
correlated to political ideology for which I wish to control; and my
independent variables focus on sources of news and information. My
hypothesis is that the change in ideology we have observed over the period
for which I have data can be explained in part by changes in how these groups
get their information. I’m not sure if the ideology change should respond
to the level or to the change in level of my independent variable. I intend to
test both.
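
The two candidate specifications (response to the level versus response to the change) could be compared along these lines. This is only a sketch on a simulated cell-by-year table; the names cells, news, ideology, d_news and d_ideology are assumptions made for the example.

```r
# Hypothetical cell-by-year table of group means (simulated).
set.seed(2)
cells <- expand.grid(cell = factor(paste0("g", 1:20)), year = 2005:2010)
cells$news     <- runif(nrow(cells))
cells$ideology <- 0.5 * cells$news + rnorm(nrow(cells), sd = 0.1)

# Sort so differences are taken between consecutive years within each cell.
cells <- cells[order(cells$cell, cells$year), ]
lagdiff <- function(x) c(NA, diff(x))  # first year of each cell has no change
cells$d_ideology <- ave(cells$ideology, cells$cell, FUN = lagdiff)
cells$d_news     <- ave(cells$news,     cells$cell, FUN = lagdiff)

# Specification 1: change in ideology on the level of news use.
fit_level  <- lm(d_ideology ~ news, data = cells)
# Specification 2: change in ideology on the change in news use.
fit_change <- lm(d_ideology ~ d_news, data = cells)
# Both regressors together; lm() drops the rows with NA changes.
fit_both   <- lm(d_ideology ~ news + d_news, data = cells)
summary(fit_both)
```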
I was about to try to write this from scratch, but it occurred to me that
this is a variety of problem for which a nice package probably already
exists, and I could probably find it if I knew the right terminology. I am
not enough of a statistician to know the conventional name for the procedure
of using subgroupings of cross-sections repeated over time as if they were
panels. Moreover, I suspect my procedure of dividing a population into
groups based on each combination of the classifying variables has a
conventional name, and that looking at differences or ratios of the means of
a dependent variable over those groups and how they respond to the mean
level of an independent variable by group has a name, and that each has one
or more good implementations in R.
Finally, I was thinking of simply regressing changes in the group means of
my dependent variable on the group means or changes in the group means of
my independent variable. But this throws away information that I know is
relevant, though I am not sure how best to use it, e.g. that the groups are
of different sizes, so the mean differences or ratios will differ in their
variances. I could assume they are normal and do a correction for
heteroskedasticity, but if there is a better approach, I’d rather use it.
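
One way to use the group sizes, sketched here under stated assumptions, is weighted least squares. If the within-cell variance is roughly common across cells and the yearly samples are independent, the variance of a difference of two cell means is proportional to 1/n_t + 1/n_prev, so the inverse of that quantity is a natural weight. The data below are simulated and all names (cells, n, n_prev, w) are hypothetical.

```r
# Hypothetical cell-by-year table with cell sizes (simulated).
set.seed(3)
cells <- expand.grid(cell = factor(paste0("g", 1:20)), year = 2005:2010)
cells$n        <- sample(10:150, nrow(cells), replace = TRUE)
cells$news     <- runif(nrow(cells))
# Smaller cells get noisier means, mimicking sampling error.
cells$ideology <- 0.5 * cells$news + rnorm(nrow(cells), sd = 1 / sqrt(cells$n))

cells <- cells[order(cells$cell, cells$year), ]
lagdiff <- function(x) c(NA, diff(x))         # change from the previous year
lag1    <- function(x) c(NA, x[-length(x)])   # previous year's value
cells$d_ideology <- ave(cells$ideology, cells$cell, FUN = lagdiff)
cells$d_news     <- ave(cells$news,     cells$cell, FUN = lagdiff)
cells$n_prev     <- ave(cells$n,        cells$cell, FUN = lag1)

# Assumption: Var(mean_t - mean_prev) is proportional to 1/n + 1/n_prev,
# so weight each difference by the inverse of that quantity.
cells$w <- 1 / (1 / cells$n + 1 / cells$n_prev)

fit_wls <- lm(d_ideology ~ d_news, data = cells, weights = w)
summary(fit_wls)
```

If the common-variance assumption is doubtful, the weights could instead be built from each cell's own estimated variance, at the cost of noisier weights in small cells.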
My apologies if this question is unduly basic. I did two semesters of
graduate econometrics once, but that was more than a decade ago, and I fear
that, like many with a superficial knowledge of econometrics, I tend to see
every research question in terms of OLS or GLM, even if that is not the
right model for the problem.
Any help or suggestions would be greatly appreciated.
Sincerely, andrewH