[R] Labour Statistics
Max
mnevill at exitcheck.net
Wed Oct 15 15:05:47 CEST 2008
Gad Abraham explained:
> Max wrote:
>> Hi everyone,
>>
>> This is not so much an R question as a statistics question. I currently
>> work for the largest pre-employment screening company in Canada. Upper
>> management has noticed that, usually a month or so before any kind of big
>> economic shock, our incoming files (requests for a background check) jump
>> up or down.
>>
>> As the company statistician, I've been asked to see whether the
>> relationship is strong enough to put together a product that could be sold
>> to any kind of firm or organization (brokerages or other investment firms,
>> the federal ministry of finance, Statistics Canada (similar to the Bureau
>> of Labor Statistics in the USA), universities, etc.).
>>
>> In Canada, on the 10th of every month, Statistics Canada releases labour
>> statistics for the previous month. The way our CFO sees it, *ideally*
>> sometime between the 1st and the 10th of every month (something like
>> that), the firm I work for could be releasing data for the rest of the
>> month.
>>
>> What I'm trying to figure out is this: if you were in the position of
>> evaluating the final product for purchase, what kind of information would
>> make the product credible/viable? Summary statistics? Variance-covariance
>> matrices? Graphs of the data? Cross-correlation matrices for time series
>> analysis?
>>
>> It's frustrating because I can see a noticeable relationship between our
>> file volume and the unemployment rate in particular, but I'm not sure how
>> to frame it appropriately so that another statistician/modeler would want
>> the data.
>
> Why not start with some simple plots of the relationships between your
> variables? Once you have a feel for the problem, you can look into modelling
> it more formally using a suitable regression model.

Gad, the issue I have is that I technically have one predictor for
multiple responses. The data are not clean enough for simple univariate
models. Unfortunately, my knowledge of multivariate response models is
poor, and how to set up the problem in R as a multivariate regression
is a total mystery to me. (Multivariate analysis was the one course I
wasn't able to take in my undergrad math/stats degree.)
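
As a rough illustration of the setup described above: in R, a regression
with one predictor and several responses can be written by binding the
responses together with cbind() on the left-hand side of an lm() formula.
The sketch below uses simulated data only; the names file_volume,
unemp_rate and employment are hypothetical stand-ins, not variables from
the actual data set.

```r
# Simulated monthly data; all variable names here are hypothetical.
set.seed(1)
n <- 60
file_volume <- rnorm(n, mean = 1000, sd = 100)   # the single predictor
unemp_rate  <- 10 - 0.004 * file_volume + rnorm(n, sd = 0.2)
employment  <- 50 + 0.010 * file_volume + rnorm(n, sd = 0.5)

# cbind() on the left-hand side fits one regression per response,
# all sharing the same predictor; the result has class "mlm".
fit <- lm(cbind(unemp_rate, employment) ~ file_volume)

coef(fit)      # 2 x 2 matrix: rows = intercept and slope, columns = responses
summary(fit)   # one regression summary per response
anova(fit)     # multivariate test of the predictor (Pillai's trace by default)
```

The coefficient estimates are the same as those from fitting each response
separately; what the joint fit adds is the multivariate test and the
residual covariance between responses.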

The other issue is that if I view this as a time series problem, it
becomes multiple time series analysis, which I don't have any books on.
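
One possible first step on the time series side is the sample
cross-correlation function, available in base R as ccf(). The sketch below
is built on simulated series in which x is constructed to lead y by one
period; x and y are hypothetical stand-ins for file volume and a labour
statistic, and the one-month lead is an assumption of the simulation, not
a finding.

```r
# Simulated series in which x leads y by one period (hypothetical data).
set.seed(2)
n <- 120
x <- rnorm(n)                                  # stand-in for file volume
y <- 0.8 * c(NA, x[-n]) + rnorm(n, sd = 0.3)   # y[t] depends on x[t-1]
x <- x[-1]                                     # drop the first observation,
y <- y[-1]                                     # where y is NA

# ccf(x, y) at lag k estimates the correlation between x[t+k] and y[t],
# so a peak at a negative lag means x moves before y.
cc <- ccf(x, y, lag.max = 6, plot = FALSE)
best_lag <- cc$lag[which.max(abs(cc$acf))]
best_lag
```

Real economic series are usually autocorrelated and trending, so raw
cross-correlations can be misleading; differencing or prewhitening the
series before calling ccf() is a common precaution.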

The more I look at the data and the problem, the more I feel like I'm in
way over my head.