[R] special question on regression
Greg.Snow at imail.org
Mon Jul 18 22:33:33 CEST 2011
I remember seeing an example using the EM algorithm where one of the variables was age of child and they assumed that an age like 16 months was accurate to the month, but ages like 18 months may have been off by as much as 2 months and ages like 3 years could be off by 6 months (or more), so they used the EM algorithm to estimate the actual ages (I think, but am not sure, that age was used as a predictor in the regression). I think the example may be in Little and Rubin's book on missing data, but I could not find it in a quick skim through my copy. But that is one approach.
If you have an exact transformation of your interval data and you are willing to assume multivariate normality (after the transform), then you could use maximum likelihood with the optim (or another) function; just write the likelihood function so that it takes the intervals into account. This would work with something like a log transform, which is monotone and so maps interval endpoints to interval endpoints, but I don't know how it would work with something like a spline.
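A minimal sketch of such a likelihood, on simulated data. Everything here is an assumption made for illustration: the variable names, the simulated values, and the particular factorization (log(time) normal given x1, and y normal given x1 and log(time), which is what joint normality conditional on x1 implies). An exactly observed time contributes the usual joint density; an interval-valued time contributes the marginal density of y times the conditional probability that log(time) falls in the interval.

```r
set.seed(1)
n  <- 150
x1 <- runif(n, 0.5, 7)                    # hypothetical predictor (units of 100 mm)
z  <- rnorm(n, mean = 4, sd = 1.2)        # z = log(time in days), assumed normal
y  <- 1 + 0.5 * x1 + 0.8 * z + rnorm(n, sd = 0.5)

# Pretend the first 50 times are known only to within an interval
t_true <- exp(z)
cens   <- seq_len(n) <= 50
lo     <- ifelse(cens, 0.5 * t_true, t_true)
hi     <- ifelse(cens, 1.5 * t_true, t_true)
z_lo   <- log(lo)                         # log is monotone, so intervals map to intervals
z_hi   <- log(hi)

negloglik <- function(par) {
  b0 <- par[1]; b1 <- par[2]; b2 <- par[3]; sigma <- exp(par[4])
  g0 <- par[5]; g1 <- par[6]; tau <- exp(par[7])
  mu_z <- g0 + g1 * x1                    # mean of z given x1

  # Exactly observed z: f(y | x1, z) * f(z | x1)
  ll_exact <- dnorm(y, b0 + b1 * x1 + b2 * z_lo, sigma, log = TRUE) +
              dnorm(z_lo, mu_z, tau, log = TRUE)

  # Interval z: f(y | x1) * P(z_lo < z < z_hi | y, x1)
  v_y <- sigma^2 + b2^2 * tau^2           # marginal variance of y given x1
  m_y <- b0 + b1 * x1 + b2 * mu_z         # marginal mean of y given x1
  m_c <- mu_z + b2 * tau^2 / v_y * (y - m_y)   # conditional mean of z
  s_c <- sqrt(tau^2 * sigma^2 / v_y)           # conditional sd of z
  p   <- pnorm(z_hi, m_c, s_c) - pnorm(z_lo, m_c, s_c)
  ll_int <- dnorm(y, m_y, sqrt(v_y), log = TRUE) + log(pmax(p, 1e-300))

  -sum(ifelse(cens, ll_int, ll_exact))
}

# Starting values from naive fits that use the interval midpoints
mid   <- (z_lo + z_hi) / 2
f1    <- lm(y ~ x1 + mid)
f2    <- lm(mid ~ x1)
start <- unname(c(coef(f1), log(summary(f1)$sigma),
                  coef(f2), log(summary(f2)$sigma)))

fit <- optim(start, negloglik, method = "BFGS", control = list(maxit = 500))
fit$par[1:3]                              # estimates of b0, b1, b2
```

The standard deviations are optimized on the log scale so that optim can search without constraints. Passing hessian = TRUE to optim would give a basis for approximate standard errors.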
Another approach would be a Bayesian regression (use BRugs or similar) where you put a prior distribution on each of the interval-valued observations, e.g. if the value you have is 2-4 then maybe use a uniform prior between 2 and 4, etc. This has a similar feel to me to the EM approach, but is based on very different theory. One advantage of this is that you also get a posterior distribution for the actual value of each observation that you only know the interval for. Again this works great with known transforms like log, but I don't know how you would do a spline. You could use a polynomial transformation to start, to at least get a feel for the level and general shape of the nonlinearity.
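To make the idea concrete without requiring BRugs or OpenBUGS, here is a hand-rolled sampler in base R. It is only a sketch under stated assumptions: simulated data, hypothetical variable names, a flat prior on the regression coefficients, a prior proportional to 1/sigma^2 on the error variance, and a uniform prior on each interval-valued time. The coefficients and variance are drawn from their full conditionals; each unknown time is updated with an independence Metropolis step that proposes from its uniform prior, so the acceptance ratio reduces to a likelihood ratio.

```r
set.seed(3)
n      <- 150
x1     <- runif(n, 0.5, 7)                  # hypothetical predictor
t_true <- exp(runif(n, 0, log(800)))        # time in days, roughly 1 to 800
y      <- 1 + 0.5 * x1 + 0.8 * log(t_true) + rnorm(n, sd = 0.5)

# Pretend the first 50 times are known only to within an interval
cens <- seq_len(n) <= 50
lo   <- ifelse(cens, 0.5 * t_true, t_true)
hi   <- ifelse(cens, 1.5 * t_true, t_true)
idx  <- which(cens)

n_iter <- 3000
t_cur  <- (lo + hi) / 2                     # start interval values at the midpoints
sigma2 <- 1
draws  <- matrix(NA_real_, n_iter, 3,
                 dimnames = list(NULL, c("b0", "b1", "b2")))

for (it in seq_len(n_iter)) {
  X       <- cbind(1, x1, log(t_cur))
  XtX_inv <- solve(crossprod(X))
  b_hat   <- XtX_inv %*% crossprod(X, y)

  # beta | rest ~ N(b_hat, sigma2 * (X'X)^-1) under the flat prior
  beta <- as.vector(b_hat + t(chol(sigma2 * XtX_inv)) %*% rnorm(3))

  # sigma2 | rest ~ Inverse-Gamma(n/2, SSR/2) under p(sigma2) proportional to 1/sigma2
  res    <- y - as.vector(X %*% beta)
  sigma2 <- 1 / rgamma(1, shape = n / 2, rate = sum(res^2) / 2)

  # Interval-valued times: propose from the uniform prior, accept by likelihood ratio
  t_prop  <- runif(length(idx), lo[idx], hi[idx])
  base    <- y[idx] - beta[1] - beta[2] * x1[idx]
  ll_cur  <- dnorm(base, beta[3] * log(t_cur[idx]), sqrt(sigma2), log = TRUE)
  ll_prop <- dnorm(base, beta[3] * log(t_prop),     sqrt(sigma2), log = TRUE)
  acc     <- log(runif(length(idx))) < ll_prop - ll_cur
  t_cur[idx[acc]] <- t_prop[acc]

  draws[it, ] <- beta
}

post <- draws[-(1:500), ]                   # discard burn-in
colMeans(post)                              # posterior means of b0, b1, b2
```

Storing t_cur at each iteration as well would give the posterior distribution of each interval-valued time. In real use you would also want to check convergence and mixing, which this sketch does not do.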
You might also try multiple imputation on the interval data. There are several packages that do multiple imputation, but I don't know if any of them would take the intervals into account. You could possibly create your own imputations by generating values randomly within the intervals, then use the existing tools to help with the analysis.
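A rough sketch of the do-it-yourself version, using base R only so that it is self-contained. The data are simulated and the names are hypothetical. Each imputation draws the interval-valued times uniformly within their intervals, the regression is refit, and the results are pooled by hand with Rubin's rules. Note that drawing uniformly ignores what the response says about the unknown time, so this is cruder than a proper imputation model.

```r
set.seed(2)
n      <- 150
x1     <- runif(n, 0.5, 7)                  # hypothetical predictor
t_true <- exp(runif(n, 0, log(800)))        # time in days, roughly 1 to 800
y      <- 1 + 0.5 * x1 + 0.8 * log(t_true) + rnorm(n, sd = 0.5)

# Pretend the first 50 times are known only to within an interval
cens <- seq_len(n) <= 50
lo   <- ifelse(cens, 0.5 * t_true, t_true)
hi   <- ifelse(cens, 1.5 * t_true, t_true)

m <- 20                                     # number of imputed data sets
fits <- lapply(seq_len(m), function(i) {
  # Draw interval-valued times uniformly; keep exact times as they are
  t_imp <- ifelse(cens, runif(n, lo, hi), lo)
  lm(y ~ x1 + log(t_imp))                   # linear in x1, logarithmic in time
})

est    <- sapply(fits, coef)                          # coefficients, one column per fit
within <- sapply(fits, function(f) diag(vcov(f)))     # squared standard errors

# Rubin's rules: total variance = within + (1 + 1/m) * between
q_bar   <- rowMeans(est)
u_bar   <- rowMeans(within)
between <- apply(est, 1, var)
total   <- u_bar + (1 + 1 / m) * between

pooled <- cbind(estimate = q_bar, se = sqrt(total))
pooled
```

The model formula also shows one way to combine a linear term with a logarithmic one in the same multiple regression.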
Those are a few avenues to investigate. I think I would go with the Bayesian approach (just don't tell my Bayesian friends that :-), but your preferences could differ.
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
greg.snow at imail.org
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Johannes Radinger
> Sent: Sunday, July 17, 2011 4:02 AM
> To: r-help at r-project.org
> Subject: [R] special question on regression
> Hello R-people!
> I have a general statistical question about regressions. I just want
> to describe my case:
> I have got a dataset of around 150 observations and 1 dependent and 2
> independent variables.
> The dependent variable is of metric nature (in my case meters in a
> range from around 0.5-10000 m). The first independent is also metric
> (in mm ranging from 50-700 mm) and it is assumed to be in a linear
> relation with the dependent one. So it is not a problem at all to
> do a typical linear regression on that.
> Now there is the second independent variable. This is also of metric
> nature and gives information on time (ranging from 1 day to 800 days),
> but sometimes this variable is not exactly known; I know, for
> example, only a range (1-2 days) or less than x days, etc. So my dataset
> could look like this:
> measured dependent variable in days:
> So my question: Is there a general method to include such types of
> variables into a regression analysis?
> Secondly I assume that there is not a linear relation given, it is
> more of a logarithmic nature so that the influence of the time on the
> dependent variable decreases with increasing size.
> So in short my questions:
> * How can I use variable values like <5 or 4-5 in a regression
> * Is it possible to combine the linear relationship with a
> logarithmic one in a multiple regression
> *How can that be done in R, are there any special packages you'd
> Thank you very much
> best regards