[R] Linear regression with a rounded response variable

Jim Lemon drjimlemon at gmail.com
Wed Oct 21 22:25:36 CEST 2015


Hi Ravi,
And remember that the vanilla (round-half-up) rounding procedure is biased
upward, because exact halves always go up. That is, an observation of 5 may
actually have ranged from 4.5 up to (but not including) 5.5.

Jim
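Jim's point about upward bias can be checked with exact arithmetic. This is a minimal, hypothetical sketch. It assumes the raw times were recorded to a tenth of a second and stores them as integer tenths to avoid floating-point surprises. It compares round-half-up with round-half-to-even ("banker's rounding"). The helper names are made up for this illustration.

```python
from fractions import Fraction


def round_half_up(tenths: int) -> int:
    """Round a value given in integer tenths to a whole unit; halves go up."""
    return (tenths + 5) // 10


def round_half_even(tenths: int) -> int:
    """Round a value given in integer tenths; halves go to the even neighbour."""
    q, r = divmod(tenths, 10)
    if r > 5 or (r == 5 and q % 2 == 1):
        q += 1
    return q


def mean_rounding_error(rounder, tenths_values) -> Fraction:
    """Exact mean of (rounded - raw), expressed in whole units."""
    total = sum(10 * rounder(t) - t for t in tenths_values)
    return Fraction(total, 10 * len(tenths_values))


# Hypothetical raw values 0.0, 0.1, ..., 99.9, each tenth equally often.
raw = range(1000)
print(mean_rounding_error(round_half_up, raw))    # 1/20, i.e. +0.05 units
print(mean_rounding_error(round_half_even, raw))  # 0
# Under round-half-up, the raw values recorded as 5 are 4.5, 4.6, ..., 5.4:
print([t / 10 for t in raw if round_half_up(t) == 5])
```

With tenth-of-a-second data, round-half-up overstates values by 0.05 on average. Round-half-to-even removes that bias. If the raw values are truly continuous, exact halves essentially never occur, and the bias from either rule is negligible.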

On Thu, Oct 22, 2015 at 7:15 AM, peter salzman <peter.salzmanuser at gmail.com>
wrote:

> Here is one thought:
>
> If you plug your numbers into any kind of regression, you will get
> predictions that are real numbers and not necessarily integers. It may be
> that your predictions are good enough with this approximate value of Y. You
> could test this by randomly jittering your response (adding a random amount
> in +/- 0.5 to each value) and comparing the results with the original fit.
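Peter's jitter check can be sketched as follows. This is a minimal illustration on hypothetical simulated data, not Ravi's data. It assumes one predictor and a true model y = 2 + 0.5x + N(0, 1), rounded half up. The function names and the numbers of replicates are illustrative choices. The idea is to refit after adding Uniform(-0.5, 0.5) noise to the rounded response and see how far the fit moves.

```python
import math
import random
import statistics


def ols_simple(x, y):
    """Intercept and slope of an ordinary least-squares fit with one predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def jitter_check(seed=2015, n=1000, reps=200):
    """Compare the slope fitted to rounded data with slopes refitted after
    adding Uniform(-0.5, 0.5) noise to the rounded response."""
    rng = random.Random(seed)
    # Hypothetical data: y = 2 + 0.5 * x + N(0, 1), then rounded half up.
    x = [rng.uniform(0, 10) for _ in range(n)]
    y_rounded = [math.floor(2.0 + 0.5 * xi + rng.gauss(0, 1) + 0.5) for xi in x]
    _, slope_orig = ols_simple(x, y_rounded)
    slopes = []
    for _ in range(reps):
        y_jit = [yi + rng.uniform(-0.5, 0.5) for yi in y_rounded]
        slopes.append(ols_simple(x, y_jit)[1])
    return slope_orig, statistics.fmean(slopes), statistics.stdev(slopes)


slope_orig, mean_jit, sd_jit = jitter_check()
print(f"slope on rounded data: {slope_orig:.4f}")
print(f"jittered refits: mean {mean_jit:.4f}, sd {sd_jit:.4f}")
```

The same comparison applies to predictions instead of slopes. If they barely move relative to their own spread across jittered refits, the rounding is probably not a practical concern.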
>
> Let me add another idea:
>
> If the data are not fully observed, this falls under the umbrella of
> censored data; in this case you have interval censoring. If you see 5, then
> the observation lies in the interval [4.5, 5.5).
> I'm not familiar with the field, but I'd search for 'regression with
> interval censoring'.
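Peter's interval-censoring suggestion can be made concrete as a maximum-likelihood fit. This is a minimal sketch assuming normal errors and a single predictor. Each rounded value y contributes Φ((y + 0.5 - μ)/σ) - Φ((y - 0.5 - μ)/σ) to the likelihood, where μ = b0 + b1·x and Φ is the standard normal CDF. The data are simulated and hypothetical. The crude compass search stands in for a proper optimizer. All function names are made up for this illustration.

```python
import math
import random
import statistics


def norm_cdf(z):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_loglik(params, x, y, half_width=0.5):
    """Log-likelihood when each true response lies in [y - 0.5, y + 0.5)
    and true response = b0 + b1 * x + Normal(0, sigma) error."""
    b0, b1, log_sigma = params
    sigma = math.exp(log_sigma)
    total = 0.0
    for xi, yi in zip(x, y):
        mu = b0 + b1 * xi
        p = (norm_cdf((yi + half_width - mu) / sigma)
             - norm_cdf((yi - half_width - mu) / sigma))
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total


def fit_interval(x, y, start, step=0.1, tol=1e-4, max_iter=10_000):
    """Maximise interval_loglik with a simple compass (pattern) search."""
    params, best = list(start), interval_loglik(start, x, y)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params.copy()
                trial[i] += delta
                value = interval_loglik(trial, x, y)
                if value > best:  # accept only strict improvements
                    params, best, improved = trial, value, True
                    break
        if not improved:
            step /= 2  # no coordinate move helped: refine the step
    return params, best


# Hypothetical data: true y = 2 + 0.5 * x + N(0, 1), recorded rounded half up.
rng = random.Random(7)
n = 500
x = [rng.uniform(0, 10) for _ in range(n)]
y = [math.floor(2.0 + 0.5 * xi + rng.gauss(0, 1) + 0.5) for xi in x]

# Start from ordinary least squares on the rounded data.
mx, my = statistics.fmean(x), statistics.fmean(y)
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx
resid_sd = statistics.stdev([yi - b0 - b1 * xi for xi, yi in zip(x, y)])
start = [b0, b1, math.log(resid_sd)]

(fb0, fb1, flog_sigma), ll = fit_interval(x, y, start)
print(f"OLS:      b0={b0:.3f} b1={b1:.3f} sigma={resid_sd:.3f}")
print(f"interval: b0={fb0:.3f} b1={fb1:.3f} sigma={math.exp(flog_sigma):.3f}")
```

In R itself, `survreg()` from the survival package can fit this kind of model when the response is given as `Surv(lower, upper, type = "interval2")` with `dist = "gaussian"`.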
>
>
> peter
>
>
> On Wed, Oct 21, 2015 at 10:53 AM, Ravi Varadhan <ravi.varadhan at jhu.edu>
> wrote:
>
> > Hi,
> > I am dealing with a regression problem where the response variable, time
> > (seconds) to walk 15 ft, is rounded to the nearest integer. I do not care
> > about the regression coefficients per se; my main interest is in getting
> > the prediction equation for walking speed, given the predictors (age,
> > height, sex, etc.), where the predictions will be real numbers and not
> > integers. The hope is that these predictions provide unbiased estimates
> > of the "unrounded" walking speed. This sounds like a measurement error
> > problem, where the measurement error is due to rounding and hence would
> > be uniformly distributed on (-0.5, 0.5).
> >
> > Are there any canonical approaches for handling this type of problem?
> > What is wrong with just doing the standard linear regression?
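A quick simulation helps with Ravi's second question. This is a sketch on hypothetical data: a single made-up predictor x and normal errors with SD 1, not walking-speed data. Suppose the rounding error is roughly uniform and unrelated to the predictors. Then least-squares coefficients on the rounded response should be close to those on the unrounded response. The residual variance should grow by about 1/12, the variance of a Uniform(-0.5, 0.5) error; this is the idea behind Sheppard's correction. The approximation can break down when the residual SD is small compared with the one-unit rounding width.

```python
import math
import random
import statistics


def ols_fit(x, y):
    """Intercept, slope and residual variance (n - 2 denominator) of a
    one-predictor least-squares fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return intercept, slope, rss / (len(x) - 2)


rng = random.Random(42)
n = 20_000
x = [rng.uniform(0, 10) for _ in range(n)]
y_true = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]  # hypothetical unrounded response
y_obs = [math.floor(v + 0.5) for v in y_true]           # recorded to the nearest integer

b0_t, b1_t, var_t = ols_fit(x, y_true)
b0_r, b1_r, var_r = ols_fit(x, y_obs)
print(f"unrounded: b0={b0_t:.3f} b1={b1_t:.3f} resid var={var_t:.3f}")
print(f"rounded:   b0={b0_r:.3f} b1={b1_r:.3f} resid var={var_r:.3f}")
print(f"extra variance {var_r - var_t:.3f} vs 1/12 = {1 / 12:.3f}")
```

Under these assumptions, standard regression on the rounded response gives essentially the same prediction equation. The main cost is a little extra residual noise.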
> >
> > I googled and saw that this question was asked by someone else in a
> > Stack Exchange post, but it was unanswered. Any suggestions?
> >
> > Thank you,
> > Ravi
> >
> > Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
> > Associate Professor,  Department of Oncology
> > Division of Biostatistics & Bioinformatics
> > Sidney Kimmel Comprehensive Cancer Center
> > Johns Hopkins University
> > 550 N. Broadway, Suite 1111-E
> > Baltimore, MD 21205
> > 410-502-2619
> >
> >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Peter Salzman, PhD
> Department of Biostatistics and Computational Biology
> University of Rochester
>
