[R] odd behaviour in quantreg::rq
Dylan Beaudette
debeaudette at ucdavis.edu
Wed Jul 1 23:52:24 CEST 2009
On Wednesday 01 July 2009, roger koenker wrote:
> It's not clear to me whether you are looking for an exploratory tool
> or something more like formal inference. For the former, it seems
> that estimating a few weighted quantiles would be quite useful. at
> least
> it is rather Tukeyesque. While I'm appealing to authorities, I can't
> resist recalling that Galton's "invention of correlation" paper: Co-
> relations
> and their measurement, Proceedings of the Royal Society, 1888-89,
> used medians and interquartile ranges.
>
Thanks Roger.
We are interested in a formal inference type of comparison. I have found that
weighted density plots to be a helpful exploratory graphic-- but ultimately
we would like to (as objectively as possible) determine if distributions are
different.
Unfortunately the distributions are very poorly behaved. Once such example can
be seen here:
http://169.237.35.250/~dylan/temp/ACTIVITY-method_comparision.pdf
I was imagine using something like the number of 'significant' differences in
n-quantiles would be used to determine which treatment group was most
different than the control group.
Cheers,
Dylan
> url: www.econ.uiuc.edu/~roger Roger Koenker
> email rkoenker at uiuc.edu Department of Economics
> vox: 217-333-4558 University of Illinois
> fax: 217-244-6678 Urbana, IL 61801
>
> On Jul 1, 2009, at 2:48 PM, Dylan Beaudette wrote:
> > Thanks Roger. Your comments were very helpful. Unfortunately, each of
> > the 'groups' in this example are derived from the same set of data,
> > two of
> > which were subsets-- so it is not that unlikely that the weighted
> > medians
> > were the same in some cases.
> >
> > This all leads back to an operation attempting to answer the question:
> >
> > Of the 2 subsetting methods, which one produces a distribution most
> > like the
> > complete data set? Since the distributions are not normal, and there
> > are
> > area-weights involved others on the list suggested quantile-
> > regression. For a
> > more complete picture of how 'different' the distributions are, I
> > have tried
> > looking at the differences between weighted quantiles: (0.1, 0.25,
> > 0.5, 0.75,
> > 0.9) as a more complete 'description' of each distribution.
> >
> > I imagine that there may be a better way to perform this comparison...
> >
> > Cheers,
> > Dylan
> >
> > On Tuesday 30 June 2009, roger koenker wrote:
> >> Admittedly this seemed quite peculiar.... but if you look at the
> >> entrails
> >> of the following code you will see that with the weights the first
> >> and
> >> second levels of your x$method variable have the same (weighted)
> >> median
> >> so the contrast that you are estimating SHOULD be zero. Perhaps
> >> there is something fishy about the data construction that would have
> >> allowed us to anticipate this? Regarding the "fn" option, and the
> >> non-uniqueness warning, this is covered in the (admittedly obscure)
> >> faq on quantile regression available at:
> >>
> >> http://www.econ.uiuc.edu/~roger/research/rq/FAQ
> >>
> >> # example:
> >> library(quantreg)
> >>
> >> # load data
> >> x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv'))
> >>
> >> # with weights
> >> summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5),
> >> se='ker')
> >>
> >> #Reproduction with more convenient notation:
> >>
> >> X0 <- model.matrix(~method, data = x)
> >> y <- x$sand
> >> w <- x$area_fraction
> >> f0 <- summary(rq(y ~ X0 - 1, weights = w),se = "ker")
> >>
> >> #Second reproduction with orthogonal design:
> >>
> >> X1 <- model.matrix(~method - 1, data = x)
> >> f1 <- summary(rq(y ~ X1 - 1, weights = w),se = "ker")
> >>
> >> #Comparing f0 and f1 we see that they are consistent!! How can that
> >> be??
> >> #Since the columns of X1 are orthogonal estimation of the 3
> >> parameters
> >> are separable
> >> #so we can check to see whether the univariate weighted medians are
> >> reproducible.
> >>
> >> s1 <- X1[,1] == 1
> >> s2 <- X1[,2] == 1
> >> g1 <- rq(y[s1] ~ X1[s1,1] - 1, weights = w[s1])
> >> g2 <- rq(y[s2] ~ X1[s2,2] - 1, weights = w[s2])
> >>
> >> #Now looking at the g1 and g2 objects we see that they are equal and
> >> agree with f1.
> >>
> >>
> >> url: www.econ.uiuc.edu/~roger Roger Koenker
> >> email rkoenker at uiuc.edu Department of Economics
> >> vox: 217-333-4558 University of Illinois
> >> fax: 217-244-6678 Urbana, IL 61801
> >>
> >> On Jun 30, 2009, at 3:54 PM, Dylan Beaudette wrote:
> >>> Hi,
> >>>
> >>> I am trying to use quantile regression to perform weighted-
> >>> comparisons of the
> >>> median across groups. This works most of the time, however I am
> >>> seeing some
> >>> odd output in summary(rq()):
> >>>
> >>> Call: rq(formula = sand ~ method, tau = 0.5, data = x, weights =
> >>> area_fraction)
> >>> Coefficients:
> >>> Value Std. Error t value Pr(>|t|)
> >>> (Intercept) 45.44262 3.64706 12.46007 0.00000
> >>> methodmukey-HRU 0.00000 4.67115 0.00000 1.00000
> >>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>
> >>> When I do not include the weights, I get something a little closer
> >>> to a
> >>> weighted comparison of means, along with an error message:
> >>>
> >>> Call: rq(formula = sand ~ method, tau = 0.5, data = x)
> >>> Coefficients:
> >>> Value Std. Error t value Pr(>|t|)
> >>> (Intercept) 44.91579 2.46341 18.23318 0.00000
> >>> methodmukey-HRU 9.57601 9.29348 1.03040 0.30380
> >>> Warning message:
> >>> In rq.fit.br(x, y, tau = tau, ...) : Solution may be nonunique
> >>>
> >>>
> >>> I have noticed that the error message goes away when specifying
> >>> method='fn' to
> >>> rq(). An example is below. Could this have something to do with
> >>> replication
> >>> in the data?
> >>>
> >>>
> >>> # example:
> >>> library(quantreg)
> >>>
> >>> # load data
> >>> x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv'))
> >>>
> >>> # with weights
> >>> summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5),
> >>> se='ker')
> >>>
> >>> # without weights
> >>> # note error message
> >>> summary(rq(sand ~ method, data=x, tau=0.5), se='ker')
> >>>
> >>> # without weights, no error message
> >>> summary(rq(sand ~ method, data=x, tau=0.5, method='fn'), se='ker')
--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341
More information about the R-help
mailing list