[Rd] bug (PR#13570)

Benjamin Tyner btyner at gmail.com
Fri Mar 6 02:24:21 CET 2009


Hi

Nice to hear from you Ryan. I also do not have the capability to debug 
on windows; however, there is a chance that the behavior you are seeing 
is caused by the following bug noted in my thesis (available on 
ProQuest; email me if you don't have access):

"When lambda = 0 there are no local slopes to aid the blending 
algorithm, yet the
interpolator would still assume they were available, and thus use 
arbitrary values
from memory. This had implications for both fit and tr[L] computation. 
In the
updated code these are set equal to zero which seems the best automatic 
rule when
lambda = 0." [lambda refers to degree]

I submitted a bug fix to Eric Grosse, the maintainer of the netlib 
routines; the fixed lines of fortran are identified in the comments at 
(just search for my email address):

http://www.netlib.org/a/loess

These fixes would be relatively simple to incorporate into R's version 
of loessf.f

Alternatively, a quick check would be for someone to compile the source 
package at https://centauri.stat.purdue.edu:98/loess/loess_0.4-1.tar.gz 
and test it on windows. Though this package incorporates this and a few 
other fixes, please be aware that it the routines are converted to C and 
thus there is a slight performance hit compared to the fortran.

Hope this helps,
Ben

Ryan Hafen wrote:
> That is true - good point.
>
> lp1 <- predict(loess(y ~ x, degree=0))
> lp2 <- predict(loess(y ~ x, degree=0, 
> control=loess.control(surface="direct")))
> sort(abs(lp1-lp2))
>
> It appears that the interpolating fit is correct at the vertices.  I 
> know when degree>=1, the interpolation uses the slopes of the local 
> fits to get a better approximation.  Perhaps it's still trying to do 
> this with degree=0 but the slopes aren't available.  And we have just 
> been lucky in the past with uninitialized values?  If this is the 
> problem it would probably be very simple to fix and I'd love to see 
> degree=0 stay.  I will see if I can figure it out.
>
>
> On Mar 5, 2009, at 6:01 PM, Greg Snow wrote:
>
>> I see the same problem on Windows XP.
>>
>> But if I run loess with surface='direct' then the results are 
>> correct.  So it looks like the problem comes from the 
>> smoothing/interpolating, not the main loess algorithm.
>>
>> -- 
>> Gregory (Greg) L. Snow Ph.D.
>> Statistical Data Center
>> Intermountain Healthcare
>> greg.snow at imail.org
>> 801.408.8111
>>
>>
>>> -----Original Message-----
>>> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-
>>> project.org] On Behalf Of Ryan Hafen
>>> Sent: Thursday, March 05, 2009 7:43 AM
>>> To: Prof Brian Ripley
>>> Cc: Uwe Ligges; Berwin A Turlach; r-devel at stat.math.ethz.ch; Peter
>>> Dalgaard
>>> Subject: Re: [Rd] bug (PR#13570)
>>>
>>>
>>> On Mar 5, 2009, at 7:59 AM, Prof Brian Ripley wrote:
>>>
>>>> On Thu, 5 Mar 2009, Peter Dalgaard wrote:
>>>>
>>>>> Prof Brian Ripley wrote:
>>>>>> Undortunately the example is random, so not really reproducible
>>>>>> (and I
>>>>>> see nothing wrong on my Mac). However, Linux valgrind on R-devel is
>>>>>> showing a problem:
>>>>>>
>>>>>> ==3973== Conditional jump or move depends on uninitialised value(s)
>>>>>> ==3973==    at 0xD76017B: ehg141_ (loessf.f:532)
>>>>>> ==3973==    by 0xD761600: lowesa_ (loessf.f:769)
>>>>>> ==3973==    by 0xD736E47: loess_raw (loessc.c:117)
>>>>>>
>>>>>> (The uninitiialized value is in someone else's code and I suspect
>>>>>> it was
>>>>>> either never intended to work or never tested.)  No essential
>>>>>> change has
>>>>>> been made to the loess code for many years.
>>>>>>
>>>>>> I would not have read the documentation to say that degree = 0 was
>>> a
>>>>>> reasonable value. It is not to my mind 'a polynomial surface', and
>>>>>> loess() is described as a 'local regression' for degree 1 or 2 in
>>>>>> the
>>>>>> reference.  So unless anyone wants to bury their heads in that
>>>>>> code I
>>>>>> think a perfectly adequate fix would be to disallow degree = 0.
>>>>>> (I vaguely recall debating allowing in the code ca 10 years ago.)
>>>>>
>>>>> The code itself has
>>>>>
>>>>>  if (!match(degree, 0:2, 0))
>>>>>      stop("'degree' must be 0, 1 or 2")
>>>>>
>>>>> though. "Local fitting of a constant" essentially becomes kernel
>>>>> smoothing, right?
>>>>
>>>> I do know the R code allows it: the question is whether it is worth
>>>> the effort of finding the problem(s) in the underlying c/dloess
>>>> code, whose manual (and our reference) is entirely about 1 or 2.  I
>>>> am concerned that there may be other things lurking in the degree=0
>>>> case if it was never tested (in the netlib version: I am sure it was
>>>> only minmally tested through my R interface).
>>>>
>>>> I checked the original documentation on netlib and that says
>>>>
>>>> 29      DIM     dimension of local regression
>>>>               1               constant
>>>>               d+1             linear   (default)
>>>>               (d+2)(d+1)/2    quadratic
>>>>               Modified by ehg127 if cdeg<tdeg.
>>>>
>>>> which seems to confirm that degree = 0 was intended to be allowed,
>>>> and what I dimly recall from ca 1998 is debating whether the R code
>>>> should allow that or not.
>>>>
>>>> If left to me I would say I did not wish to continue to support
>>>> degree = 0.
>>>
>>> True.  There are plenty of reasons why one wouldn't want to use
>>> degree=0 anyway.  And I'm sure there are plenty of other simple ways
>>> to achieve the same effect.
>>>
>>> I ran into the problem because some code I'm planning on distributing
>>> as part of a paper submission "blends" partway down to degree 0
>>> smoothing at the endpoints to reduce the variance.  The only bad
>>> effect of disallowing degree 0 is for anyone with code depending on
>>> it, although there are probably few that use it and better to disallow
>>> than to give an incorrect computation.  I got around the problem by
>>> installing a modified loess by one of Cleveland's former students:
>>> https://centauri.stat.purdue.edu:98/loess/
>>>  (but don't want to require others who use my code to do so as well).
>>>
>>> What is very strange to me is that it has been working fine in
>>> previous R versions (tested on 2.7.1 and 2.6.1) and nothing has
>>> changed in the loess source but yet it is having problems on 2.8.1.
>>> Would this suggest it not being a problem with the netlib code?
>>>
>>> Also strange that it reportedly works on Linux but not on Mac or
>>> Windows.  On the mac, the effect was much smaller. With windows, it
>>> was predicting values like 2e215 whereas on the mac, you would almost
>>> believe the results were legitimate if you didn't think about the fact
>>> that a weighted moving average involving half the data shouldn't
>>> oscillate so much.
>>>
>>> If the consensus is to keep degree=0, I'd be happy to help try to find
>>> the problem or provide a test case or something.  Thanks for looking
>>> into this.
>>>
>>> Ryan
>>>
>>>
>>>
>>>>>
>>>>>
>>>>>> On Thu, 5 Mar 2009, Uwe Ligges wrote:
>>>>>>
>>>>>>> Berwin A Turlach wrote:
>>>>>>>> G'day Peter,
>>>>>>>>
>>>>>>>> On Thu, 05 Mar 2009 09:09:27 +0100
>>>>>>>> Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:
>>>>>>>>
>>>>>>>>> rhafen at stat.purdue.edu wrote:
>>>>>>>>>> <<insert bug report here>>
>>>>>>>>>>
>>>>>>>>>> This is a CRITICAL bug!!!  I have verified it in R 2.8.1 for
>>> mac
>>>>>>>>>> and for windows.  The problem is with loess degree=0 smoothing.
>>>>>>>>>> For example, try the following:
>>>>>>>>>>
>>>>>>>>>> x <- 1:100
>>>>>>>>>> y <- rnorm(100)
>>>>>>>>>> plot(x, y)
>>>>>>>>>> lines(predict(loess(y ~ x, degree=0, span=0.5)))
>>>>>>>>>>
>>>>>>>>>> This is obviously wrong.
>>>>>>>>> Obvious? How? I don't see anything particularly odd (on Linux).
>>>>>>>>
>>>>>>>> Neither did I on linux; but the OP mentioned mac and windows. On
>>>>>>>> windows, on running that code, the lines() command added a lot of
>>>>>>>> vertical lines; most spanning the complete window but some only
>>>>>>>> part.
>>>>>>>> Executing the code a second time (or in steps) gave sensible
>>>>>>>> results. My guess would be that some memory is not correctly
>>>>>>>> allocated or
>>>>>>>> initialised.  Or is it something like an object with storage mode
>>>>>>>> "integer" being passed to a double?  But then, why doesn't it
>>>>>>>> show on
>>>>>>>> linux?
>>>>>>>>
>>>>>>>> Happy bug hunting.  If my guess is correct, then I have no idea
>>>>>>>> how to
>>>>>>>> track down such things under windows.....
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>>   Berwin
>>>>>>>>
>>>>>>>> ______________________________________________
>>>>>>>> R-devel at r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>>
>>>>>>>
>>>>>>> Please can you folks try under R-devel (to be R-2.9.0 in a couple
>>>>>>> of
>>>>>>> weeks) and report if you still see it. I do not under R-devel
>>>>>>> (but do
>>>>>>> under R-release), so my guess is that something called by loess()
>>>>>>> has
>>>>>>> been fixed in the meantime.
>>>>>>>
>>>>>>> Moreover it is not the plot stuff that was wrong under R-2.8.1
>>>>>>> (release) but the loess computations.
>>>>>>>
>>>>>>> Uwe Ligges
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-devel at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>>>>> c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>>>>> (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45)
>>>>> 35327918
>>>>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45)
>>>>> 35327907
>>>>>
>>>>>
>>>>
>>>> -- 
>>>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>>>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>>>> University of Oxford,             Tel:  +44 1865 272861 (self)
>>>> 1 South Parks Road,                     +44 1865 272866 (PA)
>>>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>




More information about the R-devel mailing list