[Rd] predict.loess() segfaults for large n?

Hiroyuki Kawakatsu hkawakat at gmail.com
Fri Mar 1 12:27:17 CET 2013


Hi,

I am getting a segfault when using predict.loess() (checked with
r62092). I've traced the source with the help of valgrind (output
pasted below), and it appears to be due to int overflow when
allocating an int work array in loess_workspace():

    liv = 50 + ((int)pow((double)2, (double)D) + 4) * nvmax + 2 * N;

where liv is a (global) int. For D=1 (one x variable), this
overflows at approximately N = 4089, where N is the fitted sample
size (not the prediction sample size).
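
As a way to check the expression above for a given D and N, here is a
minimal sketch that evaluates it in 64-bit arithmetic and tests whether
the result fits in a C int. The names liv_quoted() and fits_in_int() are
hypothetical, and nvmax is passed in as a parameter because its
definition is not shown in this post.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: evaluates the quoted liv expression in 64-bit
 * arithmetic, so the computation itself cannot wrap for int inputs.
 * nvmax is an input because its definition is not part of the post. */
long long liv_quoted(int D, int N, int nvmax)
{
    assert(D >= 0 && D < 62);          /* keep the shift well defined */
    long long cells = 1LL << D;        /* 2^D */
    return 50LL + (cells + 4LL) * nvmax + 2LL * N;
}

/* True if a computed workspace size can be stored in a C int. */
bool fits_in_int(long long size)
{
    return size >= 0 && size <= INT_MAX;
}
```

Evaluated on its own, the quoted expression is linear in N; for example,
liv_quoted(1, 4089, 4089) is 50 + 6*4089 + 2*4089 = 32762. Any further
terms that loess_workspace() may add to liv for se=TRUE (not shown here,
so this is an assumption) could be checked the same way.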

I am aware that you are in the process of introducing long vectors,
but a quick fix would be to raise an error when
predict.loess(..., se=TRUE) is called and N is too large. (Ideally,
one would use long int, but does Fortran portably support long int?)
The threshold value of N may depend on the surface type (the above is
for surface=="interpolate").
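
A minimal sketch of such a guard, assuming the workspace size has
already been computed in a wider type: it refuses any count that a C int
cannot represent instead of letting the size wrap. The function name
alloc_int_workspace() is hypothetical; inside R one would presumably
call error() from R's C API rather than return NULL.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical guard: allocate a zeroed int work array of `count`
 * elements, rejecting sizes that do not fit in a C int (the type of
 * liv). Returns NULL on rejection or allocation failure. */
int *alloc_int_workspace(long long count)
{
    if (count <= 0 || count > INT_MAX) {
        fprintf(stderr,
                "workspace of %lld ints is not representable as int\n",
                count);
        return NULL;
    }
    /* Also make sure the byte count fits in size_t. */
    if ((unsigned long long)count > SIZE_MAX / sizeof(int))
        return NULL;
    return calloc((size_t)count, sizeof(int));
}
```

The caller is expected to check for NULL and to free() the array when
done.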

The following sample code does not result in a segfault, but when run
under valgrind it produces the warning about a large range. (In the
code that segfaults, N is about 77,000.)

    set.seed(1)
    n <- 5000      # n = 4000 seems ok
    x <- rnorm(n)
    y <- x + rnorm(n)
    # fit with the approximate hat-matrix trace, then request standard errors
    yf <- loess(y ~ x, span = 0.75,
                control = loess.control(trace.hat = "approximate"))
    print(predict(yf, data.frame(x = 1), se = TRUE))

##---valgrind output with segfault (abridged):

> test4()
==30841== Warning: set address range perms: large range [0x3962a040, 0x5fb42608) (defined)
==30841== Warning: set address range perms: large range [0x5fb43040, 0xf8c8e130) (defined)
==30841== Invalid write of size 4
==30841==    at 0xCD719F0: ehg139_ (loessf.f:1444)
==30841==    by 0xCD72E0C: ehg131_ (loessf.f:467)
==30841==    by 0xCD73A5A: lowesb_ (loessf.f:1530)
==30841==    by 0xCD2C774: loess_ise (loessc.c:219)
==30841==    by 0x486C7F: do_dotCode (dotcode.c:1744)
==30841==    by 0x4AB040: bcEval (eval.c:4544)
==30841==    by 0x4B6B3F: Rf_eval (eval.c:498)
==30841==    by 0x4BAD87: Rf_applyClosure (eval.c:960)
==30841==    by 0x4B6D5E: Rf_eval (eval.c:611)
==30841==    by 0x4B7A1E: do_eval (eval.c:2193)
==30841==    by 0x4AB040: bcEval (eval.c:4544)
==30841==    by 0x4B6B3F: Rf_eval (eval.c:498)
==30841==  Address 0xf8cd4144 is not stack'd, malloc'd or (recently) free'd
==30841==

 *** caught segfault ***
address 0xf8cd4144, cause 'memory not mapped'

Traceback:
 1: predLoess(y, x, newx, s, weights, pars$robust, pars$span, pars$degree,
    pars$normalize, pars$parametric, pars$drop.square, pars$surface,
    pars$cell, pars$family, kd, divisor, se = se)
 2: eval(expr, envir, enclos)
 3: eval(substitute(expr), data, enclos = parent.frame())
 4: with.default(object, predLoess(y, x, newx, s, weights, pars$robust,
    pars$span, pars$degree, pars$normalize, pars$parametric,
    pars$drop.square, pars$surface, pars$cell, pars$family, kd,
    divisor, se = se))
 5: with(object, predLoess(y, x, newx, s, weights, pars$robust, pars$span,
    pars$degree, pars$normalize, pars$parametric, pars$drop.square,
    pars$surface, pars$cell, pars$family, kd, divisor, se = se))
 6: predict.loess(y2, data.frame(hours = xmin), se = TRUE)
 7: predict(y2, data.frame(hours = xmin), se = TRUE)
 8: test4()
aborting ...
==30841==


-- 
+---
| Hiroyuki Kawakatsu
| Business School, Dublin City University
| Dublin 9, Ireland. Tel +353 (0)1 700 7496


