[R-sig-Geo] error message when running errorsarlm

Roger Bivand Roger.Bivand at nhh.no
Fri May 23 09:01:38 CEST 2008


On Fri, 23 May 2008, Roger Bivand wrote:

> On Thu, 22 May 2008, Roger Bivand wrote:
>
>>  On Thu, 22 May 2008, evans324 at umn.edu wrote:
>> 
>> >   On May 22 2008, Roger Bivand wrote:
>> > 
>> > > > >    Does that mean that you get a sensible lambda for your model now 
>> > > > >    - the line search leads somewhere other than a boundary of the 
>> > > > >    interval?
>> > > > 
>> > > >    I apologize for being unclear. I actually upgraded R and updated 
>> > > >    packages, then ran errorsarlm with method="Matrix" and got the 
>> > > >    same error messages I'd had previously (i.e., the search led to 
>> > > >    the boundary of the interval). I then tried your other suggestion 
>> > > >    and used method="spam" and got a result with no error messages.
>> > > 
>> > >   But we do not know why the two are not the same (they should be), so 
>> > >   I would still not trust the outcome. I would be interested in 
>> > >   off-list access to the data being used - I think that there is some 
>> > >   issue with the scaling of the variable values. Do you see the same 
>> > >   difference using spautolm(), which is effectively the same as 
>> > >   errorsarlm(), but with a different internal structure?
>> > 
>> >   I do see the same difference using spautolm() and get no error messages
>> >   using it. I'll send you the data separately and would appreciate your
>> >   opinion on them.
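
A minimal sketch of this cross-check, assuming dat is the underlying data
frame, form a hypothetical model formula, and lw the symmetric spatial
weights (listw) object:

library(spdep)
m_err  <- errorsarlm(form, data = dat, listw = lw, method = "spam")
m_auto <- spautolm(form, data = dat, listw = lw, family = "SAR")
# the two lambda estimates should agree closely
c(m_err$lambda, m_auto$lambda)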
>
> To close the part of the thread about differences between the spam and Matrix 
> methods, I can report that on Linux (both 2GB and 1GB 32-bit), there is no 
> difference for the original model using the 2500m distance criterion, and 
> both line searches reach a lambda of 0.9165158. The same applies to Windows 
> 32-bit. R version 2.7.0 using R-generic BLAS, spdep 0.4-21, Matrix 
> 0.999375-9, spam 0.13-3, in all cases.
>
> The sparser weights cases described below run with adequate speed; the
> semi-dense weights (average number of neighbours 280) run more slowly, with
> most of the time spent in making sure that the weights are exactly symmetric
> - currently in the R functions similar.listw() and listw2U(), both of which
> will be re-written to hand the time-consuming parts off to compiled C.
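
For reference, listw2U() returns the symmetrised weights 0.5 * (W + t(W)) as
a new listw object; a minimal sketch, with lw an existing listw object:

lw_sym <- listw2U(lw)   # enforce exact symmetry of the weights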
>
> The data were 10055 house prices, the objective being to fit a hedonic
> regression. Conclusion: using a sparser neighbour representation is
> advisable; the computational problems could not be reproduced, but initially
> looked like a package version issue - Matrix is moving fast, and spdep is
> trying to keep up with it.

Sorry, to put the record straight (finally, I hope): I can provoke similar
error messages under Windows when either the spam or the Matrix method is
unable to allocate memory. The error messages returned from inside spam and
Matrix are usually not informative, like the ones initially seen (things like
"no function to return from, jumping to top level"). I just managed to
reproduce this with spam on the 1GB Windows machine.

This happens when the sparse package functions bail out because of
difficulties allocating memory, but instead of terminating the line search,
the search continues to batter its head against one end of the interval. I'll
try to make the functions behave more sensibly - which is not easy, because
the sparse package functions can fail for a number of reasons: some, like
lambda leaving its range, are acceptable, while others, like running out of
memory on Windows, are not. On 1GB Linux, the better underlying memory
management seems to protect us from this infelicity.

The base conclusion remains the same - weights should really be sparse, so do
look at the output of the print method for neighbour objects! Neither method
has any trouble with really sparse weights for a data set of this size.
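
To see how sparse a neighbour set is before fitting, the print method reports
the average number of links; a sketch, assuming rd and nb_k5s as in the
quoted message below:

nb_k5s                    # print method: regions, nonzero links, average links
summary(card(nb_k5s))     # distribution of neighbour counts per point
nb_d25 <- dnearneigh(coordinates(rd), d1 = 0, d2 = 2500)
nb_d25                    # contrast: the distance-based set is far denser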

Roger

>
> Roger
>
>>
>>  Heather:
>>
>>  OK, thanks. On first inspection, the choice of a distance criterion for
>>  neighbours seems to be part of the problem. Using:
>>
>>  nb_k5 <- knn2nb(knearneigh(coordinates(rd), k=5))
>>  nb_k5s <- make.sym.nb(nb_k5)
>>
>>  where rd is the SpatialPointsDataFrame object, with many fewer neighbours
>>  than your 2500m or 3000m criteria, gives results from "Matrix" and "spam"
>>  that are identical, and most likely what you are after. These weights are
>>  the 5 nearest neighbours coerced to symmetric, so all points have at least
>>  5 neighbours and the largest number of neighbours is 12 (your 2500m
>>  criterion had a mean number of neighbours of 280, maximum 804). If you can
>>  live without
>>  your choice of neighbours (which in some settings may be getting pretty
>>  close to your market segment dummies), I'd advise using something much
>>  sparser (but symmetric). The sparser weights matrices also increase the
>>  speed dramatically.
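
Continuing that snippet into a fitted model (the formula form is
hypothetical), the two sparse methods should then agree:

lw_k5s <- nb2listw(nb_k5s, style = "W")
m_M <- errorsarlm(form, data = dat, listw = lw_k5s, method = "Matrix")
m_s <- errorsarlm(form, data = dat, listw = lw_k5s, method = "spam")
all.equal(m_M$lambda, m_s$lambda)   # should be TRUE for sparse weights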
>>
>>  If you look at the bottom of ?bptest.sarlm, you'll see a cheap and totally
>>  untested way of adjusting the output SEs, but please don't believe what it
>>  does, because it is treating the lambda value as known, not estimated. A
>>  guess at the remaining heterogeneity would be an age-by-maintenance
>>  interaction: older houses will vary in value with maintenance, probably
>>  also by neighbourhood?
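
The test itself takes the fitted sarlm object; a minimal sketch:

bptest.sarlm(m_M, studentize = TRUE)   # studentized Breusch-Pagan test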
>>
>>  Hope this helps,
>>
>>  Roger
>> > 
>> > > > >    There are different traditions. Econometricians and some others 
>> > > > >    in social science try to trick the standard errors by "magic", 
>> > > > >    while epidemiologists (and crime people) typically use case 
>> > > > >    weights - that is, model the heteroscedasticity directly. 
>> > > > >    spautolm() can include such case weights. I don't think that 
>> > > > >    there is any substantive and reliable theory for adjusting the 
>> > > > >    SE, that is theory that doesn't appeal to assumptions we already 
>> > > > >    know don't hold. Sampling from the posterior gives a handle on 
>> > > > >    this, but is not simple, and doesn't really suit 10K 
>> > > > >    observations.
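
A sketch of the case-weights approach in spautolm(), with het_var a
hypothetical variance covariate in dat:

m_w <- spautolm(form, data = dat, listw = lw, weights = het_var,
                family = "SAR")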
>> > > > > 
>> > > >    Can you explain "magic" a little further? I'm running this for a 
>> > > >    professor who is a bit nervous about black box techniques and I'd 
>> > > >    like to be able to offer him a good explanation. I think he'll 
>> > > >    just have me calculate White's standard errors and ignore spatial 
>> > > >    autocorrelation if I can't be clearer.
>> > > > 
>> > > 
>> > >   If this is all your "professor" can manage, please replace/educate! 
>> > >   The model is fundamentally misspecified, and neither "magicing" the 
>> > >   standard errors, nor just fitting a simultaneous autoregressive error 
>> > >   model will let you make fair decisions on the "significance" or 
>> > >   otherwise of the right-hand side variables, which I suppose is the 
>> > >   object of the exercise?
>> > > 
>> >   I agree here, but haven't been able to get much advice on this. I
>> >   appreciate your input.
>> > 
>> > >   (Looking at Johnston & DiNardo (1997), pp. 164-166, it looks as if 
>> > >   White's SE only help asymptotically (in Prof. Ripley's well-known 
>> > >   remark, asymptotics are a foreign country with spatial data), and not 
>> > >   in finite samples, and their performance is unknown if the residuals 
>> > >   are autocorrelated, which is the case here).
>> > 
>> > >   The vast number of observations is no help either, because they 
>> > >   certainly introduce heterogeneity that has not been controlled for. 
>> > >   Is this a grid of global species occurrence data, by any chance? 
>> > >   Which RHS variables are covering for differences in environmental 
>> > >   drivers? Or is there a better reason for using many observations 
>> > >   (instead of careful data collection) than just their being available?
>> > > 
>> >   This is a hedonic regression with a goal of eliciting economic values
>> >   for different percentages of tree cover on parcels and in the local
>> >   neighborhood as capitalized in home sale prices. We're using all 2005
>> >   residential sales from Ramsey and Dakota counties in Minnesota, USA as
>> >   our observations. This gives us sales from most study area regions and
>> >   for all months. I'll send you a description of the RHS variables with
>> >   the dataset.
>> > 
>> > >   More observations do not mean more information if meaningful 
>> > >   differences across the observations are not captured by included 
>> > >   variables (with the correct functional form). Have you tried GAM with 
>> > >   flexible functional forms on the RHS variables and s(x,y) on the 
>> > >   (point) locations of the observations?
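
A minimal sketch of such a GAM with the mgcv package, using hypothetical
covariate names and point coordinates x, y held in the data frame:

library(mgcv)
m_gam <- gam(log(price) ~ s(tree_cover) + s(age) + s(x, y), data = dat)
summary(m_gam)   # s(x, y) absorbs smooth spatial trend in the prices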
>> > 
>> >   I haven't tried this, but will look into it.
>> > 
>> > >   You are not alone in your plight, but if the inferences matter, then 
>> > >   it's better to be cautious, irrespective of the "professor".
>> > > 
>> >   Thanks very much for your help.
>> > 
>> >   Regards,
>> >   Heather
>> > 
>> >   --- Heather Sander
>> >   Ph.D. Candidate:  Conservation Biology
>> >   Office:  305 Ecology & 420 Blegen
>> >   Mail:  University of Minnesota
>> >   Dept. of Geography
>> >   414 Social Science Bldg.
>> >   267 19th Ave. S.
>> >   Minneapolis, MN 55455
>> >   USA

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



