[R-sig-Geo] error message when running errorsarlm

Roger Bivand Roger.Bivand at nhh.no
Thu May 22 20:08:10 CEST 2008

On Thu, 22 May 2008, evans324 at umn.edu wrote:

> On May 22 2008, Roger Bivand wrote:
>> > >  Does that mean that you get a sensible lambda for your model now - the 
>> > >  line search leads somewhere other than a boundary of the interval?
>> > 
>> >  I apologize for being unclear. I actually upgraded R and updated 
>> >  packages, then ran errorsarlm with method="Matrix" and got the same 
>> >  error messages I'd had previously (i.e., the search led to the boundary 
>> >  of the interval). I then tried your other suggestion and used 
>> >  method="spam" and got a result with no error messages.
>> But we do not know why the two are not the same (they should be), so I 
>> would still not trust the outcome. I would be interested in off-list access 
>> to the data being used - I think that there is some issue with the scaling 
>> of the variable values. Do you see the same difference using spautolm(), 
>> which is effectively the same as errorsarlm(), but with a different 
>> internal structure?
> I do see the same difference using spautolm() and get no error messages using 
> it. I'll send you the data separately and would appreciate your opinion on 
> them.


OK, thanks. On first inspection, the choice of a distance criterion for 
neighbours seems to be part of the problem. Using:

nb_k5 <- knn2nb(knearneigh(coordinates(rd), k=5))
nb_k5s <- make.sym.nb(nb_k5)

where rd is the SpatialPointsDataFrame object, with many fewer neighbours 
than your 2500m or 3000m criteria, gives results from "Matrix" and "spam" 
that are identical, and most likely what you are after. These weights are 
the 5 nearest neighbours coerced to symmetric, so all have at least 5 neighbours 
and the largest number of neighbours is 12 (your 2500m criterion had a 
mean number of neighbours of 280, maximum 804). If you can live without 
your choice of neighbours (which in some settings may be getting pretty 
close to your market segment dummies), I'd advise using something much 
sparser (but symmetric). The sparser weights matrices also increase the 
speed dramatically.
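
As a rough sketch of the comparison described above, the following uses 
simulated point locations and a simulated response, not the data from this 
thread. The objects xy, dat, x1 and the value 0.5 for the true lambda are all 
made up for illustration. errorsarlm() is loaded here from the spatialreg 
package (where it lives in current releases), and method="spam" assumes the 
spam package is installed.

```r
library(spdep)
library(spatialreg)

set.seed(1)
n <- 500
# Simulated locations in metres (assumption: a stand-in for coordinates(rd))
xy <- cbind(runif(n, 0, 10000), runif(n, 0, 10000))

# 5 nearest neighbours coerced to symmetric, as in the message
nb_k5s <- make.sym.nb(knn2nb(knearneigh(xy, k = 5)))
# Distance-band neighbours with a 2500 m upper bound, for contrast
nb_d <- dnearneigh(xy, d1 = 0, d2 = 2500)

# Neighbour counts: the distance band is far denser than symmetric kNN
summary(card(nb_k5s))
summary(card(nb_d))

# Simulate a response with SAR errors, u = (I - lambda W)^{-1} e, lambda = 0.5
lw <- nb2listw(nb_k5s, style = "W")
W <- listw2mat(lw)
dat <- data.frame(x1 = rnorm(n))
u <- solve(diag(n) - 0.5 * W, rnorm(n))
dat$y <- 1 + 2 * dat$x1 + u

# Fit the same model with both sparse-matrix methods and compare lambda
fit_M <- errorsarlm(y ~ x1, data = dat, listw = lw, method = "Matrix")
fit_S <- errorsarlm(y ~ x1, data = dat, listw = lw, method = "spam")
lambdas <- c(Matrix = unname(fit_M$lambda), spam = unname(fit_S$lambda))
print(lambdas)
```

If the two lambda values differ by more than numerical noise with your own 
weights, that is a sign that something is wrong with the weights or the 
scaling rather than a reason to prefer one method.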

If you look at the bottom of ?bptest.sarlm, you'll see a cheap and totally 
untested way of adjusting the output SEs, but please don't believe what it 
does, because it is treating the lambda value as known, not estimated. A 
guess at the remaining heterogeneity would be an age by maintenance 
interaction: older houses will vary in value by maintenance, probably also 
by neighbourhood?
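
A minimal sketch of what checking for such an interaction might look like, 
again on simulated data. The variable names age and maint, the coefficients 
and the error variance structure are all invented for illustration and say 
nothing about the real data set. bptest.sarlm() is taken from spatialreg.

```r
library(spdep)
library(spatialreg)

set.seed(42)
n <- 400
xy <- cbind(runif(n), runif(n))
lw <- nb2listw(make.sym.nb(knn2nb(knearneigh(xy, k = 5))), style = "W")
W <- listw2mat(lw)

# Hypothetical covariates: house age in years, maintenance score in [0, 1]
dat <- data.frame(age = runif(n, 0, 100), maint = runif(n, 0, 1))

# Error variance grows for old, poorly maintained houses (assumed structure)
e <- rnorm(n, sd = 0.2 + 0.02 * dat$age * (1 - dat$maint))
u <- solve(diag(n) - 0.5 * W, e)
dat$y <- 10 - 0.02 * dat$age + dat$maint + 0.02 * dat$age * dat$maint + u

# Error model with the age by maintenance interaction on the right-hand side
fit <- errorsarlm(y ~ age * maint, data = dat, listw = lw, method = "Matrix")

# Breusch-Pagan test on the spatially filtered residuals
bp <- bptest.sarlm(fit)
print(bp)
```

A small p-value only says that heteroscedasticity remains after fitting; it 
does not say which omitted variable or interaction is responsible.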

Hope this helps,

>> > >  There are different traditions. Econometricians and some others in 
>> > >  social science try to trick the standard errors by "magic", while 
>> > >  epidemiologists (and crime people) typically use case weights - that 
>> > >  is model the heteroscedasticity directly. spautolm() can include such 
>> > >  case weights. I don't think that there is any substantive and reliable 
>> > >  theory for adjusting the SE, that is theory that doesn't appeal to 
>> > >  assumptions we already know don't hold. Sampling from the posterior 
>> > >  gives a handle on this, but is not simple, and doesn't really suit 10K 
>> > >  observations.
>> > > 
>> >  Can you explain "magic" a little further? I'm running this for a 
>> >  professor who is a bit nervous about black box techniques and I'd like 
>> >  to be able to offer him a good explanation. I think he'll just have me 
>> >  calculate White's standard errors and ignore spatial autocorrelation if 
>> >  I can't be clearer.
>> > 
>> If this is all your "professor" can manage, please replace/educate! The 
>> model is fundamentally misspecified, and neither "magicing" the standard 
>> errors, nor just fitting a simultaneous autoregressive error model will let 
>> you make fair decisions on the "significance" or otherwise of the 
>> right-hand side variables, which I suppose is the object of the exercise?
> I agree here, but haven't been able to get much advice on this. I appreciate 
> your input.
>> (Looking at Johnston & DiNardo (1997), pp. 164-166, it looks as if White's 
>> SE only help asymptotically (in Prof. Ripley's well-known remark, 
>> asymptotics are a foreign country with spatial data), and not in finite 
>> samples, and their performance is unknown if the residuals are 
>> autocorrelated, which is the case here).
>> The vast number of observations is no help either, because they certainly 
>> introduce heterogeneity that has not been controlled for. Is this a grid of 
>> global species occurrence data, by any chance? Which RHS variables are 
>> covering for differences in environmental drivers? Or is there a better 
>> reason for using many observations (instead of careful data collection) 
>> than just their being available?
> This is a hedonic regression with a goal of eliciting economic values for 
> different percentages of tree cover on parcels and in the local neighborhood 
> as capitalized in home sale prices. We're using all 2005 residential sales 
> from Ramsey and Dakota counties in Minnesota, USA as our observations. This 
> gives us sales from most study area regions and for all months. I'll send you 
> a description of the RHS variables with the dataset.
>> More observations do not mean more information if meaningful differences 
>> across the observations are not captured by included variables (with the 
>> correct functional form). Have you tried GAM with flexible functional forms 
>> on the RHS variables and s(x,y) on the (point) locations of the 
>> observations?
> I haven't tried this, but will look into it. 
>> You are not alone in your plight, but if the inferences matter, then it's 
>> better to be cautious, irrespective of the "professor".
> Thanks very much for your help.
> Regards,
> Heather
> --- Heather Sander
> Ph.D. Candidate:  Conservation Biology
> Office:  305 Ecology & 420 Blegen
> Mail:  University of Minnesota
> Dept. of Geography
> 414 Social Science Bldg.
> 267 19th Ave. S.
> Minneapolis, MN 55455
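
For the GAM suggestion quoted above, a minimal sketch using mgcv could look 
like the following. The data are simulated, and the names price, tree, x and 
y are placeholders (x and y stand for the point coordinates); the functional 
forms are chosen only so that there is something non-linear to recover.

```r
library(mgcv)

set.seed(7)
n <- 500
# Hypothetical data: point coordinates and percent tree cover
dat <- data.frame(x = runif(n), y = runif(n), tree = runif(n, 0, 100))

# Simulated price with a non-linear tree-cover effect and a smooth spatial trend
dat$price <- 200 + 30 * sin(pi * dat$tree / 100) +
  20 * sin(2 * pi * dat$x) * cos(2 * pi * dat$y) + rnorm(n, sd = 5)

# Flexible functional form for the covariate, s(x, y) for location
fit_gam <- gam(price ~ s(tree) + s(x, y), data = dat, method = "REML")
summary(fit_gam)
```

The s(x, y) term soaks up smooth spatial trend in the mean; it is not a 
substitute for checking whether the residuals are still autocorrelated.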

Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
