[R-sig-Geo] Impute missing values along a spatial network

Tobias Ruttenauer tobias.ruttenauer using nuffield.ox.ac.uk
Fri Mar 26 21:49:54 CET 2021


Thanks again for the additional comments, Roger and Thierry!

I've tried a couple of things in the meantime. An empty Besag (intrinsic CAR) model, i.e. one without covariates, shrinks extreme values very strongly towards the overall mean (one would somehow need to increase the spatial influence). What works quite well for streets with at least one valid count is a Besag model with road number dummies (road numbers being the higher-level unit that groups road segments) as covariates. This seems to shrink towards the road mean instead of the overall mean. On the negative side, it can lead to very abrupt changes where two streets with different road numbers run into each other, and it gives me a prediction of zero for all segments of road numbers without any valid count (but it might be reasonable to exclude those anyway). Using the road type (e.g. highway, minor road) as dummies is a compromise between the empty model and the road number model.
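
To make the two shrinkage targets concrete, here is a small sketch in plain Python (chosen only to keep it self-contained; it is not the INLA model). It computes the overall mean and the per-road means from segment counts, and it marks road numbers without any valid count, for which the data alone give no road-level target. The road labels and counts are made up.

```python
def shrinkage_targets(segments):
    """Return (overall_mean, road_means) from (segment, road, count) tuples.

    A count of None marks a segment without a valid count. Roads with no
    valid count at all get a road mean of None.
    """
    counts_by_road = {}
    for _segment, road, count in segments:
        counts_by_road.setdefault(road, [])
        if count is not None:
            counts_by_road[road].append(count)

    all_counts = [c for counts in counts_by_road.values() for c in counts]
    overall_mean = sum(all_counts) / len(all_counts)
    road_means = {
        road: (sum(counts) / len(counts) if counts else None)
        for road, counts in counts_by_road.items()
    }
    return overall_mean, road_means


# Invented example data: (segment id, road number, count or None).
example_segments = [
    ("s1", "road_1", 1000), ("s2", "road_1", None), ("s3", "road_1", 1200),
    ("s4", "road_2", 200), ("s5", "road_2", None),
    ("s6", "road_3", None), ("s7", "road_3", None),
]
overall_mean, road_means = shrinkage_targets(example_segments)
```

Here road_3 has no road mean at all, which mirrors the case of road numbers without any valid count.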

Results are btw very similar for a Gaussian model on the log-transformed counts and a Poisson model, at least in terms of how the predictions covary (correlation of 0.99 between the posterior mean predictions of the two models).
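
A high correlation shows that two sets of predictions move together, but it would not reveal a constant multiplicative offset between them. One known source of such an offset: if the log count is Gaussian with mean mu and standard deviation sigma, exp(mu) is the median of the count, while its mean is exp(mu + sigma^2 / 2). A minimal sketch in plain Python with invented numbers (not output from either model):

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lognormal_mean(mu, sigma):
    """Mean of a count whose log is Gaussian(mu, sigma)."""
    return math.exp(mu + 0.5 * sigma ** 2)


# Invented log-scale means and a common sigma, for illustration only.
log_mu = [math.log(v) for v in (100.0, 400.0, 1500.0, 6000.0)]
sigma = 0.5

median_scale = [math.exp(m) for m in log_mu]            # naive back-transform
mean_scale = [lognormal_mean(m, sigma) for m in log_mu]  # lognormal mean

r = pearson(median_scale, mean_scale)        # essentially 1
ratio = mean_scale[0] / median_scale[0]      # exp(sigma^2 / 2), above 1
```

The two back-transformed series correlate perfectly yet differ in level by the factor exp(sigma^2 / 2), so it may be worth comparing levels as well as correlations.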

One thing I was wondering: is there a way of changing the weights or the normalization method for the adjacency matrix in INLA models if I need to store it on disk via nb2INLA()? For instance, if one road is connected to many others at a given crossing, it might be reasonable to assume something like an additive spatial influence instead of an average, right? Or would that be something that is better handled with a spatial interaction or gravity model?
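
The difference between an average and an additive influence can be shown with a spatial lag under two weighting schemes. The sketch below is plain Python, not INLA code; the labels "W" (row-standardised) and "B" (binary) follow the style codes of spdep::nb2listw(), and the segment names and counts are invented. Whether a graph file written by nb2INLA() can carry such weights is left open here.

```python
def spatial_lag(neighbours, values, style="W"):
    """Spatial lag per segment.

    style="W": row-standardised weights, i.e. the mean of the neighbours.
    style="B": binary weights, i.e. the sum over the neighbours.
    Segments without neighbours get a lag of 0.0.
    """
    if style not in ("W", "B"):
        raise ValueError("style must be 'W' or 'B'")
    lag = {}
    for segment, nbs in neighbours.items():
        total = float(sum(values[n] for n in nbs))
        if style == "W" and nbs:
            total /= len(nbs)
        lag[segment] = total
    return lag


# Invented crossing: segment X meets segments P, Q and R.
neighbours = {"X": ["P", "Q", "R"], "P": ["X"], "Q": ["X"], "R": ["X"]}
values = {"X": 0, "P": 10, "Q": 20, "R": 30}

lag_average = spatial_lag(neighbours, values, style="W")   # X gets 20.0
lag_additive = spatial_lag(neighbours, values, style="B")  # X gets 60.0
```

With row standardisation a busy crossing has the same total neighbour weight as a simple junction; with binary weights the influence grows with the number of connected segments.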

Thanks a lot again for the great help and best wishes
Tobias


-----Original Message-----
From: Roger Bivand <Roger.Bivand using nhh.no> 
Sent: 24 March 2021 17:01
To: Tobias Ruttenauer <tobias.ruttenauer using nuffield.ox.ac.uk>
Cc: r-sig-geo using r-project.org
Subject: RE: [R-sig-Geo] Impute missing values along a spatial network

On Wed, 24 Mar 2021, Tobias Ruttenauer wrote:

> Thank you very much for the hint, Roger! Completely right, a Gaussian 
> process actually does not make too much sense in this case. I'll have 
> a look into INLA and see if I can work with that.

If you look at this like a rate shrinkage problem, maybe you can set up a spatial interaction model based on data at the end nodes of the segments to generate "expected" volumes? These would then not be (necessarily) time constant covariates at the nodes, but could reflect supply/demand factors. 
Otherwise you just have the volume counts. A basic intrinsic CAR might be helpful, but the data would need to guide the signal/noise ratio - without expectations, and with many missing counts, the predictions for segments with no counts will have broad posterior distributions. Avoiding Gaussian may help avoid predicting negative volumes - but log volumes might be OK.
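
Under an intrinsic CAR prior with binary neighbour weights, the conditional mean of a segment given the rest is the average of its neighbours. The sketch below (plain Python, not a fitted model) iterates that rule with the observed counts held fixed, using the A-B-C-D-E and F-G example from the original question. It treats observed counts as noise-free and gives no uncertainty, so it only illustrates where the predictions are pulled; the counts 100 and 40 are invented.

```python
from collections import deque


def components(neighbours):
    """Connected components of the segment graph (breadth-first search)."""
    seen, comps = set(), []
    for start in neighbours:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return comps


def impute_along_network(neighbours, observed, tol=1e-10, max_iter=10000):
    """Fill missing segments with the average of their network neighbours.

    Components without any observed count are returned as None.
    """
    values = {}
    for comp in components(neighbours):
        known = [observed[s] for s in comp if s in observed]
        if not known:
            for s in comp:
                values[s] = None
            continue
        start = sum(known) / len(known)
        for s in comp:
            values[s] = float(observed.get(s, start))
        free = [s for s in comp if s not in observed]
        for _ in range(max_iter):
            change = 0.0
            for s in free:
                new = sum(values[n] for n in neighbours[s]) / len(neighbours[s])
                change = max(change, abs(new - values[s]))
                values[s] = new
            if change < tol:
                break
    return values


# Segments are neighbours when they are contiguous along the network.
neighbours = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"],
    "F": ["G"], "G": ["F"],
}
imputed = impute_along_network(neighbours, {"A": 100, "D": 40})
```

With A = 100 and D = 40 this gives B = 80, C = 60 and E = 40, while F and G stay missing because their component contains no count, however close they are geographically.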

Roger

>
> For now, I don't have any covariates in this. This is mainly because 
> I'm specifically interested in the annual variation of traffic counts, 
> but available covariates are all time-constant.
>
> Thanks again and best wishes
> Tobias
>
>
> -----Original Message-----
> From: Roger Bivand <Roger.Bivand using nhh.no>
> Sent: 24 March 2021 14:18
> To: Tobias Ruttenauer <tobias.ruttenauer using nuffield.ox.ac.uk>
> Cc: r-sig-geo using r-project.org
> Subject: Re: [R-sig-Geo] Impute missing values along a spatial network
>
> On Wed, 24 Mar 2021, Tobias Ruttenauer wrote:
>
>> Dear list members,
>>
>> I am trying to construct a road network with traffic estimates for 
>> each road segment. I have count data of the traffic for a subset of 
>> the segments and I have the road network as spatial lines data. For 
>> those segments without count data, I would like to perform something 
>> like linear imputation or some sort of interpolation / kriging along 
>> the road network instead of using pure geographical distance. For 
>> instance, if I have 7 road segments A-B-C-D-E and F-G (F and G are 
>> unconnected to the rest), and I have data for A and D, how can I 
>> impute data for B, C (and E) by only using A and D, while ignoring F 
>> and G even though they might be geographically close?
>
> Are there any relevant covariates associated with the road segments? I 
> think that this is more of a Markov than a Gaussian random field, so a 
> Poisson spatial regression with a neighbour matrix representing 
> contiguous segments might be possible. Covariates, or an offset by an 
> expected volume, might help. INLA with a Besag model (INLA fits 
> missing responses), or mgcv::gam() with an "mrf" smooth, or hglm(), 
> then predict?
>
> Any other suggestions?
>
> Roger
>
>>
>> This seems fairly intuitive to me but I couldn't find a package doing 
>> that. stplanr would do something related but it seems it needs 
>> origin-destination data (which I don't have). I'd be grateful if 
>> someone could nudge me into the right direction. I guess I'm using 
>> the wrong terminology.
>>
>> Thanks a lot and best wishes
>> Tobias
>>
>> Tobias Rüttenauer
>> Nuffield College
>> University of Oxford
>> Oxford, OX1 1NF
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo using r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics, Postboks 3490 Ytre Sandviken, 5045 Bergen, Norway.
> e-mail: Roger.Bivand using nhh.no
> https://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>



