[R-sig-Geo] Error in predict.sarlm: non-unique row.names given

Mon Jul 8 05:32:01 CEST 2019

Do provide a complete reproducible example. I appeal to everyone posting 
questions to give potential helpers something to work on. Asking for a 
reproducible example is by far the most common response to postings that 
lack one, if they get any response at all.

Start with this and work backwards until you can reproduce your 
misunderstanding:

library(sf)
library(spatialreg)
col <- st_read(system.file("shapes/columbus.shp", package="spData"))
train <- col[col$EW == 1,]
test <- col[col$EW == 0,]
col.nb <- spdep::poly2nb(col)
train.nb <- spdep::poly2nb(train)
test.nb <- spdep::poly2nb(test)
attr(col.nb, "region.id")
attr(train.nb, "region.id")
attr(test.nb, "region.id")
train.mod <- lagsarlm(CRIME ~ INC + HOVAL, data=train,
   listw=spdep::nb2listw(train.nb))
try(preds <- predict(train.mod, newdata=test,
   listw=spdep::nb2listw(test.nb)))
preds[2]
try(preds1 <- predict(train.mod, newdata=col,
   listw=spdep::nb2listw(col.nb)))
# warning: newdata here also contains the training observations

preds1[4]
try(preds2 <- predict(train.mod, newdata=test,
   listw=spdep::nb2listw(col.nb)))
preds2[2]

Using the weights for the complete data set allows the spatial process to 
flow between neighbouring members of the train and test sets.
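A toy illustration of what is lost when the test set gets its own separate weights, using base R matrices only. The data are hypothetical: four areas on a line (1 - 2 - 3 - 4), with areas 1 and 3 in the training set and 2 and 4 in the test set. The row_std() helper is also hypothetical and simply row-standardises a 0/1 contiguity matrix, leaving rows without neighbours at zero.

```r
# Toy example (hypothetical data): four areas on a line, 1 - 2 - 3 - 4.
ids <- as.character(1:4)
A <- matrix(0, 4, 4, dimnames = list(ids, ids))
A[cbind(1:3, 2:4)] <- 1
A <- A + t(A)                          # symmetric contiguity matrix
train_ids <- c("1", "3")
test_ids <- c("2", "4")

# Hypothetical helper: row-standardise; rows with no neighbours stay zero.
row_std <- function(M) {
  rs <- rowSums(M)
  rs[rs == 0] <- 1
  M / rs
}
W_full <- row_std(A)                       # weights for the whole data set
W_test <- row_std(A[test_ids, test_ids])   # weights built from test areas only

y <- c("1" = 10, "2" = 20, "3" = 30, "4" = 40)   # made-up values
lag_full <- (W_full %*% y)[test_ids, 1]   # spatial lag of the test areas
lag_test <- (W_test %*% y[test_ids])[, 1] # same, with test-only weights
lag_full   # area 2: 20, area 4: 30 (means of their training neighbours)
lag_test   # area 2: 0,  area 4: 0  (no neighbours left)
```

With the full weights, each test area's spatial lag draws on its training neighbours; with test-only weights, areas 2 and 4 are not contiguous with each other, so they end up with no neighbours at all.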

Your problem is probably that your two data objects do not use row.names 
as expected:

attr(test.nb, "region.id") <- as.character(1:length(test.nb))
attr(train.nb, "region.id") <- as.character(1:length(train.nb))
train.mod1 <- lagsarlm(CRIME ~ INC + HOVAL, data=train,
   listw=spdep::nb2listw(train.nb))
try(preds3 <- predict(train.mod, newdata=test,
   listw=spdep::nb2listw(test.nb)))
# Error in predict.sarlm(train.mod, newdata = test, listw = spdep::nb2listw(test.nb)) :
#   mismatch between newdata and spatial weights. newdata should have region.id as row.names

The message says it all. When the predict method tries to assign 
neighbours to the newdata observations (it needs to identify the correct 
rows in newdata from the "region.id" attribute of the supplied weights), 
it fails because the renumbered region.ids no longer match row.names(test).
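A quick check before calling predict() can catch this early. The function below is a hypothetical helper, not part of spatialreg or spdep; it uses base R only and checks that the region.ids of the weights are unique and that every row.name of newdata appears among them. The weights may cover more areas than newdata, as with col.nb above.

```r
# Hypothetical helper (not part of spatialreg/spdep): check that every row of
# newdata can be matched to a region.id of the spatial weights.
check_newdata_ids <- function(newdata, listw) {
  ids <- as.character(attr(listw, "region.id"))
  rn <- row.names(newdata)
  if (length(ids) == 0) stop("listw has no region.id attribute")
  if (anyDuplicated(ids) > 0) stop("region.id values in listw are not unique")
  unmatched <- rn[!(rn %in% ids)]
  if (length(unmatched) > 0) {
    stop(length(unmatched), " row.names of newdata not found in region.id, e.g. ",
         paste(head(unmatched, 5), collapse = ", "))
  }
  invisible(TRUE)
}
```

In the example above, check_newdata_ids(test, spdep::nb2listw(test.nb)) should stop after the region.id renumbering, and pass with spdep::nb2listw(col.nb).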

Either use the weights for the whole data set when predicting for the test 
set given as newdata=, or, if the two graphs do not neighbour each other 
(that is, train.nb is separate from test.nb; think two islands), make sure 
that the region.ids and row.names of the test and train sets do not overlap.
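For the two-islands case, one way to keep the ids apart is to prefix each set's existing row.names. This is a minimal base R sketch; relabel_disjoint() and its prefixes are hypothetical. The commented spdep lines (not run here) assume that poly2nb()'s row.names= argument is used to set the region.id attribute so that it matches the new row.names.

```r
# Hypothetical helper: give train and test row.names that cannot overlap,
# by prefixing each set's existing row.names.
relabel_disjoint <- function(train, test, prefixes = c("train_", "test_")) {
  row.names(train) <- paste0(prefixes[1], row.names(train))
  row.names(test) <- paste0(prefixes[2], row.names(test))
  stopifnot(length(intersect(row.names(train), row.names(test))) == 0)
  list(train = train, test = test)
}

# Assumed usage with sf objects and spdep (not run here):
# sets <- relabel_disjoint(train, test)
# train.nb <- spdep::poly2nb(sets$train, row.names = row.names(sets$train))
# test.nb  <- spdep::poly2nb(sets$test,  row.names = row.names(sets$test))
```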

Please use the example to explore the problem in your own workflow, 
(re-)read Goulard et al. (2017) and the predict.sarlm help page, and report 
back. Remember that you can only predict for a test set of reasonable size: 
as the underlying article shows, the spatial lag model case probably needs 
an inverted n x n matrix.
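To see where the n x n matrix comes from: in the spatial lag model the reduced-form mean is (I - rho W)^{-1} X beta. The base R sketch below computes it for a hypothetical four-area example with made-up rho and beta. It is an illustration only, not the predict.sarlm code, which implements the predictors discussed by Goulard et al. (2017). The system is dense n x n, so one such matrix of doubles takes 8 n^2 bytes, about 800 MB for n = 10,000.

```r
# Hypothetical toy data: four areas on a line, row-standardised weights.
n <- 4
A <- matrix(0, n, n)
A[cbind(1:3, 2:4)] <- 1
A <- A + t(A)
W <- A / rowSums(A)              # every area has at least one neighbour
X <- cbind(1, c(1, 2, 3, 4))     # intercept and one covariate (made up)
beta <- c(1, 2)                  # assumed coefficients
rho <- 0.5                       # assumed spatial autoregressive parameter

# Reduced-form mean: solve (I - rho W) yhat = X beta,
# i.e. yhat = (I - rho W)^{-1} X beta, as a dense n x n system.
yhat <- solve(diag(n) - rho * W, X %*% beta)

# Memory for one dense n x n matrix of doubles, in MB (8 bytes per entry).
dense_mb <- function(n) 8 * n^2 / 1e6
dense_mb(10000)                  # 800
```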

Hope this clarifies

Roger

On Mon, 8 Jul 2019, Jiawen Ng wrote:

> Another question on predict.sarlm!
>
> Here is the line of code that is producing the error:
> pred <- spatialreg::predict.sarlm(model, df, test.listw,zero.policy = T)
>
> Here is the error:
>
> Error in mat2listw(W, row.names = region.id.mixed, style = style) :
>  non-unique row.names given
> In addition: Warning messages:
> 1: In spatialreg::predict.sarlm(model, df, test.listw,  :
>  some region.id are both in data and newdata
> 2: In subset(attr(listw.mixed, "region.id"), attr(listw.mixed, "region.id") %in%  :
>  longer object length is not a multiple of shorter object length
>
> Any idea how I can solve the non-unique row.names error?
>
> Thank you!
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en