[R] building a spatial matrix

Mon May 16 18:53:56 CEST 2016

Hi Sarah,
thanks a lot. In this line:

result.m[cbind(factor(result$f_cell), factor(result$f_cell_neigh))] <-
result$distance

I had a problem with cbind(factor(...), factor(...)): the assignment to the
[i,j] elements of the matrix did not work.
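
A likely reason, sketched on made-up toy data (the values below are assumptions, not the real dataset): factor() sorts its levels, so the integer codes that cbind() sees rank the location codes instead of reproducing the firm ids, and firms sharing a location collapse into a single level.

```r
# Toy data (made up): three firms with ids 1..3 located in cells 30, 10, 20.
id_cell <- data.frame(id = 1:3, f_cell = c(30, 10, 20))

# factor() orders its levels (10, 20, 30), so the integer codes are the
# ranks of the cell codes, not the firm ids.
codes <- as.integer(factor(id_cell$f_cell))
print(codes)   # 3 1 2

# match() against id_cell$f_cell gives the row position, which equals the id here.
pos <- match(id_cell$f_cell, id_cell$f_cell)
print(pos)     # 1 2 3

# When two firms share a cell, the factor has fewer levels than there are firms,
# so part of a firm-by-firm matrix can never be addressed through factor codes.
shared <- c(30, 10, 20, 10)
print(nlevels(factor(shared)))   # 3 levels for 4 firms
```

This is only an illustration of the indexing behaviour; it has not been checked against the actual data.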

I solved it this way:

- I modified censDist so that, for every pair of location codes (f_cell,
f_cell_neigh), it also carries the corresponding ids from id_cell (these are
directly interpretable as the row and column coordinates of the 1327x1327
result.m matrix). censDist now looks like this:

head(censDist)
   f_cell f_cell_neigh  distance id id_neigh
1   2924         2732 1309.7525 NA       NA
2   2924         2875  696.2891 NA       NA
3   2924         2351 1346.0561 NA      975
4   2924         2350 1296.9804 NA      758
5   2924         2725 1278.1877 NA       NA
6   2924         2721 1346.9126 NA       NA

Then I ran your code with a slight modification of the last line:

resultS <- subset(censDist, f_cell %in% id_cell$f_cell & f_cell_neigh %in%
id_cell$f_cell)
resultS.m <- matrix(NA, nrow=nrow(id_cell), ncol=nrow(id_cell))
resultS.m[cbind(resultS$id, resultS$id_neigh)] <- resultS$distance

and it worked!
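
For anyone following along, here is a minimal sketch of the whole procedure on toy data. It assumes the id columns are attached with merge(); the thread does not say how they were actually added, and the toy values and the helper data frame neigh are made up for illustration.

```r
# Toy data (made up): four firms; firms 2 and 4 share location 10.
id_cell <- data.frame(id = 1:4, f_cell = c(30, 10, 20, 10))

# All ordered pairs of location codes, including one (40) with no firm.
censDist <- expand.grid(f_cell = c(10, 20, 30, 40),
                        f_cell_neigh = c(10, 20, 30, 40))
censDist$distance <- abs(censDist$f_cell - censDist$f_cell_neigh)

# Attach the firm id of the origin cell, then of the neighbour cell.
# merge() drops pairs whose cells are not in id_cell and repeats a pair
# once for every firm located in the same cell.
neigh <- data.frame(id_neigh = id_cell$id, f_cell_neigh = id_cell$f_cell)
resultS <- merge(merge(censDist, id_cell, by = "f_cell"),
                 neigh, by = "f_cell_neigh")

# Fill the firm-by-firm matrix using a two-column (row, column) index matrix.
resultS.m <- matrix(NA_real_, nrow = nrow(id_cell), ncol = nrow(id_cell))
resultS.m[cbind(resultS$id, resultS$id_neigh)] <- resultS$distance
print(resultS.m)
```

With merge(), two firms in the same location each get their own row and column, and the distance between them is whatever censDist reports for that location paired with itself (0 in this toy example).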

thank you so much,
Mario

On Fri, May 13, 2016 at 5:45 PM, Sarah Goslee <sarah.goslee at gmail.com>
wrote:

> Sorry, you're right.
>
> The result line should be:
>
> result.m[cbind(factor(result$fcell), factor(result$cellneigh))]  <-
> result$distance
>
>
> idcell <- data.frame(
>   id = seq_len(5),
>   fcell = sample(1:100, 5))
>
> censDist <- expand.grid(fcell=seq_len(100), cellneigh=seq_len(100))
> censDist$distance <- runif(nrow(censDist))
>
> # assemble the non-symmetric distance matrix
> result <- subset(censDist, fcell %in% idcell$fcell & cellneigh %in%
> idcell$fcell)
> result.m <- matrix(NA, nrow=nrow(idcell), ncol=nrow(idcell))
> result.m[cbind(factor(result$fcell), factor(result$cellneigh))]  <-
> result$distance
>
> It's just about instantaneous on the dataset you sent me:
>
>
> system.time({
> result <- subset(censDist, f_cell %in% id_cell$f_cell & f_cell_neigh %in%
> id_cell$f_cell)
> result.m <- matrix(NA, nrow=nrow(id_cell), ncol=nrow(id_cell))
> result.m[cbind(factor(result$f_cell), factor(result$f_cell_neigh))] <-
> result$distance
> })
>
>   user  system elapsed
>   0.361   0.007   0.368
>
>
>
>
> Sarah
>
>
> On Fri, May 13, 2016 at 10:36 AM, A M Lavezzi <mario.lavezzi at unipa.it>
> wrote:
> > PLEASE IGNORE THE PREVIOUS EMAIL, IT WAS SENT BY MISTAKE
> >
> > Hello Sarah
> > thanks a lot for your advice.
> >
> > I followed your suggestions until the creation of "result"
> >
> > The allocation of the values of result$distance to the matrix result.m,
> > however, does not seem to work: it produces a matrix with identical columns
> > corresponding to the last values of result$distance. Maybe my description of
> > the dataset was not clear enough.
> >
> > I produced the final matrix spat_dist with a loop, which I report below (it
> > takes about 1 hour on my MacBook Pro):
> >
> > set_i = -1   # create a variable to store the i values already examined
> >
> > for(i in unique(result$id)){
> >
> >   set_i=c(set_i,i) # store the value of the i
> >
> >   # identify the locations connected to i. If the distance between i and j
> >   # was examined before, don't look for the distance between j and i
> >   set_neigh = result$id_neigh[result$id==i & !result$id_neigh %in% set_i]
> >
> >   for(j in set_neigh){
> >     if(i!=j){
> >       spat_dist[i,j] = result$distance[result$id==i & result$id_neigh==j]
> >       spat_dist[j,i] = spat_dist[i,j]
> >     }
> >     else{
> >       spat_dist[i,j]=0
> >     }
> >   }
> > }
> >
> > It is not the most elegant and efficient solution in the world, that's for
> > sure.
> >
> > I would be grateful if you could suggest an alternative instruction to:
> >
> > result.m[factor(result$fcell), factor(result$cellneigh)] <- result$distance
> >
> > so I can learn a faster procedure (I tried many times to modify this
> > structure but did not manage to). I don't want to abuse your time, so
> > forget it if you are busy.
> >
> > Thank you so much anyway,
> > Mario
> >
> > PS: I attach the data. Notice that the 1327 units in id_cell are firms,
> > indexed by id, located in location f_cell. Different firms can be located in
> > the same f_cell. With respect to your suggestion, I added two columns to
> > "result" with the ids of the firms.
> >
> > On Fri, May 13, 2016 at 3:26 PM, A M Lavezzi <mario.lavezzi at unipa.it> wrote:
> >>
> >>
> >> Hello Sarah
> >> thanks a lot for your advice.
> >>
> >> I followed your suggestions until the creation of "result"
> >>
> >> The allocation of the values of result$distance to the matrix result.m,
> >> however, does not seem to work: it produces a matrix with identical columns
> >> corresponding to the last values of result$distance. Maybe my description of
> >> the dataset was not clear enough.
> >>
> >> I produced the final matrix with a loop, that I report below (it takes
> >> about 1 hour on my macbook pro),
> >>
> >> set_i = -1   # create a variable to store the i values already examined
> >>
> >> for(i in unique(result$id)){
> >>
> >>   set_i=c(set_i,i) # store the value of the i
> >>
> >>   set_neigh = result$id_neigh[result$id==i & !result$id_neigh %in% set_i]
> >>   # identify the locations connected to i. Exclude those
> >>
> >>   for(j in set_neigh){
> >>     if(i!=j){
> >>       spat_dist[i,j] = result$distance[result$id==i & result$id_neigh==j]
> >>       spat_dist[j,i] = spat_dist[i,j]
> >>     }
> >>     else{
> >>       spat_dist[i,j]=0
> >>     }
> >>   }
> >> }
> >>
> >> It is not the most elegant and efficient solution in the world, that's for
> >> sure
> >>
> >>
> >>
> >> On Thu, May 12, 2016 at 2:51 PM, Sarah Goslee <sarah.goslee at gmail.com>
> >> wrote:
> >>>
> >>> I don't see any reason why a loop is out of the question, and
> >>> answering would have been much easier if you'd included the requested
> >>> reproducible data, but what about this?
> >>>
> >>> This solution is robust to pairs from idcell being absent in censDist,
> >>> and to the distance from A to B differing from the distance from B to A,
> >>> but not to A-B appearing twice. If that's possible, you'll need to
> >>> figure out how to manage it.
> >>>
> >>> # create some fake data
> >>>
> >>> idcell <- data.frame(
> >>>   id = seq_len(5),
> >>>   fcell = sample(1:100, 5))
> >>>
> >>> censDist <- expand.grid(fcell=seq_len(100), cellneigh=seq_len(100))
> >>> censDist$distance <- runif(nrow(censDist))
> >>>
> >>> # assemble the non-symmetric distance matrix
> >>> result <- subset(censDist, fcell %in% idcell$fcell & cellneigh %in%
> >>> idcell$fcell)
> >>> result.m <- matrix(NA, nrow=nrow(idcell), ncol=nrow(idcell))
> >>> result.m[factor(result$fcell), factor(result$cellneigh)] <-
> >>> result$distance
> >>>
> >>> Sarah
> >>>
> >>> On Thu, May 12, 2016 at 5:26 AM, A M Lavezzi <mario.lavezzi at unipa.it>
> >>> wrote:
> >>> > Hello,
> >>> >
> >>> > I have a sample of 1327 locations, each one identified by an id and a
> >>> > numerical code.
> >>> >
> >>> > I need to build a spatial matrix, say, M, i.e. a 1327x1327 matrix
> >>> > collecting distances among the locations.
> >>> >
> >>> > M(i,i) should be 0, and M(i,j) should contain the distance between
> >>> > locations i and j.
> >>> >
> >>> > I should use data organized in the following way:
> >>> >
> >>> > 1) id_cell contains the identifier (id) of each location (1...1327) and
> >>> > the numerical code of the location (f_cell) (see head of id_cell below)
> >>> >
> >>> >> head(id_cell)
> >>> >    id f_cell
> >>> > 1   1   2120
> >>> > 12  2    204
> >>> > 22  3   2546
> >>> > 24  4   1327
> >>> > 34  5   1729
> >>> > 43  6   2293
> >>> >
> >>> > 2) censDist contains, for each location identified by its numerical
> >>> > code, the distance to other locations (censDist has 1.5 million rows).
> >>> > The head(censDist) below, for example, reads like this:
> >>> >
> >>> > location 2924 has a distance to 2732 of 1309.7525
> >>> > location 2924 has a distance to 2875 of 696.2891,
> >>> > etc.
> >>> >
> >>> >> head(censDist)
> >>> >   f_cell f_cell_neigh  distance
> >>> > 1   2924         2732 1309.7525
> >>> > 2   2924         2875  696.2891
> >>> > 3   2924         2351 1346.0561
> >>> > 4   2924         2350 1296.9804
> >>> > 5   2924         2725 1278.1877
> >>> > 6   2924         2721 1346.9126
> >>> >
> >>> >
> >>> > Basically, for every location in id_cell I should pick up the distance
> >>> > to the other locations in id_cell from censDist, and allocate it in M.
> >>> >
> >>> > I have not come up with a satisfactory vectorization of this problem,
> >>> > and using a loop is out of the question.
> >>> >
> >>> > Thanks for your help
> >>> > Mario
> >>> >
> >>> >
> >>
>

-- 
Andrea Mario Lavezzi
DiGi,Sezione Diritto e Società
Università di Palermo
Piazza Bologni 8
90134 Palermo, Italy
tel. ++39 091 23892208
fax ++39 091 6111268
skype: lavezzimario
email: mario.lavezzi (at) unipa.it
web: http://www.unipa.it/~mario.lavezzi
