[R] building a spatial matrix

Fri May 13 18:45:08 CEST 2016

Sorry, you're right.

The result line should be:

result.m[cbind(factor(result$fcell), factor(result$cellneigh))] <-
  result$distance
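
For context, the difference between the two forms of indexing can be seen
on a toy example. This is only a minimal sketch: the names i, j, v, wrong
and right are made up for illustration and are not from your data. With two
separate index vectors, R assigns to every combination of the listed rows
and columns, recycling the values; with a two-column matrix built by
cbind(), each row of that index matrix addresses a single cell.

```r
# Hypothetical toy data: four (row, column) pairs and the value for each pair
i <- c(1, 2, 1, 2)
j <- c(1, 1, 2, 2)
v <- c(10, 20, 30, 40)

# Two index vectors: fills the whole block of rows i by columns j,
# recycling v, so cells are overwritten several times
wrong <- matrix(NA, nrow = 2, ncol = 2)
wrong[i, j] <- v

# One two-column index matrix: the k-th value goes to cell (i[k], j[k])
right <- matrix(NA, nrow = 2, ncol = 2)
right[cbind(i, j)] <- v
```

Here right[1, 2] is 30 and right[2, 1] is 20, as intended, while wrong does
not keep the four values in their own cells.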

idcell <- data.frame(
  id = seq_len(5),
  fcell = sample(1:100, 5))

censDist <- expand.grid(fcell=seq_len(100), cellneigh=seq_len(100))
censDist$distance <- runif(nrow(censDist))

# assemble the non-symmetric distance matrix
result <- subset(censDist,
                 fcell %in% idcell$fcell & cellneigh %in% idcell$fcell)
result.m <- matrix(NA, nrow=nrow(idcell), ncol=nrow(idcell))
result.m[cbind(factor(result$fcell), factor(result$cellneigh))] <-
  result$distance

It's just about instantaneous on the dataset you sent me:

system.time({
  result <- subset(censDist,
                   f_cell %in% id_cell$f_cell &
                     f_cell_neigh %in% id_cell$f_cell)
  result.m <- matrix(NA, nrow=nrow(id_cell), ncol=nrow(id_cell))
  result.m[cbind(factor(result$f_cell), factor(result$f_cell_neigh))] <-
    result$distance
})

   user  system elapsed
  0.361   0.007   0.368
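
Since you mention that several firms can share the same f_cell, here is a
hedged sketch of one way to handle that: fill a cell-by-cell matrix first,
then expand it to firm-by-firm by indexing. The toy data and the names
cells, cell.m, pos and firm.m are hypothetical, and two assumptions are
built in that you should check against your data: firms in the same cell
get distance 0, and each pair of cells appears at most once in censDist.

```r
# Hypothetical toy data: four firms, two of them in the same cell (20)
idcell <- data.frame(id = 1:4, fcell = c(30, 20, 20, 10))
censDist <- expand.grid(fcell = c(10, 20, 30, 40),
                        cellneigh = c(10, 20, 30, 40))
censDist <- subset(censDist, fcell != cellneigh)
censDist$distance <- seq_len(nrow(censDist))

# Distinct cells, in order of first appearance in idcell
cells <- unique(idcell$fcell)
result <- subset(censDist, fcell %in% cells & cellneigh %in% cells)

# Cell-by-cell matrix; match() gives the row/column position of each cell
cell.m <- matrix(NA_real_, nrow = length(cells), ncol = length(cells),
                 dimnames = list(cells, cells))
cell.m[cbind(match(result$fcell, cells),
             match(result$cellneigh, cells))] <- result$distance
diag(cell.m) <- 0  # assumption: distance from a cell to itself is 0

# Expand to firm-by-firm: firms in the same cell reuse that cell's row/column
pos <- match(idcell$fcell, cells)
firm.m <- cell.m[pos, pos]
dimnames(firm.m) <- list(idcell$id, idcell$id)
```

Using match() against idcell rather than factor() keeps the rows and columns
of firm.m in the same order as the ids in idcell.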

Sarah

On Fri, May 13, 2016 at 10:36 AM, A M Lavezzi <mario.lavezzi at unipa.it> wrote:
> PLEASE IGNORE THE PREVIOUS EMAIL, IT WAS SENT BY MISTAKE
>
> Hello Sarah
> thanks a lot for your advice.
>
> I followed your suggestions until the creation of "result".
>
> The allocation of the values of result$distance to the matrix result.m,
> however, does not seem to work: it produces a matrix with identical columns
> corresponding to the last values of result$distance. Maybe my description of
> the dataset was not clear enough.
>
> I produced the final matrix spat_dist with a loop, which I report below (it
> takes about 1 hour on my MacBook Pro):
>
> set_i = -1   # create a variable to store the i values already examined
>
> for(i in unique(result$id)){
>
>   set_i=c(set_i,i) # store the value of the i
>
>   # identify the locations connected to i. If the distance between i and j
>   # was examined before, don't look for the distance between j and i
>   set_neigh = result$id_neigh[result$id==i & !result$id_neigh %in% set_i]
>
>   for(j in set_neigh){
>     if(i!=j){
>       spat_dist[i,j] = result$distance[result$id==i &  result$id_neigh==j]
>       spat_dist[j,i] = spat_dist[i,j]
>     }
>     else{
>       spat_dist[i,j]=0
>     }
>   }
> }
>
> It is not the most elegant and efficient solution in the world, that's for
> sure.
>
> I would be grateful if you could suggest an alternative to the instruction:
>
> result.m[factor(result$fcell), factor(result$cellneigh)] <- result$distance
>
> so that I can learn a faster procedure (I tried many times to modify this
> structure, but I did not manage it). I don't want to take up too much of
> your time, so forget it if you are busy.
>
> Thank you so much anyway,
> Mario
>
> PS: I attach the data. Notice that the 1327 units in id_cell are firms,
> indexed by id, located in location f_cell. Different firms can be located in
> the same f_cell. With respect to your suggestion, I added two columns to
> "result" with the ids of the firms.
>
> On Fri, May 13, 2016 at 3:26 PM, A M Lavezzi <mario.lavezzi at unipa.it> wrote:
>>
>>
>> Hello Sarah
>> thanks a lot for your advice.
>>
>> I followed your suggestions until the creation of "result".
>>
>> The allocation of the values of result$distance to the matrix result.m,
>> however, does not seem to work: it produces a matrix with identical columns
>> corresponding to the last values of result$distance. Maybe my description of
>> the dataset was not clear enough.
>>
>> I produced the final matrix with a loop, which I report below (it takes
>> about 1 hour on my MacBook Pro):
>>
>> set_i = -1   # create a variable to store the i values already examined
>>
>> for(i in unique(result$id)){
>>
>>   set_i=c(set_i,i) # store the value of the i
>>
>>   set_neigh = result$id_neigh[result$id==i & !result$id_neigh %in% set_i]
>> # identify the locations connected to i. Exclude                  those
>>
>>   for(j in set_neigh){
>>     if(i!=j){
>>       spat_dist[i,j] = result$distance[result$id==i & result$id_neigh==j]
>>       spat_dist[j,i] = spat_dist[i,j]
>>     }
>>     else{
>>       spat_dist[i,j]=0
>>     }
>>   }
>> }
>>
>> It is not the most elegant or efficient solution in the world, that's for
>> sure.
>>
>>
>>
>> On Thu, May 12, 2016 at 2:51 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
>>>
>>> I don't see any reason why a loop is out of the question, and
>>> answering would have been much easier if you'd included the requested
>>> reproducible data, but what about this?
>>>
>>> This solution is robust to pairs from idcell being absent in censDist,
>>> and to the distance from A to B being different from the distance
>>> from B to A, but not to A-B appearing twice. If that's possible,
>>> you'll need to figure out how to manage it.
>>>
>>> # create some fake data
>>>
>>> idcell <- data.frame(
>>>   id = seq_len(5),
>>>   fcell = sample(1:100, 5))
>>>
>>> censDist <- expand.grid(fcell=seq_len(100), cellneigh=seq_len(100))
>>> censDist$distance <- runif(nrow(censDist))
>>>
>>> # assemble the non-symmetric distance matrix
>>> result <- subset(censDist, fcell %in% idcell$fcell & cellneigh %in%
>>> idcell$fcell)
>>> result.m <- matrix(NA, nrow=nrow(idcell), ncol=nrow(idcell))
>>> result.m[factor(result$fcell), factor(result$cellneigh)] <-
>>> result$distance
>>>
>>> Sarah
>>>
>>> On Thu, May 12, 2016 at 5:26 AM, A M Lavezzi <mario.lavezzi at unipa.it> wrote:
>>> > Hello,
>>> >
>>> > I have a sample of 1327 locations, each one identified by an id and a
>>> > numerical code.
>>> >
>>> > I need to build a spatial matrix, say, M, i.e. a 1327x1327 matrix
>>> > collecting distances among the locations.
>>> >
>>> > M(i,i) should be 0, and M(i,j) should contain the distance between
>>> > locations i and j.
>>> >
>>> > I should use data organized in the following way:
>>> >
>>> > 1) id_cell contains the identifier (id) of each location (1...1327) and
>>> > the numerical code of the location (f_cell) (see head of id_cell below)
>>> >
>>> >> head(id_cell)
>>> >    id f_cell
>>> > 1   1   2120
>>> > 12  2    204
>>> > 22  3   2546
>>> > 24  4   1327
>>> > 34  5   1729
>>> > 43  6   2293
>>> >
>>> > 2) censDist contains, for each location identified by its numerical
>>> > code, the distance to other locations (censDist has 1.5 million rows).
>>> > The head(censDist) below, for example, reads like this:
>>> >
>>> > location 2924 has a distance to 2732 of 1309.7525,
>>> > location 2924 has a distance to 2875 of 696.2891,
>>> > etc.
>>> >
>>> >> head(censDist)
>>> >   f_cell f_cell_neigh  distance
>>> > 1   2924         2732 1309.7525
>>> > 2   2924         2875  696.2891
>>> > 3   2924         2351 1346.0561
>>> > 4   2924         2350 1296.9804
>>> > 5   2924         2725 1278.1877
>>> > 6   2924         2721 1346.9126
>>> >
>>> >
>>> > Basically, for every location in id_cell I should pick up the distance
>>> > to the other locations in id_cell from censDist, and allocate it in M.
>>> >
>>> > I have not come up with a satisfactory vectorization of this problem, and
>>> > using a loop is out of the question.
>>> >
>>> > Thanks for your help
>>> > Mario
>>> >
>>> >
>>
