[R] building a spatial matrix
Sarah Goslee
sarah.goslee at gmail.com
Fri May 13 18:45:08 CEST 2016
Sorry, you're right.
The result line should be:
result.m[cbind(factor(result$fcell), factor(result$cellneigh))] <-
result$distance
idcell <- data.frame(
id = seq_len(5),
fcell = sample(1:100, 5))
censDist <- expand.grid(fcell=seq_len(100), cellneigh=seq_len(100))
censDist$distance <- runif(nrow(censDist))
# assemble the non-symmetric distance matrix
result <- subset(censDist, fcell %in% idcell$fcell & cellneigh %in%
idcell$fcell)
result.m <- matrix(NA, nrow=nrow(idcell), ncol=nrow(idcell))
result.m[cbind(factor(result$fcell), factor(result$cellneigh))] <-
result$distance
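
To see why the cbind() form matters, here is a minimal sketch on a toy 2 x 2 matrix. The names m1, m2, i, j and v are made up for illustration and are not part of the data in this thread. With two separate index vectors, R addresses the whole block of rows i by columns j and recycles the values over it; with a two-column index matrix, each row of the index is a single (row, column) pair.

```r
# Toy illustration; m1, m2, i, j and v are hypothetical names.
i <- c(1, 2, 1, 2)
j <- c(1, 1, 2, 2)
v <- c(10, 20, 30, 40)

# Two index vectors: selects the block "rows i by columns j" and recycles v
# over it, so repeated cells are overwritten by later values.
m1 <- matrix(NA, nrow = 2, ncol = 2)
m1[i, j] <- v

# Two-column index matrix: row k of cbind(i, j) addresses cell (i[k], j[k]),
# so each value lands in exactly one cell.
m2 <- matrix(NA, nrow = 2, ncol = 2)
m2[cbind(i, j)] <- v

print(m1)
print(m2)
```

In the first form the columns of m1 come out identical, which matches the symptom described in the quoted message below; the second form places each value in its own cell.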
It's just about instantaneous on the dataset you sent me:
system.time({
result <- subset(censDist, f_cell %in% id_cell$f_cell & f_cell_neigh %in%
id_cell$f_cell)
result.m <- matrix(NA, nrow=nrow(id_cell), ncol=nrow(id_cell))
result.m[cbind(factor(result$f_cell), factor(result$f_cell_neigh))] <-
result$distance
})
   user  system elapsed
  0.361   0.007   0.368
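
One thing to watch, given that several firms can share an f_cell: factor() numbers the distinct cell codes in sorted order, so the indexing above works cell by cell, not firm by firm. The following is only a sketch of one possible way to get a firm-by-firm matrix with match(). The toy data, the helper names ucell, cell.m, keep and pos, and the choice of distance 0 for two firms in the same cell are all assumptions, not something taken from the real dataset.

```r
# Hypothetical toy data using the column names from the thread;
# firms 1 and 3 share cell 2120.
id_cell <- data.frame(id = 1:4, f_cell = c(2120, 204, 2120, 1327))

cells <- c(204, 1327, 2120)
censDist <- expand.grid(f_cell = cells, f_cell_neigh = cells)
censDist <- censDist[censDist$f_cell != censDist$f_cell_neigh, ]
censDist$distance <- abs(censDist$f_cell - censDist$f_cell_neigh)  # made-up distances

# Cell-level matrix, one row/column per distinct cell code.
ucell <- sort(unique(id_cell$f_cell))
cell.m <- matrix(NA_real_, nrow = length(ucell), ncol = length(ucell),
                 dimnames = list(as.character(ucell), as.character(ucell)))
keep <- censDist$f_cell %in% ucell & censDist$f_cell_neigh %in% ucell
cell.m[cbind(match(censDist$f_cell[keep], ucell),
             match(censDist$f_cell_neigh[keep], ucell))] <- censDist$distance[keep]
diag(cell.m) <- 0  # assumption: distance within a cell is 0

# Expand to one row/column per firm, in the row order of id_cell.
pos <- match(id_cell$f_cell, ucell)
spat_dist <- cell.m[pos, pos]
dimnames(spat_dist) <- list(as.character(id_cell$id), as.character(id_cell$id))

print(spat_dist)
```

Using match() against an explicit vector of cell codes keeps the row order under your control, and pairs missing from censDist simply stay NA.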
Sarah
On Fri, May 13, 2016 at 10:36 AM, A M Lavezzi <mario.lavezzi at unipa.it>
wrote:
> PLEASE IGNORE THE PREVIOUS EMAIL, IT WAS SENT BY MISTAKE
>
> Hello Sarah
> thanks a lot for your advice.
>
> I followed your suggestions until the creation of "result".
>
> The allocation of the values of result$distance to the matrix result.m,
> however, does not seem to work: it produces a matrix with identical columns
> corresponding to the last values of result$distance. Maybe my description of
> the dataset was not clear enough.
>
> I produced the final matrix spat_dist with a loop, which I report below (it
> takes about 1 hour on my MacBook Pro):
>
> set_i = -1 # create a variable to store the i values already examined
>
> for(i in unique(result$id)){
>
> set_i=c(set_i,i) # store the value of the i
>
> # identify the locations connected to i. If the distance between i and j
> # was examined before, don't look for the distance between j and i
> set_neigh = result$id_neigh[result$id==i & !result$id_neigh %in% set_i]
>
> for(j in set_neigh){
> if(i!=j){
> spat_dist[i,j] = result$distance[result$id==i & result$id_neigh==j]
> spat_dist[j,i] = spat_dist[i,j]
> }
> else{
> spat_dist[i,j]=0
> }
> }
> }
>
> It is not the most elegant and efficient solution in the world, that's for
> sure.
>
> I would be grateful if you could suggest an alternative instruction to:
>
> result.m[factor(result$fcell), factor(result$cellneigh)] <- result$distance
>
> so I will learn a faster procedure (I tried many times to modify this
> structure but I did not manage it). I don't want to take up too much of
> your time, so forget it if you are busy.
>
> Thank you so much anyway,
> Mario
>
> PS: I attach the data. Notice that the 1327 units in id_cell are firms,
> indexed by id, located in location f_cell. Different firms can be located
> in the same f_cell. With respect to your suggestion, I added two columns to
> "result" with the id of the firms.
>
> On Fri, May 13, 2016 at 3:26 PM, A M Lavezzi <mario.lavezzi at unipa.it>
> wrote:
>>
>>
>> Hello Sarah
>> thanks a lot for your advice.
>>
>> I followed your suggestions until the creation of "result".
>>
>> The allocation of the values of result$distance to the matrix result.m,
>> however, does not seem to work: it produces a matrix with identical
>> columns corresponding to the last values of result$distance. Maybe my
>> description of the dataset was not clear enough.
>>
>> I produced the final matrix with a loop, which I report below (it takes
>> about 1 hour on my MacBook Pro):
>>
>> set_i = -1 # create a variable to store the i values already examined
>>
>> for(i in unique(result$id)){
>>
>> set_i=c(set_i,i) # store the value of the i
>>
>> set_neigh = result$id_neigh[result$id==i & !result$id_neigh %in% set_i]
>> # identify the locations connected to i. Exclude those
>>
>> for(j in set_neigh){
>> if(i!=j){
>> spat_dist[i,j] = result$distance[result$id==i & result$id_neigh==j]
>> spat_dist[j,i] = spat_dist[i,j]
>> }
>> else{
>> spat_dist[i,j]=0
>> }
>> }
>> }
>>
>> It is not the most elegant and efficient solution in the world, that's for
>> sure.
>>
>>
>>
>> On Thu, May 12, 2016 at 2:51 PM, Sarah Goslee <sarah.goslee at gmail.com>
>> wrote:
>>>
>>> I don't see any reason why a loop is out of the question, and
>>> answering would have been much easier if you'd included the requested
>>> reproducible data, but what about this?
>>>
>>> This solution is robust to pairs from idcell being absent in censDist,
>>> and to the distance from A to B being different from the distance
>>> from B to A, but not to A-B appearing twice. If that's possible,
>>> you'll need to figure out how to manage it.
>>>
>>> # create some fake data
>>>
>>> idcell <- data.frame(
>>> id = seq_len(5),
>>> fcell = sample(1:100, 5))
>>>
>>> censDist <- expand.grid(fcell=seq_len(100), cellneigh=seq_len(100))
>>> censDist$distance <- runif(nrow(censDist))
>>>
>>> # assemble the non-symmetric distance matrix
>>> result <- subset(censDist, fcell %in% idcell$fcell & cellneigh %in%
>>> idcell$fcell)
>>> result.m <- matrix(NA, nrow=nrow(idcell), ncol=nrow(idcell))
>>> result.m[factor(result$fcell), factor(result$cellneigh)] <-
>>> result$distance
>>>
>>> Sarah
>>>
>>> On Thu, May 12, 2016 at 5:26 AM, A M Lavezzi <mario.lavezzi at unipa.it>
>>> wrote:
>>> > Hello,
>>> >
>>> > I have a sample of 1327 locations, each one identified by an id and a
>>> > numerical code.
>>> >
>>> > I need to build a spatial matrix, say, M, i.e. a 1327x1327 matrix
>>> > collecting distances among the locations.
>>> >
>>> > M(i,i) should be 0, M(i,j) should contain the distance between
>>> > locations i and j.
>>> >
>>> > I should use data organized in the following way:
>>> >
>>> > 1) id_cell contains the identifier (id) of each location (1...1327) and
>>> > the numerical code of the location (f_cell) (see head of id_cell below)
>>> >
>>> >> head(id_cell)
>>> >    id f_cell
>>> > 1   1   2120
>>> > 12  2    204
>>> > 22  3   2546
>>> > 24  4   1327
>>> > 34  5   1729
>>> > 43  6   2293
>>> >
>>> > 2) censDist contains, for each location identified by its numerical
>>> > code, the distance to other locations (censDist has 1.5 million rows).
>>> > The head(censDist) below, for example, reads like this:
>>> >
>>> > location 2924 has a distance to 2732 of 1309.7525
>>> > location 2924 has a distance to 2875 of 696.2891,
>>> > etc.
>>> >
>>> >> head(censDist)
>>> >   f_cell f_cell_neigh  distance
>>> > 1   2924         2732 1309.7525
>>> > 2   2924         2875  696.2891
>>> > 3   2924         2351 1346.0561
>>> > 4   2924         2350 1296.9804
>>> > 5   2924         2725 1278.1877
>>> > 6   2924         2721 1346.9126
>>> >
>>> >
>>> > Basically, for every location in id_cell I should pick up the distance
>>> > to other locations in id_cell from censDist, and allocate it in M.
>>> >
>>> > I have not come up with a satisfactory vectorization of this problem, and
>>> > using a loop is out of the question.
>>> >
>>> > Thanks for your help
>>> > Mario
>>> >
>>> >
>>