[R] building a spatial matrix
A M Lavezzi
mario.lavezzi at unipa.it
Mon May 16 18:53:56 CEST 2016
Hi Sarah,
thanks a lot. In this line:
result.m[cbind(factor(result$f_cell), factor(result$f_cell_neigh))] <- result$distance
I had a problem with cbind(factor(...)): the assignment to the [i,j] element
of the matrix did not work.
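A likely cause, sketched on made-up data: factor() numbers the distinct location codes in sorted order, so the resulting codes are not the firm ids, and when several firms share an f_cell (as can happen in id_cell) there are fewer codes than rows in the matrix. The toy values below are assumptions for illustration only.

```r
# Toy stand-in for id_cell: 4 firms, two of them in the same location (204).
id_cell <- data.frame(id = 1:4, f_cell = c(2120, 204, 2546, 204))

# factor() assigns integer codes to the sorted distinct values of f_cell,
# so the codes are ranks of the location codes, not firm ids.
codes <- as.integer(factor(id_cell$f_cell))
print(codes)

# Fewer distinct codes than firms: a 4 x 4 matrix indexed by these codes
# can never have its last row or column addressed.
n_codes <- nlevels(factor(id_cell$f_cell))
print(n_codes)
```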
I solved it this way:
- I modified censDist so that, for every pair of location codes (f_cell,
f_cell_neigh), it also holds the corresponding ids from id_cell (they are directly
interpretable as the row and column indices of the 1327x1327 result.m matrix). So
censDist now looks like this:
head(censDist)
  f_cell f_cell_neigh  distance id id_neigh
1   2924         2732 1309.7525 NA       NA
2   2924         2875  696.2891 NA       NA
3   2924         2351 1346.0561 NA      975
4   2924         2350 1296.9804 NA      758
5   2924         2725 1278.1877 NA       NA
6   2924         2721 1346.9126 NA       NA
Then I ran your code with a slight modification of the last line:
resultS <- subset(censDist, f_cell %in% id_cell$f_cell &
                  f_cell_neigh %in% id_cell$f_cell)
resultS.m <- matrix(NA, nrow=nrow(id_cell), ncol=nrow(id_cell))
resultS.m[cbind(resultS$id, resultS$id_neigh)] <- resultS$distance
and it worked!
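The whole approach can be sketched on a few made-up rows. This is only an illustration: match() is assumed here as the way to look up id and id_neigh, each f_cell is assumed to belong to a single id, the rows are filtered on non-missing ids rather than with %in%, and all data values are invented.

```r
# Toy stand-ins for id_cell and censDist (values invented for illustration).
id_cell <- data.frame(id = 1:3, f_cell = c(2120, 204, 2546))
censDist <- data.frame(
  f_cell       = c(2120, 2120, 204, 2546, 9999),
  f_cell_neigh = c(204, 2546, 2546, 2120, 204),
  distance     = c(10.5, 20.0, 7.25, 21.0, 3.0)
)

# Look up the id of each endpoint; locations not in id_cell get NA.
censDist$id       <- id_cell$id[match(censDist$f_cell, id_cell$f_cell)]
censDist$id_neigh <- id_cell$id[match(censDist$f_cell_neigh, id_cell$f_cell)]

# Keep only the pairs where both endpoints are in id_cell.
resultS <- subset(censDist, !is.na(id) & !is.na(id_neigh))

# A two-column index matrix addresses one [row, col] cell per row of resultS.
resultS.m <- matrix(NA_real_, nrow = nrow(id_cell), ncol = nrow(id_cell))
resultS.m[cbind(resultS$id, resultS$id_neigh)] <- resultS$distance
diag(resultS.m) <- 0  # M(i,i) should be 0
print(resultS.m)
```

Pairs that never appear in censDist stay NA, and the matrix is not forced to be symmetric.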
thank you so much,
Mario
On Fri, May 13, 2016 at 5:45 PM, Sarah Goslee <sarah.goslee at gmail.com>
wrote:
> Sorry, you're right.
>
> The result line should be:
>
> result.m[cbind(factor(result$fcell), factor(result$cellneigh))] <-
> result$distance
>
>
> idcell <- data.frame(
>   id = seq_len(5),
>   fcell = sample(1:100, 5))
>
> censDist <- expand.grid(fcell=seq_len(100), cellneigh=seq_len(100))
> censDist$distance <- runif(nrow(censDist))
>
> # assemble the non-symmetric distance matrix
> result <- subset(censDist, fcell %in% idcell$fcell & cellneigh %in%
>   idcell$fcell)
> result.m <- matrix(NA, nrow=nrow(idcell), ncol=nrow(idcell))
> result.m[cbind(factor(result$fcell), factor(result$cellneigh))] <-
>   result$distance
>
> It's just about instantaneous on the dataset you sent me:
>
>
> system.time({
>   result <- subset(censDist, f_cell %in% id_cell$f_cell & f_cell_neigh %in%
>     id_cell$f_cell)
>   result.m <- matrix(NA, nrow=nrow(id_cell), ncol=nrow(id_cell))
>   result.m[cbind(factor(result$f_cell), factor(result$f_cell_neigh))] <-
>     result$distance
> })
>
>    user  system elapsed
>   0.361   0.007   0.368
>
>
>
>
> Sarah
>
>
> On Fri, May 13, 2016 at 10:36 AM, A M Lavezzi <mario.lavezzi at unipa.it>
> wrote:
> > PLEASE IGNORE THE PREVIOUS EMAIL, IT WAS SENT BY MISTAKE
> >
> > Hello Sarah
> > thanks a lot for your advice.
> >
> > I followed your suggestions until the creation of "result"
> >
> > The allocation of the values of result$distance to the matrix result.m,
> > however, does not seem to work: it produces a matrix with identical columns
> > corresponding to the last values of result$distance. Maybe my description of
> > the dataset was not clear enough.
> >
> > I produced the final matrix spat_dist with a loop, which I report below (it
> > takes about 1 hour on my MacBook Pro):
> >
> > set_i = -1 # create a variable to store the i values already examined
> >
> > for(i in unique(result$id)){
> >
> >   set_i = c(set_i, i) # store the value of i
> >
> >   # identify the locations connected to i. If the distance between i and j
> >   # was examined before, don't look for the distance between j and i
> >   set_neigh = result$id_neigh[result$id==i & !result$id_neigh %in% set_i]
> >
> >   for(j in set_neigh){
> >     if(i!=j){
> >       spat_dist[i,j] = result$distance[result$id==i & result$id_neigh==j]
> >       spat_dist[j,i] = spat_dist[i,j]
> >     }
> >     else{
> >       spat_dist[i,j]=0
> >     }
> >   }
> > }
> >
> > It is not the most elegant and efficient solution in the world, that's for
> > sure.
> >
> > I would be grateful if you could suggest an alternative instruction to:
> >
> > result.m[factor(result$fcell), factor(result$cellneigh)] <- result$distance
> >
> > so I can learn a faster procedure (I tried many times to modify this
> > structure but did not succeed). I don't want to abuse your time, so
> > forget it if you are busy.
> >
> > Thank you so much anyway,
> > Mario
> >
> > ps I attach the data. Notice that the 1327 units in id_cell are firms,
> > indexed by id, located in location f_cell. Different firms can be located in
> > the same f_cell. With respect to your suggestion, I added two columns to
> > "result" with the id of the firms.
> >
> > On Fri, May 13, 2016 at 3:26 PM, A M Lavezzi <mario.lavezzi at unipa.it>
> > wrote:
> >>
> >>
> >> Hello Sarah
> >> thanks a lot for your advice.
> >>
> >> I followed your suggestions until the creation of "result"
> >>
> >> The allocation of the values of result$distance to the matrix result.m,
> >> however, does not seem to work: it produces a matrix with identical columns
> >> corresponding to the last values of result$distance. Maybe my description of
> >> the dataset was not clear enough.
> >>
> >> I produced the final matrix with a loop, which I report below (it takes
> >> about 1 hour on my MacBook Pro):
> >>
> >> set_i = -1 # create a variable to store the i values already examined
> >>
> >> for(i in unique(result$id)){
> >>
> >>   set_i = c(set_i, i) # store the value of i
> >>
> >>   # identify the locations connected to i. Exclude those
> >>   set_neigh = result$id_neigh[result$id==i & !result$id_neigh %in% set_i]
> >>
> >>   for(j in set_neigh){
> >>     if(i!=j){
> >>       spat_dist[i,j] = result$distance[result$id==i & result$id_neigh==j]
> >>       spat_dist[j,i] = spat_dist[i,j]
> >>     }
> >>     else{
> >>       spat_dist[i,j]=0
> >>     }
> >>   }
> >> }
> >>
> >> It is not the most elegant and efficient solution in the world, that's for
> >> sure
> >>
> >>
> >>
> >> On Thu, May 12, 2016 at 2:51 PM, Sarah Goslee <sarah.goslee at gmail.com>
> >> wrote:
> >>>
> >>> I don't see any reason why a loop is out of the question, and
> >>> answering would have been much easier if you'd included the requested
> >>> reproducible data, but what about this?
> >>>
> >>> This solution is robust to pairs from idcell being absent in censDist,
> >>> and to the distance from A to B being different from the distance
> >>> from B to A, but not to A-B appearing twice. If that's possible,
> >>> you'll need to figure out how to manage it.
> >>>
> >>> # create some fake data
> >>>
> >>> idcell <- data.frame(
> >>>   id = seq_len(5),
> >>>   fcell = sample(1:100, 5))
> >>>
> >>> censDist <- expand.grid(fcell=seq_len(100), cellneigh=seq_len(100))
> >>> censDist$distance <- runif(nrow(censDist))
> >>>
> >>> # assemble the non-symmetric distance matrix
> >>> result <- subset(censDist, fcell %in% idcell$fcell & cellneigh %in%
> >>> idcell$fcell)
> >>> result.m <- matrix(NA, nrow=nrow(idcell), ncol=nrow(idcell))
> >>> result.m[factor(result$fcell), factor(result$cellneigh)] <-
> >>> result$distance
> >>>
> >>> Sarah
> >>>
> >>> On Thu, May 12, 2016 at 5:26 AM, A M Lavezzi <mario.lavezzi at unipa.it>
> >>> wrote:
> >>> > Hello,
> >>> >
> >>> > I have a sample of 1327 locations, each one identified by an id and a
> >>> > numerical code.
> >>> >
> >>> > I need to build a spatial matrix, say, M, i.e. a 1327x1327 matrix
> >>> > collecting distances among the locations.
> >>> >
> >>> > M(i,i) should be 0, M(i,j) should contain the distance between
> >>> > locations i and j
> >>> >
> >>> > I should use data organized in the following way:
> >>> >
> >>> > 1) id_cell contains the identifier (id) of each location (1...1327) and
> >>> > the numerical code of the location (f_cell) (see head of id_cell below)
> >>> >
> >>> >> head(id_cell)
> >>> >    id f_cell
> >>> > 1   1   2120
> >>> > 12  2    204
> >>> > 22  3   2546
> >>> > 24  4   1327
> >>> > 34  5   1729
> >>> > 43  6   2293
> >>> >
> >>> > 2) censDist contains, for each location identified by its numerical
> >>> > code, the distance to other locations (censDist has 1.5 million rows).
> >>> > The head(censDist) below, for example, reads like this:
> >>> >
> >>> > location 2924 has a distance to 2732 of 1309.7525
> >>> > location 2924 has a distance to 2875 of 696.2891,
> >>> > etc.
> >>> >
> >>> >> head(censDist)
> >>> >   f_cell f_cell_neigh  distance
> >>> > 1   2924         2732 1309.7525
> >>> > 2   2924         2875  696.2891
> >>> > 3   2924         2351 1346.0561
> >>> > 4   2924         2350 1296.9804
> >>> > 5   2924         2725 1278.1877
> >>> > 6   2924         2721 1346.9126
> >>> >
> >>> >
> >>> > Basically, for every location in id_cell I should pick up the distance
> >>> > to other locations in id_cell from censDist, and allocate it in M
> >>> >
> >>> > I have not come up with a satisfactory vectorization of this problem,
> >>> > and using a loop is out of the question.
> >>> >
> >>> > Thanks for your help
> >>> > Mario
> >>> >
> >>> >
> >>
>
--
Andrea Mario Lavezzi
DiGi,Sezione Diritto e Società
Università di Palermo
Piazza Bologni 8
90134 Palermo, Italy
tel. ++39 091 23892208
fax ++39 091 6111268
skype: lavezzimario
email: mario.lavezzi (at) unipa.it
web: http://www.unipa.it/~mario.lavezzi