[R-sig-Geo] Generating Random Planar Graph with 1m Edges

Thu Oct 20 14:18:43 CEST 2016

For Completeness:

This was what I suggested to Kimon-Vincent, offline, as I felt quite
uncertain as to my input.

Kimon-Vincent,

I write off list because perhaps my suggestion is just too stupid and I
don't want to further solidify my reputation for 'stupid' suggestions. But
it seems to me, taking Roger's RANN suggestion in mind, that the test for
E, be taken, not just at the end, as in the ABCDE approach the paper
suggests, but upon selection of each C, prior to adding to a list of
successful (non-intersecting) edges using (rgeos) gIntersection upon a
candidate back against an existing list, prior to appending to said list.
Should it fail, next C. I need to give some more thought to what is meant
by D (though it seems to suggest a limit of neighbors, I'd have to read the
paper),do D, and finally E again as non-intersecting confirmatory.

And look back at your repeat{, (which I find quite interesting anyway) and
figure out how to vectorize using some apply family as this should greatly
reduce computation time. Flow control is useful, but teasing apart the
exact vector steps and the order you want them applied, while puzzling,
should remove your computation time bottlenecks. I'm not expert enough at
this time to advise.

Sorry if all of this seems silly.

On Tue, Oct 18, 2016 at 10:16 PM, Krenz, Kimon-Vincent <
kimon-vincent.krenz.12 at ucl.ac.uk> wrote:

> Dear Chris,
>
> Thanks a lot for your input and suggestion. It took me a while to fiddle
> my way through it, but I followed your logic, which I believe is probably
> the best way to go.
>
> Using RANN I randomly sampled a data.frame with possible combinations
> meeting the distance requirement, of which I then read
> the first row and remove it from the list. Then I created a line pair,
> tested if it intersects with existing lines and if not added it to the line
> list. After this the process starts form the beginning until there are no
> pair combinations int he data.frame list.
>
> I did not use the gIntersect as I felt crossing.psp() was easier to
> handle. However, the process for 50000 lines takes roughly half an hour. So
> roughly 20 hours.
>
> In case you have an idea to speed up the process let me know! I have
> attached the code at the bottom.
>
> Best,
> Kimon
>
> Kimon Krenz
>
> PhD Researcher
> MSc SDAC Course Coordinator
> <http://www.bartlett.ucl.ac.uk/space-syntax/programmes/mres-msc/msc-spatial-design>
> Space Syntax Laboratory <http://www.bartlett.ucl.ac.uk/space-syntax>
>
> mail.       ucftkr3 at ucl.ac.uk
> phone.    0044 7784 329089
> web.       www.kimonkrenz.com
>
> Bartlett School of Architecture <http://www.bartlett.ucl.ac.uk/>
> UCL
> 140 Hampstead Road
> London     NW1 2BX     UK
>
> require(spatstat)
> require(maps)
> library(sp)
> #set coordinate frame
> w <- as.owin(list(xrange=c(32275175,32475175),yrange=c(5611910,5811910)))
> #generate poisson distribution point set (0.000025 for 1million points)
> Y <- rpoispp(0.0000001, win=w)
> Ydata <- data.frame(Y)
> list(Y)
>
> #create k nearest neighbour matrix of points
> require(RANN)
> #radius <- 5000
> #NN1 <- nn2(Ydata, k=10,  treetype = c("bd"), searchtype="radius", radius
> = radius)
> NN1 <- nn2(Ydata, k=30,  treetype = c("bd"), searchtype="standard")
> #delete first collumn
> NN1$nn.idx <- NN1$nn.idx[,-1]
> NN1$nn.dists <- NN1$nn.dists[,-1]
>
> #rearrange matrix from column to rows
> require(reshape2)
> NN2 <- melt(NN1$nn.idx)
> NN3 <- melt(NN1$nn.dists)
> #merge matrices together
> NN4 <- cbind(NN2,NN3)
> #remove unecessary columns
> NN5 <- NN4[ -c(2,4:5)]
> rm(NN2, NN3, NN4)
> #remove point pairs with a longer distance than x
> NN5 <- NN5[NN5$value.1 <= 5000, ]
>
> require(plyr)
> #change column name
> NN5 <- rename(NN5, c("Var1"="p1", "value"="p2", "value.1"="length"))
> #as data frame
> NN6 <- data.frame(NN5)
> #create logic
> NN6$Dup <- NN6$p1 < NN6$p2
> #subset data based on logic
> NN7 <- NN6[NN6$Dup == TRUE,]
> NN8 <- NN6[NN6$Dup == FALSE,]
> #swap columns
> NN8 <- rename(NN8, c("p1"="p2", "p2"="p1", "length"="length"))
> NN9 <- rbind(NN7,NN8)
> #remove logic column
> NN9 <- NN9[-c(4)]
> #remove duplicate
> rm(NN5, NN6, NN7, NN8)
> NN10 <- NN9[!duplicated(NN9), ]
> rm(NN9)
> #Reindex row names
> row.names(NN10) <- 1:nrow(NN10)
> #drop distance column
> NN10 <- NN10[,-3]
>
> require(dplyr)
> NN11 <- sample_n(NN10, 2000000, replace=F)
> NN13 <- NN11[!duplicated(NN11), ]
> rm(NN11)
>
> #split file into several sub data.frame for faster computation
> #n <- 50000
> #nr <- nrow(NN12)
> #NN13 <- split(NN12, rep(1:ceiling(nr/n), each=n, length.out=nr))
>
>
>
> #FIRST LINE
> l1 <- NN13[1, ]
> NN13 <- NN13[-1,]
> l1 <- c(t(l1))
> #read l1 from poisson point set
> Ydata2 <- Ydata[l1, c('x','y')]
> #divide into two point sets
> Z1 <- Ydata2[1,]
> Z2 <- Ydata2[2,]
> #create Line Segment Pattern
> Z <- psp(Z1[1, 1], Z1[1, 2], Z2[1, 1], Z2[1, 2], window=owin(w))
>
> #LOOP START
> repeat {
>   l1 <- NN13[1, ]
>   NN13 <- NN13[-1, ]
>   l1 <- c(t(l1))
>   #read l1 from poisson point set
>   Ydata2 <- Ydata[l1, c('x','y')]
>   #divide into two point sets
>   Z1 <- Ydata2[1,]
>   Z2 <- Ydata2[2,]
>   #create Line Segment Pattern
>   H <- psp(Z1[1, 1], Z1[1, 2], Z2[1, 1], Z2[1, 2], window=owin(w))
>   #intersecting lines check
>   J <- crossing.psp(Z,H,fatal=TRUE,details=FALSE)
>
>   H1 <- as.vector(H$ends, mode="numeric")
>   J$x <- setdiff(J$x, H1)
>   J$y <- setdiff(J$y, H1)
>   J$y1 <- data.frame(J$y)
>   J$x1 <- data.frame(J$x)
>   J$Ally <- count(J$y1)
>   J$Allx <- count(J$x1)
>
>   if ( J$Ally == 0 & J$Allx == 0 ) { Z <- append.psp(Z,H) }
>   else
>     if ( nrow(NN13)==0 ) { break }
>   else
>   { }
> }
>
>
> plot(owin(w))
> plot(Z, add=TRUE)
> detach("package:dplyr", unload=TRUE)
>
>
>
> On 06 Oct 2016, at 15:58, chris english <englishchristophera at gmail.com>
> wrote:
>
> Kimon-Vincent,
>
> I write off list because perhaps my suggestion is just too stupid and I
> don't want to further solidify my reputation for 'stupid' suggestions. But
> it seems to me, taking Roger's RANN suggestion in mind, that the test for
> E, be taken, not just at the end, as in the ABCDE approach the paper
> suggests, but upon selection of each C, prior to adding to a list of
> successful (non-intersecting) edges using (rgeos) gIntersection upon a
> candidate back against an existing list, prior to appending to said list.
> Should it fail, next C. I need to give some more thought to what is meant
> by D (though it seems to suggest a limit of neighbors, I'd have to read the
> paper),do D, and finally E again as non-intersecting confirmatory.
>
> And look back at your repeat{, (which I find quite interesting anyway) and
> figure out how to vectorize using some apply family as this should greatly
> reduce computation time. Flow control is useful, but teasing apart the
> exact vector steps and the order you want them applied, while puzzling,
> should remove your computation time bottlenecks. I'm not expert enough at
> this time to advise.
>
> Sorry if all of this seems silly.
>
> Chris
>
> On Wed, Oct 5, 2016 at 11:43 PM, Krenz, Kimon-Vincent <kimon-vincent.
> krenz.12 at ucl.ac.uk> wrote:
>
>> Dear All,
>>
>> I started a week ago to use R to solve a problem I am currently facing in
>> my PhD.
>> Apologies in advance for the long-winded explanation of my problem.
>>
>> I am trying to generate a random planar graph with approximately 1
>> million edges, where:
>>
>> A) nodes (points) feature spatial coordinates
>> B) the network has a given boundary
>>
>> C) edges (lines) are created if two points fall within a given distance
>> (e.g. 100 - 1000 metres)
>> D) degree (connectivity) ranges between given k max and min (e.g. k ≤ 5)
>> E) edges do not intersect
>>
>> This will result in something one might want to compare to a random
>> street network.
>>
>> I am following a method proposed here: Masucci, A. P., Smith, D., Crooks,
>> A., & Batty, M. (2009). Random planar graphs and the London street network.
>> European Physical Journal B, 71(2), 259–271. http://doi.org/10.
>> 1140/epjb/e2009-00290-4) link to paper: https://goo.gl/6XWW8P
>>
>> Masucci et al.: "We first introduce a random model for a static planar
>> graph. ... To build an ERPG we start with a Poisson distribution of N
>> points in a plane and we choose a distance r. To build the first segment,
>> we randomly pick up two points of the distribution that have a distance
>> less then r and we connect them. Then we continue to randomly pick up pairs
>> of points P and Q in the given points distribution that have a distance
>> less then r. If the segment PQ does not intersect any other line of the
>> graph, we add it to the graph. The process ends when we add the desired
>> number of edges E.”
>>
>> I hence, started with generating random points using the Poisson
>> distribution in a given spatial box (A and A):
>>
>> require(spatstat)
>> require(maps)
>> library(sp)
>> w <- as.owin(list(xrange=c(32275175,32475175),yrange=c(5611910,5811910)))
>> Y <- rpoispp(0.00001, win=w)
>> Ydata <- data.frame(Y)
>> list(Y)
>>
>> That seems to be quite straightforward in R.
>> I then followed the proposed method and wrote a simple loop, that selects
>> two points from Ydata
>> based on a random sample and adds the projection of the coordinate system.
>>
>> #399717 is = N from the rpoispp
>>
>> repeat {
>>   l1 <- sample(1:399717, 2, replace=F)
>>
>>  Ydata.sp<-Ydata[l1, c('x','y')]
>>   coordinates(Ydata.sp)<-~x+y
>>   crs.geo <- CRS("+init=epsg:4647 +proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996
>> +x_0=32500000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs")
>>   proj4string(Ydata.sp) <- crs.geo
>>
>>   L = SpatialLines(list(Lines(list(Line(coordinates(Ydata.sp))),"X")))
>>   if (SpatialLinesLengths(L)<=4000){
>>     break
>>   }
>> }
>> plot(L)
>> SpatialLinesLengths(L)
>>
>> This fulfils the requirement of C) and I have not yet reached the point
>> to deal with D) and E).
>> My problem here is that the process takes way too long for a single pair
>> (approx 20 seconds, resulting in 250 days computational time).
>>
>> Is there a quicker way solve this in a more efficient manner?
>>
>> I have thought about selecting the first point by random and the second
>> one randomly based on an evaluation
>> of all points within a given radius to the first point, rather than two
>> complete random points.
>>
>> This would at least cut down the test of several thousand meaningless
>> combinations. However, I couldn't find a way to do this.
>> Another option might be gDistance from the (rgeos), but with 400k points,
>> the result seems to be not computable.
>>
>> I am also happy for any suggestions regarding requirements D and E, or
>> help with the task in general.
>>
>> Best,
>> Kimon
>>
>> Kimon Krenz
>>
>> PhD Researcher
>> MSc SDAC Course Coordinator<http://www.bartlett.ucl.ac.uk/space-syntax/
>> programmes/mres-msc/msc-spatial-design>
>> Space Syntax Laboratory<http://www.bartlett.ucl.ac.uk/space-syntax>
>>
>> mail.       ucftkr3 at ucl.ac.uk<mailto:ucftkr3 at ucl.ac.uk>
>> phone.    0044 7784 329089
>> web.       www.kimonkrenz.com<http://www.kimonkrenz.com/>
>>
>> Bartlett School of Architecture<http://www.bartlett.ucl.ac.uk/>
>> UCL
>> 140 Hampstead Road
>> London     NW1 2BX     UK
>>
>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
>
>
>
>
>

	[[alternative HTML version deleted]]