[R-sig-Geo] Count occurrences less memory expensive than superimpose function in several spatial objects [SOLVED]
ASANTOS
@|ex@ndre@@nto@br @end|ng |rom y@hoo@com@br
Thu Aug 20 17:18:19 CEST 2020
Thanks Vijay,
It works now, and I no longer need to convert my shapefile objects to ppp
in order to use the superimpose function. The solution is now:
#Packages
library(spatstat)
library(dplyr)
library(sp)
library(rgdal)
library(raster)
library(data.table)

#Point process example
data(ants)
ants.df<-as.data.frame(ants) #Convert to data frame

# Sample 75% in original dataset, repeat this 9 times and create a shapefile in each loop
for(i in 1:9){
s.ants.df<-sample_frac(ants.df, 0.75)
s.ants<-ppp(x=s.ants.df[,1],y=s.ants.df[,2],window=ants$window) #Create new ppp object
sample.pts<-cbind(s.ants$x,s.ants$y)
pts.sampling = SpatialPoints(sample.pts)
UTMcoor.df <- SpatialPointsDataFrame(pts.sampling, data.frame(id=1:length(pts.sampling)))
writeOGR(UTMcoor.df, ".",paste0('sample.shape',i), driver="ESRI Shapefile",overwrite=TRUE)
}

#Read all the 9 shapefiles created
all_shape <- list.files(pattern="\\.shp$", full.names=TRUE)
all_shape_list <- lapply(all_shape, shapefile)

#Stack all coordinates and count occurrences per coordinate pair (data.table)
target_sub1 <- rbindlist(lapply(all_shape_list, as.data.table))
res1 <- target_sub1[, .(res=.N), by=.(coords.x1,coords.x2)]
res.xy1 = res1[target_sub1, on=c("coords.x1","coords.x2")]
all.equal(res.xy1, res.xy1, check.attributes=FALSE) # trivially TRUE here (res.xy1 is compared with itself)

#Occurrences in the same coordinate > 5
res_F<-res.xy1[res.xy1$res>5,]
res_F<-as.data.frame(res_F)

#Final shapefile
final.pts<-cbind(res_F[,1],res_F[,2])
pts.final = SpatialPoints(final.pts)
UTMcoor.df <- SpatialPointsDataFrame(pts.final, data.frame(id=1:length(pts.final)))
UTMcoor.df2 <-remove.duplicates(UTMcoor.df)
writeOGR(UTMcoor.df2, ".", paste0('final.ants'), driver="ESRI Shapefile", overwrite=TRUE)
#<END>
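
A possible further memory saving, offered only as a sketch: the grouped table res1 already holds one row per distinct coordinate pair, so filtering it directly should give the same set of coordinates as joining back to target_sub1, filtering, and then removing duplicates. The toy coordinates and the threshold of 2 below are hypothetical stand-ins for the stacked shapefile coordinates and the "> 5" criterion; only data.table is assumed.

```r
library(data.table)

# Hypothetical toy data standing in for
# rbindlist(lapply(all_shape_list, as.data.table))
target_sub1 <- data.table(
  coords.x1 = c(1, 1, 1, 2, 2, 3),
  coords.x2 = c(5, 5, 5, 6, 6, 7)
)

# One row per distinct coordinate pair, with its number of occurrences
res1 <- target_sub1[, .(res = .N), by = .(coords.x1, coords.x2)]

# Illustrative threshold (the thread uses 5)
threshold <- 2

# Filter the grouped table directly: the result is already free of
# duplicate coordinates, so no join back to target_sub1 is needed
res_unique <- res1[res > threshold]
print(res_unique)
```

If this holds for the real data, res_unique could be passed to SpatialPoints() in place of res_F, and the remove.duplicates() step would become unnecessary.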
Alexandre
--
Alexandre dos Santos
Geotechnologies and Spatial Statistics applied to Forest Entomology
Instituto Federal de Mato Grosso (IFMT) - Campus Caceres
Caixa Postal 244 (PO Box)
Avenida dos Ramires, s/n - Vila Real
Caceres - MT - CEP 78201-380 (ZIP code)
Phone: (+55) 65 99686-6970 / (+55) 65 3221-2674
Lattes CV: http://lattes.cnpq.br/1360403201088680
OrcID: orcid.org/0000-0001-8232-6722
ResearchGate: www.researchgate.net/profile/Alexandre_Santos10
Publons: https://publons.com/researcher/3085587/alexandre-dos-santos/
--
On 19/08/2020 20:49, Vijay Lulla wrote:
> Hi Alexandre,
> As far as I can tell (mostly from reading the docs; I have no prior experience
> of using multiplicity or superimpose myself), they appear to be just
> counting the number of occurrences of each x,y coordinate
> pair. So, you can do this by using the group by semantics of either
> tidyverse or SQL to generate the res.xy data.frame. Below is an example of
> generating res.xy alternatively using data.table (I'm not as familiar with
> tidyverse):
>
> target_sub1 <- rbindlist(lapply(target, as.data.table))
> res1 <- target_sub1[, .(res=.N), by=.(x,y)]
> res.xy1 = res1[target_sub1, on=c("x","y")]
>
> all.equal(res.xy, res.xy1, check.attributes=FALSE) # should return TRUE
>
> If you're using SQL then you just join the raw table with the grouped table
> and you should get the table coordinates and occurrences. And, considering
> the number of coordinates you have I recommend either data.table or SQL to
> generate the final output.
> HTH,
> Vijay.
>
>
> On Wed, Aug 19, 2020 at 4:22 PM ASANTOS via R-sig-Geo <
> r-sig-geo using r-project.org> wrote:
>
>> Dear r-sig-geo Members,
>>
>> I would like to read several shapefiles, count the occurrences at each
>> coordinate, and create a final shapefile containing the coordinates that
>> exceed a threshold number of occurrences. I tried converting the
>> shapefiles to ppp objects (because part of my data set is in shapefiles
>> and part in ppp objects) and applying the superimpose function, without
>> success. In my synthetic example:
>>
>> #Packages
>> library(spatstat)
>> library(dplyr)
>> library(sp)
>> library(rgdal)
>> library(raster)
>>
>>
>> #Point process example
>> data(ants)
>> ants.df<-as.data.frame(ants) #Convert to data frame
>>
>> # Sample 75% in original dataset, repeat this 9 times and create a shapefile in each loop
>>
>> for(i in 1:9){
>> s.ants.df<-sample_frac(ants.df, 0.75)
>> s.ants<-ppp(x=s.ants.df[,1],y=s.ants.df[,2],window=ants$window) #Create new ppp object
>> sample.pts<-cbind(s.ants$x,s.ants$y)
>> pts.sampling = SpatialPoints(sample.pts)
>> UTMcoor.df <- SpatialPointsDataFrame(pts.sampling,
>> data.frame(id=1:length(pts.sampling)))
>> writeOGR(UTMcoor.df, ".",paste0('sample.shape',i), driver="ESRI Shapefile",overwrite=TRUE)
>> }
>>
>> #Read all the 9 shapefiles created
>> all_shape <- list.files(pattern="\\.shp$", full.names=TRUE)
>> all_shape_list <- lapply(all_shape, shapefile)
>>
>> #Convert shapefile to spatstat ppp
>> target <- vector("list", length(all_shape_list))
>> for(i in 1:length(all_shape_list)){
>> target[[i]] <- ppp(x=all_shape_list[[i]]@coords[,1],
>> y=all_shape_list[[i]]@coords[,2],window=ants$window)}
>>
>> #Join all ppp objects using multiplicity
>> target_sub<-do.call(superimpose,target)
>> res<-multiplicity(target_sub)
>>
>> #Occurrences in the same coordinate > 5
>> res.xy<-data.frame(x=target_sub$x,y=target_sub$y,res=res)
>> res_F<-res.xy[res.xy$res>5,]
>>
>> #Final shapefile
>> final.pts<-cbind(res_F[,1],res_F[,2])
>> pts.final = SpatialPoints(final.pts)
>> UTMcoor.df <- SpatialPointsDataFrame(pts.final,
>> data.frame(id=1:length(pts.final)))
>> UTMcoor.df2 <-remove.duplicates(UTMcoor.df)
>> writeOGR(UTMcoor.df2, ".", paste0('final.ants'), driver="ESRI Shapefile",overwrite=TRUE)
>>
>>
>> This approach works very well in this synthetic example! But my real
>> data set has 99 shapefiles with 10^7 coordinates, and when I call
>> do.call(superimpose,target) it exhausts my 32 GB of RAM and crashes.
>>
>> Does anyone have ideas for creating a new shapefile using the occurrence
>> criterion above in a way that is less memory expensive than superimposing
>> all the objects created?
>>
>> Thanks in advance,
>> Alexandre
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo using r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>