[R-sig-Geo] Remove duplicates objects

Roger Bivand Roger.Bivand at nhh.no
Wed Jul 22 15:11:55 CEST 2009


On Tue, 21 Jul 2009, Jean-Paul Kibambe Lubamba wrote:

> Hi All,
>
> I have a huge polyline file (slightly fewer than 2 x 10^6 objects) containing
> some geographically duplicated objects. I tried to apply some topological
> rules using a GIS, but due to the large number of objects, topology
> validation and correction take a very long time.
>
> So I was thinking that maybe there is a way to remove duplicate objects
> using R, and that even for a huge file it could take less time than using a
> GIS.

GIS are designed to do this kind of thing, especially database GIS such as
PostGIS. R would find the comparison hard: checking every object against
every other means on the order of 2M by 2M comparisons. Of course, it could
be done, but GIS are the appropriate tools. The PostGIS ~= operator returns
TRUE if geometry A is the same as geometry B, so it looks very much like
what you need, provided the coordinates are numerically identical.
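To make the "numerically identical" condition concrete, here is a minimal
sketch in Python (not R, and not how PostGIS implements ~=). It assumes each
polyline is held as a list of (x, y) tuples; the function name and the sample
data are hypothetical. Each line is looked up in a set keyed on its exact
coordinate sequence, so no line is compared against every other line.

```python
def drop_exact_duplicates(lines):
    """Keep the first occurrence of each exact coordinate sequence."""
    seen = set()
    kept = []
    for line in lines:
        key = tuple(line)  # tuple of (x, y) tuples, so it is hashable
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


# Hypothetical sample data: four polylines, one exact duplicate.
lines = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (1.0, 1.0)],        # exact duplicate, dropped
    [(1.0, 1.0), (0.0, 0.0)],        # same line reversed, kept
    [(0.0, 0.0), (1.0, 1.0000001)],  # nearly identical, kept
]
unique_lines = drop_exact_duplicates(lines)
```

Only the exact duplicate is removed. A line digitised in the opposite
direction, or one that differs in the last decimal places, counts as a
different object, which is the caveat about numerically identical coordinates.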

If your data include both polylines and attributes, and the attributes are 
also duplicated, then maybe you could do something in R by only retaining 
"rows" with unique() values of the identifying attribute, but you'll need 
a good deal of memory to read in the initial object.
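
The attribute-based idea can be sketched the same way. This is again Python
rather than R, and it assumes each feature is a record with an identifying
field; the field name "id" and the sample records are hypothetical. It keeps
the first row for each identifier, which is roughly what subsetting with
!duplicated() on the identifying column would do in R.

```python
def keep_first_per_id(records, id_field="id"):
    """Keep only the first record seen for each identifier value."""
    seen = set()
    kept = []
    for rec in records:
        if rec[id_field] not in seen:
            seen.add(rec[id_field])
            kept.append(rec)
    return kept


# Hypothetical sample data: the third record repeats the identifier "A1".
records = [
    {"id": "A1", "coords": [(0, 0), (1, 1)]},
    {"id": "B7", "coords": [(2, 2), (3, 3)]},
    {"id": "A1", "coords": [(0, 0), (1, 1)]},
]
deduped = keep_first_per_id(records)
```

This only helps if duplicated geometries really do share the identifying
attribute, and the whole data set still has to fit in memory first.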

Hope this helps,

Roger

>
> Does anyone know how I could handle this?
>
> Any help is welcome! Thanks in advance.
>
>
> Jean-Paul

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


