[R-sig-Geo] read shapefile into R larger than 2GB

Rainer M Krug r.m.krug at gmail.com
Wed Apr 4 14:54:31 CEST 2012


On 04/04/12 14:49, Carlos Valenzuela wrote:
> Thank you all for your replies.
> 
> Rainer - In regards to the size of the .shp, it is due to the .dbf file. There are over 500,000
> cases (for the entire city). Now the kicker is that we also want to include more data in the
> future. But it seems that this would be even more difficult, as all of you imply.

In this case you should try to import only the geometry, without the attribute table, and then link
the attribute table in (from an SQLite database) and read only the required columns when you do the
analysis.
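
The attribute side of this could look roughly like the sketch below. It assumes the DBI and
RSQLite packages are installed; the table name, the column names and the toy data are all
hypothetical stand-ins, and the geometry import itself is not shown.

```r
library(DBI)

# Toy attribute table standing in for the contents of the large .dbf
# (table and column names are hypothetical).
attrs <- data.frame(
  fid    = 0:4,
  income = c(31, 45, 28, 60, 52),
  pop    = c(120L, 340L, 95L, 410L, 275L),
  notes  = c("a", "b", "c", "d", "e")
)

# An in-memory database keeps the sketch self-contained; a real workflow
# would point dbConnect() at a database file on disk instead.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "attributes", attrs)

# Stand-in for the feature IDs of geometries that were imported on their own.
geom_ids <- data.frame(fid = c(1L, 3L, 4L))

# Pull only the column the analysis needs, and only for the features at hand.
query <- sprintf(
  "SELECT fid, income FROM attributes WHERE fid IN (%s) ORDER BY fid",
  paste(geom_ids$fid, collapse = ", ")
)
needed <- dbGetQuery(con, query)
dbDisconnect(con)

# Link the attributes back to the geometries by feature ID.
linked <- merge(geom_ids, needed, by = "fid")
print(linked)
```

The point of the design is that the wide table never has to sit in R's memory as a whole; only
the selected columns for the selected features do.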

> 
> Roger - I am actually using a 64-bit Windows 7 machine with 12 GB RAM, where I attempted to use
> both readShapePoly() and readOGR(). I also tried this on a 64-bit Linux server with 32 GB RAM, but
> only with readShapePoly(). As I explained to Rainer, there are 500,000 cases and we do want to
> make some inferences from the results that would incorporate even more data... But we will see if
> that is possible.
> 
> I am going to reduce the number of attributes in the table, but I was hoping that the number of
> observations is not the issue because it seems like R was only importing half of the
> observations.

Very likely a memory issue - I don't expect a hard-coded limit(?). So dynamically linking the
attribute table should solve the problem.
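
A quick back-of-the-envelope check can help tell the two explanations apart. The sketch below
uses only figures from this thread (the first feature ID reported as deleted, and the 2 GB size)
and assumes, without confirmation, that the failure is tied to a signed 32-bit file offset; the
variable names are mine.

```r
# Figures taken from the thread: the first feature ID reported as deleted,
# and the 2 GB size mentioned in the subject line.
first_failed_id <- 284284
two_gib <- 2^31  # bytes; the range of a signed 32-bit file offset

# If reading stopped exactly where a signed 32-bit offset overflows, each
# attribute record would have to be about this many bytes wide.
implied_record_bytes <- two_gib / first_failed_id
print(round(implied_record_bytes))
```

If the printed value is close to the actual record length of the .dbf, an offset limit is the
more likely culprit; if it is far off, memory remains the better guess.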

Cheers,

Rainer

> 
> Best,
> 
> Carlos
> 
> 
> On 4/4/12 6:48 AM, "Roger Bivand" <Roger.Bivand at nhh.no> wrote:
> 
>> On Wed, 4 Apr 2012, Rainer M Krug wrote:
>> 
> On 03/04/12 20:44, Carlos Valenzuela wrote:
>>>>> Hello all, I was hoping someone may be able to help me with this problem. I am trying
>>>>> to read a shapefile into R that is larger than 2GB. I've tried
> 
> In my opinion, a 2GB shape file is insane....
>>> 
>>> Yes, it doesn't seem well-considered. Is this LiDAR data? Are you intending to do
>>> statistics on the complete data set? What would you infer from the results?
>>> 
>>> You have not included the output of sessionInfo() - I suspect that you are using a 32-bit
>>> system, which would fail in any case.
>>> 
>>> Roger
>>> 
> 
> Is the attribute table (.dbf) file as big, or is it the .shp? If it is the .dbf, you should check
> whether you need all the attributes. If it is the .shp, you could possibly split the shapefile
> into more than one actual layer? Also, importing it into a SpatiaLite or even PostGIS database
> might help you - then you can more easily import a subset of features.
> 
> Cheers,
> 
> Rainer
> 
> 
>>>>> using readShapePoly() in maptools as well as readOGR() in rgdal, with no luck.
>>>>> 
>>>>> Using readShapePoly(), I get "failed on DBF filefseek" on a series of lines
>>>>> (over 284,000).
>>>>> 
>>>>> When using rgdal, I get this:
>>>>> 
>>>>> Warning message:
>>>>> In readOGR(".", "nameoffiles") :
>>>>>   Deleted feature IDs: 284284,
>>>>> 284285, 284286, 284287, 284288, 284289, 284290, 284291, 284292, 284293, 284294, 284295,
>>>>> 284296, 284297, 284298, 284299, 284300, 284301, 284302, 284303, 284304, 284305, 284306,
>>>>> 284307, 284308, 284309, 284310, 284311, 284312, 284313, 284314, 284315, 284316,
>>>>> 284317, 284318, 284319, 284320, 284321, 284322, 284323, 284324, 284325, 284326, 284327,
>>>>> 284328, 284329, 284330, 284331, 284332, 284333, 284334, 284335, 284336, 284337, 284338,
>>>>> 284339, 284340, 284341, 284342, 284343, 284344, 284345, 284346, 284347, 284348, 284349,
>>>>> 284350, 284351, 284352, 284353, 284354, 284355, 284356, 284357, 284358, 284359, 284360,
>>>>> 284361, 284362, 284363, 284364, 284365, 284366, 284367, 284368, 284369, 284370, 284371,
>>>>> 284372, 284373, 284374, 284375, 284376, 284377, 284378, 284379, 284380, 284381, 284382,
>>>>> 284383, 284384, 284385, 284386, 284387, 284388, 284389, 284390, 284391, 284392,
>>>>> 284393, 284394, 284395, 284396, 284397, 284398, 284399, 284400, 284401, 284402, 284403,
>>>>> 284404, 284405, 284 [... truncated]
>>>>> 
>>>>> In other words, I only get half of the data imported into R and need to get all of it
>>>>> in.
>>>>> 
>>>>> Thank you,
>>>>> 
>>>>> Carlos
>>>>> 
>>>>> 
>>>>> _______________________________________________ R-sig-Geo mailing list 
>>>>> R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> 
> 
>>> 
>> 
>> -- Roger Bivand Department of Economics, NHH Norwegian School of Economics, Helleveien 30,
>> N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail:
>> Roger.Bivand at nhh.no
>> 
> 
> 

-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys.
(Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :       +33 - (0)9 53 10 27 44
Cell:       +33 - (0)6 85 62 59 98
Fax :       +33 - (0)9 58 10 27 44

Fax (D):    +49 - (0)3 21 21 25 22 44

email:      Rainer at krugs.de

Skype:      RMkrug


