[R-sig-Geo] read shapefile into R larger than 2GB

Roger Bivand Roger.Bivand at nhh.no
Wed Apr 4 15:03:16 CEST 2012


On Wed, 4 Apr 2012, Carlos Valenzuela wrote:

> Thank you all for your replies.
>
> Rainer - In regards to the size of the .shp, it is due to the .dbf file.
> There are over 500,000 cases (for the entire city). Now the kicker is that
> we also want to include more data in the future. But it seems that
> would be even more difficult, as all of you imply.
>
> Roger - I am actually using a 64-bit Windows 7 machine with 12 GB RAM,
> where I attempted to use both spdep() and readOGR(). I also tried this on
> a 64-bit Linux server with 32 GB RAM, but only trying to use spdep(). As I
> explained to Rainer, there are 500,000 cases and we do want to make some
> inferences on the results that would incorporate even more data... But we
> will see if that is possible.

But what is the support of the observations? Is it real estate at the
building scale? Are they polygons or points? Does DBF support tables this
long? I think your workflow needs revisiting. Please make your needs
clearer: what is the input, and what inferences do you want to draw? Is
the spatial process of interest? How heterogeneous are the data? With data
of this size, standard analysis techniques will probably need to be
modified, as simply running lm() on 500K observations and some hundreds of
variables will be very demanding even on a machine with plenty of memory.
If you are thinking of spatial regression, you need to take account of
the possibly many copies of the data held in memory with either ML or GM
fitting.
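
To put rough numbers on the memory demand, here is a back-of-the-envelope
sketch in base R. The 500,000 rows come from this thread; the 300 columns
(standing in for "some hundreds of variables") and the count of three
simultaneous copies are assumptions for illustration only, not measurements.

```r
# Rough memory footprint of a dense double-precision model matrix.
# n_obs is from the thread; n_vars is an assumed stand-in for
# "some hundreds of variables".
n_obs  <- 500000
n_vars <- 300
bytes_per_double <- 8

matrix_bytes <- n_obs * n_vars * bytes_per_double
matrix_gib   <- matrix_bytes / 2^30

# A fitting function may hold the model frame, the model matrix and a
# decomposition at the same time; 3 copies is an assumption, not a
# measured figure.
n_copies <- 3L
cat(sprintf("one copy: %.2f GiB; %d copies: %.2f GiB\n",
            matrix_gib, n_copies, n_copies * matrix_gib))
```

Changing n_vars and n_copies to match the real table gives a quick check
of whether a fit is likely to stay within the available RAM.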

Roger


>
> I am going to reduce the number of attributes in the table, but I was hoping
> that the number of observations is not the issue because it seems like R
> was only importing half of the observations.
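
One way to check whether the attribute table itself crosses the 2 GiB
mark is to read the fixed fields of the .dbf header (record count, header
length, record length) and multiply them out. The sketch below uses only
base R; the helper names read_dbf_header(), dbf_expected_bytes() and
exceeds_2gib() are hypothetical, and "nameoffiles.dbf" is just the
placeholder file name used in this thread. Whether a 2 GiB offset limit is
really what stops the import at about half the records is an assumption
this thread does not confirm.

```r
# Read the fixed-position fields of a dBASE (.dbf) header:
# bytes 0-3: version + last-update date, bytes 4-7: record count,
# bytes 8-9: header length, bytes 10-11: record length (little-endian).
read_dbf_header <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  readBin(con, "raw", n = 4)  # skip version byte and date
  n_records  <- readBin(con, "integer", n = 1, size = 4, endian = "little")
  header_len <- readBin(con, "integer", n = 1, size = 2,
                        signed = FALSE, endian = "little")
  record_len <- readBin(con, "integer", n = 1, size = 2,
                        signed = FALSE, endian = "little")
  list(n_records = n_records, header_len = header_len,
       record_len = record_len)
}

# Expected file size in bytes: header + records + 1-byte end-of-file marker.
# as.numeric() avoids integer overflow for large tables.
dbf_expected_bytes <- function(h) {
  h$header_len + as.numeric(h$n_records) * h$record_len + 1
}

# TRUE if the table is larger than a signed 32-bit file offset can address.
exceeds_2gib <- function(h) dbf_expected_bytes(h) > 2^31 - 1

dbf_path <- "nameoffiles.dbf"  # placeholder name from the thread
if (file.exists(dbf_path)) {
  h <- read_dbf_header(dbf_path)
  cat("records:", h$n_records, " bytes per record:", h$record_len, "\n")
  cat("expected size exceeds 2 GiB:", exceeds_2gib(h), "\n")
}
```

If the record length turns out to be large, dropping unneeded attribute
columns reduces it directly, which is why reducing the number of
attributes may help even when the number of observations stays the same.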
>
> Best,
>
> Carlos
>
>
> On 4/4/12 6:48 AM, "Roger Bivand" <Roger.Bivand at nhh.no> wrote:
>
>> On Wed, 4 Apr 2012, Rainer M Krug wrote:
>>
>>>
>>> On 03/04/12 20:44, Carlos Valenzuela wrote:
>>>> Hello all, I was hoping someone may be able to help me with this
>>>> problem. I am trying to read a shapefile into R that is larger than
>>>> 2GB. I've tried
>>>
>>> In my opinion, a 2GB shape file is insane....
>>
>> Yes, it doesn't seem well-considered. Is this LiDAR data? Are you
>> intending to do statistics on the complete data set? What would you infer
>> from the results?
>>
>> You have not included the output of sessionInfo() - I suspect that you
>> are using a 32-bit system, which would fail in any case.
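
For reference, a minimal base R sketch of how to check whether the running
R build is 32-bit or 64-bit and to print the session details asked for
above. The variable names are illustrative only.

```r
# 8-byte pointers indicate a 64-bit build of R; 4-byte pointers a 32-bit one.
pointer_bytes <- .Machine$sizeof.pointer
is_64bit <- pointer_bytes == 8L

cat("R architecture:", R.version$arch, "\n")
cat("64-bit build:", is_64bit, "\n")

# Full session details (R version, platform, attached packages),
# which are useful to include when posting to the list.
print(sessionInfo())
```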
>>
>> Roger
>>
>>>
>>> Is the attribute table (.dbf) file as big, or is it the .shp? If it is
>>> the .dbf, you have to check whether you need all the attributes. If it is
>>> the .shp, you could possibly split the shapefile into more than one layer.
>>> Also, importing into a SpatiaLite or even PostGIS database might help you:
>>> then you can more easily import a subset of features.
>>>
>>> Cheers,
>>>
>>> Rainer
>>>
>>>
>>>> using readShapePoly() in spdep as well as the readOGR() in rgdal with
>>>> no luck.
>>>>
>>>> Using readShapePoly(), I get "failed on DBF filefseek" on a
>>>> series of lines (over 284,000).
>>>>
>>>> When using rgdal, I get this:
>>>>
>>>> Warning message:
>>>> In readOGR(".", "nameoffiles") :
>>>>   Deleted feature IDs: 284284, 284285,
>>>> 284286, 284287, 284288, 284289, 284290, 284291, 284292, 284293,
>>>> 284294, 284295, 284296, 284297,
>>>> 284298, 284299, 284300, 284301, 284302, 284303, 284304, 284305,
>>>> 284306, 284307, 284308, 284309,
>>>> 284310, 284311, 284312, 284313, 284314, 284315, 284316, 284317,
>>>> 284318, 284319, 284320, 284321,
>>>> 284322, 284323, 284324, 284325, 284326, 284327, 284328, 284329,
>>>> 284330, 284331, 284332, 284333,
>>>> 284334, 284335, 284336, 284337, 284338, 284339, 284340, 284341,
>>>> 284342, 284343, 284344, 284345,
>>>> 284346, 284347, 284348, 284349, 284350, 284351, 284352, 284353,
>>>> 284354, 284355, 284356, 284357,
>>>> 284358, 284359, 284360, 284361, 284362, 284363, 284364, 284365,
>>>> 284366, 284367, 284368, 284369,
>>>> 284370, 284371, 284372, 284373, 284374, 284375, 284376, 284377,
>>>> 284378, 284379, 284380, 284381,
>>>> 284382, 284383, 284384, 284385, 284386, 284387, 284388, 284389,
>>>> 284390, 284391, 284392, 284393,
>>>> 284394, 284395, 284396, 284397, 284398, 284399, 284400, 284401,
>>>> 284402, 284403, 284404, 284405,
>>>> 284 [... truncated],
>>>>
>>>> In other words, I only get half of the data imported into R and need
>>>> to get all of it in.
>>>>
>>>> Thank you,
>>>>
>>>> Carlos
>>>>
>>>> _______________________________________________ R-sig-Geo mailing list
>>>> R-sig-Geo at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>>
>>> - --
>>> Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
>>> Biology, UCT), Dipl. Phys.
>>> (Germany)
>>>
>>> Centre of Excellence for Invasion Biology
>>> Stellenbosch University
>>> South Africa
>>>
>>> Tel :       +33 - (0)9 53 10 27 44
>>> Cell:       +33 - (0)6 85 62 59 98
>>> Fax :       +33 - (0)9 58 10 27 44
>>>
>>> Fax (D):    +49 - (0)3 21 21 25 22 44
>>>
>>> email:      Rainer at krugs.de
>>>
>>> Skype:      RMkrug
>>>
>>>
>>
>> --
>> Roger Bivand
>> Department of Economics, NHH Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; fax +47 55 95 95 43
>> e-mail: Roger.Bivand at nhh.no
>>
>
>
>



