[R-sig-Geo] regression and cluster analysis with shapefiles: data import problems

Roger Bivand Roger.Bivand at nhh.no
Wed Apr 26 00:05:41 CEST 2006


On Tue, 25 Apr 2006, Danlin Yu wrote:

> Anja:
> 
> If you have a shapefile imported using read.shape, you can view the
> header (the attribute names) of your shapefile object using
> "colnames(shapeobject$att.data)"
> 
> Similarly, you can assign individual attributes as R objects using
> command like:
> 
> attribute1 <- shapeobject$att.data$attribute1
> 
> For polygons, you can get the X, Y, coordinates by using get.Pcent
> (shapeobject) in the package maptools (which will be loaded when you
> load the package spdep). I am quite sure you can do similar work with
> point file as well (Roger mentioned that to me once, but since I have
> access to ArcGIS, and as a habit I always creat the X, Y fields for my
> data, I forgot the procedure).

It always helps to paste copies of the commands you are giving, usually 
together with sessionInfo() to give the package versions.

If you are using the maptools package, you can use read.shape() to return 
an object of class "Map", as the help page for the function explains.

> library(maptools)
Loading required package: foreign
Loading required package: sp
> sessionInfo()
Version 2.3.0 (2006-04-24) 
i686-pc-linux-gnu 

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets" 
[7] "base"     

other attached packages:
maptools       sp  foreign 
"0.5-11" "0.8-15" "0.8-12" 
> list.files(pattern="shp")
[1] "baltim.shp"   "columbus.shp" "fylk-val.shp" "sids.shp"    

These are the shapefiles distributed with the maptools package, by the 
way.

> col1 <- read.shape("columbus.shp")
Shapefile type: Polygon, (5), # of Shapes: 49
> class(col1)
[1] "Map"
> names(col1$att.data)
 [1] "AREA"       "PERIMETER"  "COLUMBUS_"  "COLUMBUS_I" "POLYID"    
 [6] "NEIG"       "HOVAL"      "INC"        "CRIME"      "OPEN"      
[11] "PLUMB"      "DISCBD"     "X"          "Y"          "NSA"       
[16] "NSB"        "EW"         "CP"         "THOUS"      "NEIGNO"    
> col1data <- col1$att.data 
> lm(CRIME ~ HOVAL + INC, data=col1data)

Call:
lm(formula = CRIME ~ HOVAL + INC, data = col1data)

Coefficients:
(Intercept)        HOVAL          INC  
    68.6190      -0.2739      -1.5973  


If you use the more user-friendly sp classes with functions from the 
maptools package, you do:

> getinfo.shape("columbus.shp")
Shapefile type: Polygon, (5), # of Shapes: 49

to find out what kind it is, and use one of readShapePoints(), 
readShapeLines() or readShapePoly():

> col2 <- readShapePoly("columbus.shp")
> class(col2)
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"

for which lm() just works:

> lm(CRIME ~ HOVAL + INC, data=col2)

Call:
lm(formula = CRIME ~ HOVAL + INC, data = col2)

Coefficients:
(Intercept)        HOVAL          INC  
    68.6190      -0.2739      -1.5973  

and should you need the centroids of polygons (or point locations), the 
coordinate method returns them:

> coords <- coordinates(col2)
> str(coords)
 num [1:49, 1:2] 8.83 8.33 9.01 8.46 9.01 ...

where str() is a helper function showing the structure of an object. There 
are other ways, but this is the simplest, because sp class SpatialPolygons 
component Polygons objects store their centroids. By the way, the plot 
methods for these classes of objects are also pretty good, and new 
variables can just be added:

> mylm <- lm(CRIME ~ HOVAL + INC, data=col2)
> col2$mylm_res <- residuals(mylm)
> spplot(col2, "mylm_res")

(with lots of options for modification if needed, but now nobody can say 
that they couldn't map the residuals!)

In fact currently the easiest shapefile reader (readOGR()) is in the rgdal
package, because it also reads the *.prj file, and so creates the sp class
object with the appropriate metadata, and finds out automatically if the 
spatial entities are points, lines, or polygons. If your planners, Anja, 
need to keep the coordinate reference system tidy, this may be worth 
looking at, combined with the still slightly messy writing of shapefiles:

writePolyShape(), writeLinesShape(), or writePointsShape() in the maptools 
package and showWKT() in the rgdal package to write the appropriate *.prj.

If you are using read.shapefile() in the shapefiles package, you'll need 
to get at an internal element to find the data.frame:

> library(shapefiles)
> col3 <- read.shapefile("columbus")
> names(col3$dbf$dbf)
 [1] "AREA"       "PERIMETER"  "COLUMBUS_"  "COLUMBUS_I" "POLYID"    
 [6] "NEIG"       "HOVAL"      "INC"        "CRIME"      "OPEN"      
[11] "PLUMB"      "DISCBD"     "X"          "Y"          "NSA"       
[16] "NSB"        "EW"         "CP"         "THOUS"      "NEIGNO"    
> col3data <- col3$dbf$dbf
> lm(CRIME ~ HOVAL + INC, data=col3data)

Call:
lm(formula = CRIME ~ HOVAL + INC, data = col3data)

Coefficients:
(Intercept)        HOVAL          INC  
    68.6190      -0.2739      -1.5973  


will do it.

Roger

> 
> Hope this helps.
> 
> Cheers,
> 
> ___________________________________________
> Danlin Yu, Ph.D.
> Assistant Professor
> Department of Earth & Environmental Studies
> Montclair State University
> Montclair, NJ, 07043
> Tel: 973-655-4313
> Fax: 973-655-4072
> email: yud at mail.montclair.edu
> 
> ----- Original Message -----
> From: Anja Matatko <Anja.Matatko at alta4.com>
> Date: Tuesday, April 25, 2006 11:37 am
> Subject: [R-sig-Geo] regression and cluster analysis with shapefiles: data import problems
> 
> > 
> > Dear list,
> > 
> > I have a shapefile with several numerical attributes and want to 
> > do cluster and regression analysis with this data 
> > (sociodemographic data). I use the read.shapefile-command to 
> > import my data (it is well imported, it appears as an object), and 
> > then my first problem occurs: 
> > 
> > I want to do the analysis as in Anselin's "An Introduction to 
> > Spatial Regression Analysis in R" or the FAO publication by 
> > Petrucci / Salvati / Seghieri "The application of a spatial 
> > regression model to the analysis and mapping of poverty", but they 
> > all use other raw data than shapefiles and I couldn't find a hint 
> > how to convert my shapefile to a format that supports the lm() or 
> > similar functions. 
> > 
> > I think there are two problems to solve:
> > 
> > - I need to have access to the headers of my shapefile attribute 
> > data, in order to execute commands as mydata$variable1 (actually, 
> > I can't call mydata$variable1, I just get the answer NULL)
> > 
> > - I need something to get the spatial information contained in the 
> > shapefile (my attribute table has no x y coordinates included). Or 
> > is it really necessary to insert x and y columns in the attribute 
> > table with the help of ArcGIS? 
> > 
> > Thanks for help,
> > 
> > Anja (student of applied geography writing masters thesis about 
> > the use of geographical information systems and several "spatial 
> > statistics software" - including R - in planning support, in 
> > domains not much affiliated to spatial analysis but working with 
> > spatial data)
> > 
> > 
> > 
> > -------------------------------------------- 
> > Anja Matatko
> > alta4 Geoinformatik AG 
> > Frauenstraße 8-9 
> > 54290 Trier/Germany 
> > voice: +49.651.96626-0
> > fax: +49.651.96626-26 
> > email: anja.matatko at alta4.com 
> > internet: http://www.alta4.com 
> > 
> > Die nächsten GIS-Schulungen:
> > - ArcGIS 9 Umsteiger: 15. - 17.05.06 in München
> > - ArcGIS 9 Geoprocessing & Model Builder: 18. - 19.05.06 in München
> > - What's New in ArcGIS 9.1?: 22.-23.05.06 in Trier
> > - ArcGIS Digitalisieren: 02.06.06 in Trier
> > Weitere Infos: http://www.alta4.com/de/schulung/index.php
> > 
> > Treffen Sie uns auf folgenden Veranstaltungen:
> > - IT-Messe: 04.-05.05.06 in Trier
> > - ESRI Anwenderkonferenz: 09.-11.05.06 in Salzburg
> > Weitere Infos: http://www.alta4.com/de/alta4/events.php
> > 
> > _______________________________________________
> > R-sig-Geo mailing list
> > R-sig-Geo at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >
> 
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> 

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no




More information about the R-sig-Geo mailing list