[R-sig-Geo] analysis on .dbf file instead of .shp

Michael Sumner mdsumner at gmail.com
Mon Jun 11 23:45:34 CEST 2012


Hello,

On Mon, Jun 11, 2012 at 8:46 PM, aniruddha ghosh <aniru123 at gmail.com> wrote:
> Hello list,
> I am trying to perform a regression analysis on vector data (a shape
> file). Some of the attributes of the shapefile are the potential
> explanatory variables (let's say X1 and X2) and the response variable (Y).
> Now, instead of reading the shapefile, I'm using the associated .dbf
> file and performing the analysis on that.
> It looks like this:
> ----------------------------------------
>>data<-read.dbf("test.dbf")
>>names(data)
>  "FID"  "X1"    "X2"    "Y"     "POINT_X"       "POINT_Y"
>>X<-cbind(data$X1,data$X2)
>>Y<-data$Y
>>summary(lm(Y~X))
> ----------------------------------------
> Question: Is it good practice to use the .dbf file instead of the .shp file?
>

It should not matter: you can obtain the same data (read via the same
foreign::read.dbf function) by using the maptools functions
readShapePoints/readShapeLines/readShapePoly, and you can always get the
attribute table back as a plain data.frame with as.data.frame:

fname.shp <- system.file("shapes/baltim.shp", package="maptools")[1]
fname.dbf <- system.file("shapes/baltim.dbf", package="maptools")[1]

library(foreign)
dd <- read.dbf(fname.dbf)
names(dd)

library(maptools)
xx <- readShapePoints(fname.shp)
names(as.data.frame(xx))
 [1] "STATION"   "PRICE"     "NROOM"     "DWELL"     "NBATH"     "PATIO"     "FIREPL"    "AC"        "BMENT"     "NSTOR"     "GAR"       "AGE"       "CITCOU"    "LOTSZ"     "SQFT"
[16] "X"         "Y"         "coords.x1" "coords.x2"

Note that with the SpatialPointsDataFrame you also get the spatial
coordinates as extra columns (in this case there is a simple one-to-one
mapping of point coordinates to attribute rows, which won't always be
true for MULTIPOINT or line/polygon geometries).

Apart from the spatial coordinate columns, there are some differences in
attributes, but the dimensions, names and column classes of the two
data.frames are the same:

all.equal(dd, as.data.frame(xx)[,-c(18, 19)])
[1] "Attributes: < Names: 1 string mismatch >"
[2] "Attributes: < Length mismatch: comparison on first 2 components >"
[3] "Attributes: < Component 2: Lengths (17, 211) differ (string compare on first 17) >"
[4] "Attributes: < Component 2: 17 string mismatches >"
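The mismatches reported by all.equal() concern attributes of the two
data.frames rather than the column values; one likely contributor is the
"data_types" attribute (the DBF field types) that foreign::read.dbf()
attaches to its result. A minimal self-contained sketch, using a small
hypothetical table round-tripped through a temporary .dbf instead of the
baltim data, shows a values-only comparison with check.attributes = FALSE:

```r
library(foreign)

# Hypothetical attribute table standing in for a shapefile's .dbf
df <- data.frame(X1 = c(1.5, 2.5, 3.5),
                 X2 = c(0.25, 0.75, 1.25),
                 Y  = c(4.1, 5.2, 6.3))
f <- tempfile(fileext = ".dbf")
write.dbf(df, f)
back <- read.dbf(f)

# read.dbf() attaches the DBF field types as an extra attribute
print(attr(back, "data_types"))

# A full comparison also checks attributes, so it reports a mismatch ...
print(all.equal(df, back))
# ... while check.attributes = FALSE compares only the column values
print(all.equal(df, back, check.attributes = FALSE))
```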


Another route to reading a shapefile or its DBF is readOGR() in the
rgdal package. Because it uses a completely different set of code under
the hood, there may be slight differences in how the DBF is read, though
they would be subtle if present at all and may just depend on the
vagaries of the particular file. Like the maptools functions, it returns
a Spatial*DataFrame.

Cheers, Mike.

> Can I use the model developed here to predict some unknown Y from
> known X (obtained from another .dbf file), and add the predicted Y
> as an attribute of that .dbf file?
>
> I'm using the .dbf file because it lets me apply different
> methods from different packages for prediction, which I couldn't apply
> to the .shp files due to my limited knowledge of R!
>
>
> Thanks,
> Aniruddha Ghosh
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
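On the second question (predicting Y for a new .dbf and storing it as a
new attribute), a minimal sketch, assuming both tables share the column
names X1 and X2: fitting with a formula on named columns and data=
(rather than lm(Y~X) on a cbind() matrix) lets predict() look up X1 and
X2 in the new table. The two data.frames and the column name Y_PRED are
hypothetical stand-ins for read.dbf() results, and the output goes to a
temporary file rather than overwriting anything. If the .dbf belongs to
a shapefile, keep its rows in the original order and do not drop any,
because the .shp geometries are matched to .dbf records by position.

```r
library(foreign)

# Hypothetical training table (in practice something like read.dbf("test.dbf"))
train <- data.frame(X1 = c(1.5, 2.5, 3.5, 4.5, 5.5),
                    X2 = c(0.5, 1.5, 1.0, 2.5, 2.0),
                    Y  = c(3.1, 4.9, 6.2, 8.8, 9.6))
# Hypothetical table of new records with known X1, X2 but unknown Y
newx <- data.frame(FID = 0:2,
                   X1 = c(2.0, 3.0, 6.0),
                   X2 = c(1.0, 1.5, 2.5))

# Named predictors plus data= so predict() can find X1 and X2 in newdata
fit <- lm(Y ~ X1 + X2, data = train)
print(summary(fit))

# Add the predictions as a new attribute column (row order unchanged)
newx$Y_PRED <- predict(fit, newdata = newx)

# Write to a temporary .dbf and read it back to check the new column
out <- tempfile(fileext = ".dbf")
write.dbf(newx, out)
print(read.dbf(out))
```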



-- 
Michael Sumner
Hobart, Australia
e-mail: mdsumner at gmail.com


