[R-sig-Geo] analysis on .dbf file instead of .shp
Michael Sumner
mdsumner at gmail.com
Mon Jun 11 23:45:34 CEST 2012
Hello,
On Mon, Jun 11, 2012 at 8:46 PM, aniruddha ghosh <aniru123 at gmail.com> wrote:
> Hello list,
> I am trying to perform a regression analysis on a vector data (shape
> file). Some of the attributes of the shape files are the potential
> explanatory variables (let's say X1 and X2) and response variable (Y).
> Now instead of reading the shapefile, I'm using the associated .dbf
> file and performing the analysis.
> This looks like,
> ----------------------------------------
>>data<-read.dbf("test.dbf")
>>names(data)
> "FID" "X1" "X2" "Y" "POINT_X" "POINT_Y"
>>X<-cbind(data$X1,data$X2)
>>Y<-data$Y
>>summary(lm(Y~X))
> ----------------------------------------
> Question: Is it a good practice to use the .dbf file instead of the .shp file?
>
It should not matter, and you can obtain the same data (via the same
foreign::read.dbf function) by using the maptools functions
readShapePoints/Lines/Poly. You can always get the original data with
as.data.frame:
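# Paths to an example shapefile shipped with maptools
fname.shp <- system.file("shapes/baltim.shp", package="maptools")[1]
fname.dbf <- system.file("shapes/baltim.dbf", package="maptools")[1]
# Read the attribute table directly from the .dbf
library(foreign)
dd <- read.dbf(fname.dbf)
names(dd)
# Read the points together with their attributes
library(maptools)
xx <- readShapePoints(fname.shp)
names(as.data.frame(xx))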
[1] "STATION" "PRICE" "NROOM" "DWELL" "NBATH" "PATIO" "FIREPL" "AC" "BMENT" "NSTOR" "GAR" "AGE" "CITCOU" "LOTSZ" "SQFT"
[16] "X" "Y" "coords.x1" "coords.x2"
Note that for the SpatialPointsDataFrame you also get the spatial
coordinates as extra columns (in this case it is a simple one-to-one
of point coordinates to attributes, which won't always be true for
MULTIPOINT or line/polygon geometries).
Apart from the spatial coordinate values, all.equal reports some
differences in the attributes of the two R objects, but the dimensions,
names and column classes of the two data.frames are the same:
all.equal(dd, as.data.frame(xx)[,-c(18, 19)])
[1] "Attributes: < Names: 1 string mismatch >"
[2] "Attributes: < Length mismatch: comparison on first 2 components >"
[3] "Attributes: < Component 2: Lengths (17, 211) differ (string compare on first 17) >"
[4] "Attributes: < Component 2: 17 string mismatches >"
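If only the values matter, one option is to ask all.equal to ignore
attributes, or to compare dimensions, names and column classes directly.
A minimal sketch in base R follows; the two small data frames `a` and `b`
and the extra "source" attribute are made up for illustration and only
stand in for `dd` and the converted `xx`:

```r
# Two hypothetical data frames with identical columns but different
# row names and one extra attribute (stand-ins for dd and as.data.frame(xx)).
a <- data.frame(PRICE = c(47, 113, 165), NROOM = c(4, 7, 7))
b <- a
rownames(b) <- c("r0", "r1", "r2")
attr(b, "source") <- "made-up attribute for illustration"

# Compare the properties that matter for modelling.
same_dim   <- identical(dim(a), dim(b))
same_names <- identical(names(a), names(b))
same_class <- identical(sapply(a, class), sapply(b, class))

# Compare the values while ignoring attribute differences.
same_values <- isTRUE(all.equal(a, b, check.attributes = FALSE))

print(c(same_dim, same_names, same_class, same_values))
```

With the default check.attributes = TRUE the same comparison returns a
character vector of differences rather than TRUE, which is the situation
in the output above.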
There is another route to read shapefile/dbf with readOGR() in the
rgdal package. Reading the DBF that way may give slightly different
results, since it is a completely different set of code under the
hood, though any differences would be subtle and may just depend on the
vagaries of the file. The return value is a Spatial*DataFrame, as it
is for the maptools functions.
Cheers, Mike.
> Can I use the model developed here to predict some unknown Y with
> known X (obtained from another .dbf file), and combine the predicted Y
> as attribute to this .dbf file?
>
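For the prediction question quoted just above, a rough sketch of one way
it could go with the foreign package only. Everything here is made up for
illustration: the simulated tables stand in for test.dbf and for the
second .dbf, and the temporary file names and the Y_PRED column name are
assumptions. Fitting with a formula and data= (rather than a cbind
matrix) lets predict() match the new columns by name:

```r
library(foreign)  # provides read.dbf() and write.dbf()

# Made-up training table standing in for test.dbf (same column names).
set.seed(1)
train <- data.frame(FID = 0:19, X1 = runif(20), X2 = runif(20))
train$Y <- 2 + 3 * train$X1 - train$X2 + rnorm(20, sd = 0.1)

# Made-up table with known X1 and X2 but no Y, standing in for the
# second .dbf file; written to a temporary file so it can be read back.
new_file <- tempfile(fileext = ".dbf")
write.dbf(data.frame(FID = 0:4, X1 = runif(5), X2 = runif(5)), new_file)

# Fit with a formula so predict() can find X1 and X2 in newdata.
fit <- lm(Y ~ X1 + X2, data = train)

# Read the new attribute table, predict, and store the result as a column.
new <- read.dbf(new_file)
new$Y_PRED <- predict(fit, newdata = new)

# Write to a separate file. If this replaces the .dbf of a shapefile,
# the number and order of rows must stay unchanged so that each record
# still lines up with its geometry in the .shp.
out_file <- tempfile(fileext = ".dbf")
write.dbf(new, out_file)
print(names(read.dbf(out_file)))
```

Keeping a backup of the original .dbf before overwriting it is a sensible
precaution, since the .shp and .dbf are only linked by record order.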
> I'm using the .dbf file because it is allowing me to apply different
> methods from different packages for prediction which I couldn't apply
> to the .shp files due to my limited knowledge in using R!
>
>
> Thanks,
> Aniruddha Ghosh
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
--
Michael Sumner
Hobart, Australia
e-mail: mdsumner at gmail.com