[R-sig-Geo] inconsistent as.data.frame(SpatialPointsDF)

MacQueen, Don macqueen1 at llnl.gov
Fri Mar 20 17:01:07 CET 2015


In my experience, relying on column names to extract the coordinates is
not at all a good idea. I would strongly recommend that you take the time
to update all of your scripts to use the coordinates() function. I think
it will be worth it in the long run.

It's not a good idea because the column names of the coordinates depend on
how the SpatialPointsDataFrame was originally created, and in my own
applications that is highly variable.  Sometimes ('x','y'), sometimes
('lon','lat'), or any of several other variations of how to spell or
abbreviate latitude and longitude (with or without capitalization). Or
('easting','northing'). Or, or, or... Trying to carefully control all that
is more trouble than it's worth; I just use, for example,
coordinates(obj)[,1] and coordinates(obj)[,2] if I want to pull them out
as vectors. Ugly, but I can count on it.
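
For what it's worth, here is a minimal sketch of what I mean by pulling coordinates out by position rather than by name. The `stations` table and the `xy_as_df()` helper are made up for illustration (they are not part of sp), and I have left out the CRS to keep it short; only `coordinates()` and `coordinates<-` are actual sp functions here.

```r
library(sp)

# Hypothetical stations table; the column names are illustrative only.
stations <- data.frame(
  Longitude = c(-111.40643, -111.38272, -109.50344),
  Latitude  = c(34.4566, 34.45547, 33.97883),
  Site_ID   = c(308L, 1140L, 310L)
)

# Promote the data frame to a SpatialPointsDataFrame.
locs <- stations
coordinates(locs) <- c("Longitude", "Latitude")

# Extract by position, so the result does not depend on how the
# coordinate columns happened to be named when the object was built.
xy <- coordinates(locs)   # numeric matrix, one row per point
x  <- xy[, 1]
y  <- xy[, 2]

# Hypothetical helper: flatten to a data frame whose coordinate columns
# are always called "x" and "y", followed by the attribute columns.
xy_as_df <- function(obj) {
  xy <- coordinates(obj)
  cbind(data.frame(x = unname(xy[, 1]), y = unname(xy[, 2])), obj@data)
}

flat <- xy_as_df(locs)
```

A helper along these lines could stand in for the as.data.frame() calls elsewhere in a script, assuming the downstream code only needs the coordinates under fixed names.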

That said, if
  as.data.frame(locs)
produces different names for the coordinates when used in different
contexts, then you've got something else going on that should not be going
on. This is where Frede's suggestions might help. You will need to
carefully track the construction of your locs object and see if it is
somehow different in the two situations. I don't know of any "designed
circumstance" that would explain this.
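
One possible way to do that tracking, as a rough sketch only: record a few facts about the object and the session in each context and compare them. The `describe_locs()` helper below is hypothetical (not part of sp), and the choice of what to record is my assumption about what might differ; it is not a diagnosis of the cause.

```r
library(sp)

# Hypothetical helper (not part of sp): collect facts that could plausibly
# differ between the interactive session and the script.
describe_locs <- function(obj) {
  list(
    class      = as.character(class(obj)),
    coordnames = coordnames(obj),                 # names stored with the coordinates
    df_names   = names(as.data.frame(obj)),       # names after conversion
    sp_version = as.character(packageVersion("sp")),
    attached   = search()                         # packages attached in this session
  )
}

# Small stand-in object; substitute your own locs.
pts <- data.frame(
  Longitude = c(-111.40643, -109.50344),
  Latitude  = c(34.4566, 33.97883),
  Site_ID   = c(308L, 310L)
)
coordinates(pts) <- c("Longitude", "Latitude")

info <- describe_locs(pts)
str(info)
```

Running this once at the prompt and once inside the script, saving each result with saveRDS(), and comparing the two with all.equal() should show whether the object itself or the session around it is what differs.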

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 3/19/15, 4:02 PM, "dschneiderch" <Dominik.Schneider at colorado.edu> wrote:

>I have a spatial points dF that is causing me trouble. I've figured out
>what is happening but without a clue why.
>
>at the prompt, I do
>> locs
>class       : SpatialPointsDataFrame
>features    : 10
>extent      : -112.0623, -109.0571, 33.65387, 36.32678  (xmin, xmax, ymin, ymax)
>coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
>variables   : 7
>names       : network, State, Station_ID, Site_ID,        Site_Name, Elevation_ft, Elevation_m
>min values  :    SNTL,    AZ,     09N05S,     308,      BAKER BUTTE,         7100,        2164
>max values  :    SNTL,    AZ,     12P01S,    1143, HANNAGAN MEADOWS,         9200,        2804
>
>> head(as.data.frame(locs))
>          x        y network State Station_ID Site_ID       Site_Name
>1 -111.4064 34.45660    SNTL    AZ     11R06S     308     BAKER BUTTE
>2 -111.3827 34.45547    SNTL    AZ     11R07S    1140 BAKER BUTTE SMT
>3 -109.5034 33.97883    SNTL    AZ     09S01S     310           BALDY
>4 -109.2166 33.69144    SNTL    AZ     09S06S     902     BEAVER HEAD
>5 -109.0571 36.32678    SNTL    AZ     09N05S    1143   BEAVER SPRING
>6 -112.0623 35.26247    SNTL    AZ     12P01S    1139       CHALENDER
>  Elevation_ft Elevation_m
>1         7300        2225
>2         7700        2347
>3         9125        2781
>4         7990        2435
>5         9200        2804
>6         7100        2164
>
>so as expected(?) my coordinate names get converted from Longitude,
>Latitude to x, y.
>
>However, when I run my script, the output of head(as.data.frame(locs)) is:
>    Longitude   Latitude network State Station_ID Site_ID       Site_Name
>1 -111.4064 34.45660    SNTL    AZ     11R06S     308     BAKER BUTTE
>2 -111.3827 34.45547    SNTL    AZ     11R07S    1140 BAKER BUTTE SMT
>3 -109.5034 33.97883    SNTL    AZ     09S01S     310           BALDY
>4 -109.2166 33.69144    SNTL    AZ     09S06S     902     BEAVER HEAD
>5 -109.0571 36.32678    SNTL    AZ     09N05S    1143   BEAVER SPRING
>6 -112.0623 35.26247    SNTL    AZ     12P01S    1139       CHALENDER
>  Elevation_ft Elevation_m
>1         7300        2225
>2         7700        2347
>3         9125        2781
>4         7990        2435
>5         9200        2804
>6         7100        2164
>
>
>I found out the hard way because I was using
>as.data.frame(locs)[,c('x','y')] to get the coordinates. I switched this
>line to coordinates(locs), but I have other lines in the same code that use
>as.data.frame(), so I'm wondering if there is a designed circumstance for
>one behavior compared to the other. I did notice that
>data.frame(locs)[,c('x','y')] seems to always maintain the original
>coordinate names, but I confirmed that the script uses as.data.frame().
>
>Below is the dput of a sample of my data. Does anyone else get this behavior?
>
>> dput(locs)
>new("SpatialPointsDataFrame"
>    , data = structure(list(network = c("SNTL", "SNTL", "SNTL", "SNTL",
>"SNTL",
>"SNTL", "SNTL", "SNTL", "SNTL", "SNTL"), State = c("AZ", "AZ",
>"AZ", "AZ", "AZ", "AZ", "AZ", "AZ", "AZ", "AZ"), Station_ID = c("11R06S",
>"11R07S", "09S01S", "09S06S", "09N05S", "12P01S", "09S07S", "11P02S",
>"11P13S", "09S11S"), Site_ID = c(308L, 1140L, 310L, 902L, 1143L,
>1139L, 416L, 1121L, 488L, 511L), Site_Name = c("BAKER BUTTE",
>"BAKER BUTTE SMT", "BALDY", "BEAVER HEAD", "BEAVER SPRING", "CHALENDER",
>"CORONADO TRAIL", "FORT VALLEY", "FRY", "HANNAGAN MEADOWS"),
>    Elevation_ft = c(7300L, 7700L, 9125L, 7990L, 9200L, 7100L,
>    8400L, 7350L, 7200L, 9020L), Elevation_m = c(2225L, 2347L,
>    2781L, 2435L, 2804L, 2164L, 2560L, 2240L, 2195L, 2749L)), .Names =
>c("network",
>"State", "Station_ID", "Site_ID", "Site_Name", "Elevation_ft",
>"Elevation_m"), row.names = 63:72, class = "data.frame")
>    , coords.nrs = c(7L, 6L)
>    , coords = structure(c(-111.40643, -111.38272, -109.50344, -109.21657,
>-109.05711,
>-112.06231, -109.15282, -111.74486, -111.84374, -109.30952, 34.4566,
>34.45547, 33.97883, 33.69144, 36.32678, 35.26247, 33.80392, 35.26806,
>35.07297, 33.65387), .Dim = c(10L, 2L), .Dimnames = list(NULL,
>    c("Longitude", "Latitude")))
>    , bbox = structure(c(-112.06231, 33.65387, -109.05711, 36.32678),
>.Dim =
>c(2L,
>2L), .Dimnames = list(c("Longitude", "Latitude"), c("min", "max"
>)))
>    , proj4string = new("CRS"
>    , projargs = "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"
>)
>)
>
>
>This is running on a cluster at my university, using a SOCK cluster to
>parallelize dlply and then an MC backend for a ddply inside the dlply, if
>that's important. It seems to produce the expected behavior on my desktop.
>> sessionInfo()
>R version 3.1.2 (2014-10-31)
>Platform: x86_64-unknown-linux-gnu (64-bit)
>
>locale:
>[1] C
>
>attached base packages:
>[1] grid      parallel  stats     graphics  grDevices utils     datasets
>[8] methods   base
>
>other attached packages:
> [1] ncdf4_1.13          smwrBase_1.0.1      lubridate_1.3.3
> [4] digest_0.6.8        memoise_0.2.1       gridExtra_0.9.1
> [7] spdep_0.5-82        Matrix_1.1-4        fields_7.1
>[10] maps_2.3-9          spam_1.0-1          doSNOW_1.0.12
>[13] snow_0.3-13         doMC_1.3.3          iterators_1.0.7
>[16] foreach_1.4.2       ipred_0.9-3         MASS_7.3-37
>[19] RColorBrewer_1.1-2  rgdal_0.9-1         stringr_0.6.2
>[22] ggplot2_1.0.0       plyr_1.8.1          reshape2_1.4.1
>[25] raster_2.3-12       sp_1.0-17           ProjectTemplate_0.6
>
>loaded via a namespace (and not attached):
> [1] LearnBayes_2.15  Rcpp_0.11.3      boot_1.3-13      class_7.3-11
> [5] coda_0.16-1      codetools_0.2-9  colorspace_1.2-4 deldir_0.1-7
> [9] gtable_0.1.2     lattice_0.20-29  lava_1.3         munsell_0.4.2
>[13] nlme_3.1-118     nnet_7.3-8       prodlim_1.5.1    proto_0.3-10
>[17] rpart_4.1-8      scales_0.2.4     splines_3.1.2    survival_2.37-7
>
>
>
>--
>View this message in context:
>http://r-sig-geo.2731867.n2.nabble.com/inconsistent-as-data-frame-SpatialP
>ointsDF-tp7587920.html
>Sent from the R-sig-geo mailing list archive at Nabble.com.


