[R-sig-Geo] Missing local R-squared and residuals in gwr output

Roger Bivand Roger.Bivand at nhh.no
Mon May 7 20:45:25 CEST 2012


On Mon, 7 May 2012, "Sproß, Johann" wrote:

>
>
>
> --
> Mag. J. Maximilian Sproß
> Institute of Geography, University of Innsbruck
> Innrain 52
> A-6020 INNSBRUCK
>
> Tel. +43 (0)512 507 5413
> web: http://www.uibk.ac.at/geographie/projects/lidar/
>
>
>
> -----Original Message-----
> From: Roger Bivand [mailto:Roger.Bivand at nhh.no]
> Sent: Mon 07.05.2012 14:48
> To: Maximilian Sproß
> Cc: r-sig-geo
> Subject: Re: [R-sig-Geo] Missing local R-squared and residuals in gwr output
>
> On Mon, 7 May 2012, Maximilian Sproß wrote:
>
>> Dear Roger!
>>
>> Thank you very much for your fast reply and work!
>>
>> I'm not really an expert in HPC computing, but I will try to report as well
>> as I can.
>>
>> I updated spgwr and started a job on the cluster which normally takes 1.5 h.
>> So far it has run for 5 hours, which indicates that the parallelization no
>> longer works efficiently. The function makeCluster(64, type="MPI") worked fine.
>> Our cluster runs with openMPI.
>
> Correct. I'll try to add back an option to use snow instead of parallel.
>
> I tried out the new version, but it still seems to be using parallel.
>
> code:
>
> gwr_50 <- gwr(hef@data$DIF ~ hef@data$ELEVATION + hef@data$SKY +
>     hef@data$SLOPE + hef@data$SOLAR + factor(asp_fac),
>     data=hef, bandwidth=50, gweight=gwr.Gauss, fit.points=coords,
>     hatmatrix=FALSE, cl=cl)

Add use_snow=TRUE to the command to switch to snow.
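
For reference, the full command with that switch added might look like the sketch below. It wraps the call in a function so that the objects from the session (hef, asp_fac, coords, cl) are explicit inputs. The name fit_gwr_snow is made up for illustration, and use_snow is assumed to exist only in the spgwr revision discussed in this thread, so other versions may reject it.

```r
# Sketch only: the call quoted above with use_snow = TRUE appended.
# fit_gwr_snow is a hypothetical wrapper name, not part of spgwr.
# hef     : SpatialPointsDataFrame with DIF, ELEVATION, SKY, SLOPE, SOLAR
# asp_fac : aspect classes, one per observation
# coords  : fit point coordinates, e.g. coordinates(hef)
# cl      : cluster object, e.g. from makeCluster(64, type = "MPI")
fit_gwr_snow <- function(hef, asp_fac, coords, cl) {
  spgwr::gwr(hef@data$DIF ~ hef@data$ELEVATION + hef@data$SKY +
               hef@data$SLOPE + hef@data$SOLAR + factor(asp_fac),
             data = hef, bandwidth = 50, gweight = spgwr::gwr.Gauss,
             fit.points = coords, hatmatrix = FALSE, cl = cl,
             use_snow = TRUE)  # assumption: argument of the R-forge revision only
}
```

Calling fit_gwr_snow(hef, asp_fac, coords, cl) would then replace the gwr_50 assignment in the quoted code.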

Roger

> Loading required package: parallel
>
> Attaching package: 'parallel'
>
> The following object(s) are masked from 'package:snow':
>
>    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
>    clusterExport, clusterMap, clusterSplit, makeCluster, parApply,
>    parCapply, parLapply, parRapply, parSapply, splitIndices,
>    stopCluster
>
> Max
>
>
> When it reaches R-forge, its revision number will be > 1252.
>
> Roger
>
>>
>> In that context, I found the following in the CRAN Task View "High-Performance
>> and Parallel Computing with R":
>> "Direct support in
>> R is starting with release 2.14.0 which includes a new package parallel
>> incorporating (slightly revised) copies of packages multicore and snow (*but
>> excluding MPI, PVM and NWS clusters*)." Does the new parallel support still
>> work in the openMPI environment?
>>
>> regards,
>>
>> Max
>>
>> fyi:
>>
>> sessionInfo()
>> R version 2.14.0 (2011-10-31)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US
>> [4] LC_COLLATE=en_US     LC_MONETARY=en_US    LC_MESSAGES=en_US
>> [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C
>> [10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] spgwr_0.6-15    spdep_0.5-45    coda_0.14-6     deldir_0.0-16
>> [5] maptools_0.8-10 foreign_0.8-46  nlme_3.1-102    MASS_7.3-16
>> [9] Matrix_1.0-1    lattice_0.20-0  boot_1.3-3      gstat_1.0-10
>> [13] spacetime_0.5-7 xts_0.8-2       zoo_1.7-6       sp_0.9-98
>> [17] snow_0.3-8      Rmpi_0.5-9
>>
>> loaded via a namespace (and not attached):
>> [1] grid_2.14.0
>>
>>
>> On 05/05/2012 04:24 PM, Roger Bivand wrote:
>>> On Fri, 4 May 2012, Maximilian Sproß wrote:
>>>
>>>> Dear r-sig-geo list!
>>>>
>>>> I run gwr on a multi-node cluster (on 64 slots). In the gwr output (slot
>>>> "SDF"), the gwr residuals and the local R-squared are missing. When
>>>> fitting the same model on the local machine, these components are
>>>> included. Unfortunately, the calculation then takes about 5 days
>>>> instead of a few hours on the cluster.
>>>>
>>>> Perhaps the problem arises from the argument "fit.points", which has
>>>> to be passed if the local coefficient estimates are to be computed on a
>>>> multi-node cluster.
>>>>
>>>> Does anyone have an idea how to solve the problem of the missing
>>>> local R-squared and residuals when gwr is calculated on a cluster?
>>>
>>> The understanding for use on a cluster was that the data points and the fit
>>> points are different, so there is no observed dependent variable at the fit
>>> point, hence no local R2. I've added logic in the code that checks for
>>> equality between the fit and data points, and this for me resolves the
>>> problem, but may break other things. I've committed to R-forge, project
>>> rspatial, module spgwr. The source tarball and binary packages should be
>>> available later this evening European time from:
>>>
>>> https://r-forge.r-project.org/R/?group_id=1014
>>>
>>> Could you please try it out, and report back? I should also migrate spgwr
>>> from snow to parallel before I release it.
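
The equality check described above can be pictured with a small base-R sketch. This is not the code committed to spgwr; same_points is a hypothetical helper, and the tolerance is an assumed default.

```r
# Hypothetical helper (not the spgwr implementation): TRUE when the fit
# points are the same locations, in the same order, as the data points.
same_points <- function(data_coords, fit_coords,
                        tol = sqrt(.Machine$double.eps)) {
  data_coords <- unname(as.matrix(data_coords))
  fit_coords  <- unname(as.matrix(fit_coords))
  # Different numbers of points or dimensions can never be equal.
  identical(dim(data_coords), dim(fit_coords)) &&
    isTRUE(all.equal(data_coords, fit_coords, tolerance = tol,
                     check.attributes = FALSE))
}

data_xy <- cbind(x = c(0, 1, 2), y = c(0, 1, 4))
same_points(data_xy, data_xy)         # fit points coincide with data points
same_points(data_xy, data_xy[1:2, ])  # fewer fit points than data points
same_points(data_xy, data_xy + 10)    # shifted fit points
```

Only in the first case is an observed dependent variable available at every fit point, which is what residuals and a local R2 require.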
>>>
>>> Best wishes,
>>>
>>> Roger
>>>
>>>>
>>>>
>>>> Thank you very much in advance!
>>>>
>>>> Kind regards,
>>>>
>>>> Max
>>>>
>>>>
>>>> selected R-code:
>>>>
>>>> ### gwr on local machine:
>>>>
>>>> gwr_50 <-
>>>> gwr(hef@data$DIF~hef@data$ELEVATION+hef@data$SKY+hef@data$SLOPE+hef@data$SOLAR,
>>>> data=hef, bandwidth=50, gweight=gwr.Gauss)
>>>>
>>>>
>>>> # part of the  str(gwr_50) output...
>>>>
>>>>
>>>> List of 11
>>>>  $ SDF      :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
>>>>   .. ..@ data       :'data.frame':    286288 obs. of  9 variables:
>>>>   .. .. ..$ sum.w      : num [1:286288] 2009 2003 2091 2089 2086 ...
>>>>   .. .. ..$ (Intercept): num [1:286288] -28.7 -28.5 -29.9 -29.7 -29.5 ...
>>>>   .. .. ..$ elevation  : num [1:286288] 0.0139 0.0138 0.014 0.014 0.014 ...
>>>>   .. .. ..$ sky        : num [1:286288] -0.153 -0.155 -0.146 -0.148 -0.149 ...
>>>>   .. .. ..$ slope      : num [1:286288] -2.58 -2.61 -2.42 -2.45 -2.48 ...
>>>>   .. .. ..$ solar      : num [1:286288] -0.00139 -0.00136 -0.0015 -0.00147 -0.00144 ...
>>>>   .. .. ..$ gwr.e      : num [1:286288] -0.461 -0.683 -0.5987 -0.2692 0.0406 ...
>>>>   .. .. ..$ pred       : num [1:286288] 0.806 0.833 0.507 0.514 0.576 ...
>>>>   .. .. ..$ localR2    : num [1:286288] 0.621 0.618 0.638 0.635 0.632 ...
>>>>
>>>>
>>>>
>>>>
>>>> ### gwr on cluster :
>>>>
>>>> cl <- makeCluster(32, type="MPI")
>>>>
>>>> coords <- coordinates(hef)
>>>>
>>>> gw <-
>>>> gwr(hef@data$DIF~hef@data$ELEVATION+hef@data$SKY+hef@data$SLOPE+hef@data$SOLAR,
>>>> data=hef, bandwidth=50, gweight=gwr.Gauss,fit.points=coords,
>>>> hatmatrix=FALSE, cl=cl)
>>>>
>>>> # part of the  str(gwr_50) output...
>>>>
>>>> List of 11
>>>>  $ SDF      :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
>>>>   .. ..@ data       :'data.frame':    286288 obs. of  6 variables:
>>>>   .. .. ..$ sum.w      : num [1:286288] 1 1 1 1 1 ...
>>>>   .. .. ..$ (Intercept): num [1:286288] 12541 1970 2057 -1505 -1030 ...
>>>>   .. .. ..$ elevation  : num [1:286288] -3.891 -0.602 -0.738 0.465 0.309 ...
>>>>   .. .. ..$ sky        : num [1:286288] -0.954 -0.425 3.714 0.159 0.152 ...
>>>>   .. .. ..$ slope      : num [1:286288] 62.19 NA -27.21 1.95 16.03 ...
>>>>   .. .. ..$ solar      : num [1:286288] NA NA NA NA 0.042 ...
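
A quick way to see exactly which SDF columns the cluster run lacks is to compare the column names of the two results. The sketch below uses the names copied from the two str() listings in this message, so it runs without the data.

```r
# Column names copied from the two str() listings in this message.
local_cols   <- c("sum.w", "(Intercept)", "elevation", "sky", "slope",
                  "solar", "gwr.e", "pred", "localR2")
cluster_cols <- c("sum.w", "(Intercept)", "elevation", "sky", "slope",
                  "solar")

# Columns present in the local fit but absent from the cluster fit.
missing_cols <- setdiff(local_cols, cluster_cols)
print(missing_cols)
```

With the fitted objects at hand, setdiff(names(gwr_50$SDF), names(gw$SDF)) should give the same comparison, assuming both objects are still in the workspace.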
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
> --
> Roger Bivand
> Department of Economics, NHH Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; fax +47 55 95 95 43
> e-mail: Roger.Bivand at nhh.no
>
>


