[R-sig-Geo] Missing local R-squared and residuals in gwr output

Roger Bivand Roger.Bivand at nhh.no
Wed May 9 13:19:29 CEST 2012


On Wed, 9 May 2012, Maximilian Sproß wrote:

> Thank you Roger! The gwr on the MPI cluster works fine.
>
> However, the output object now includes the three initially missing data 
> slots: "gwr.e", "pred" and "localR2". Unfortunately, the latter contains only 
> NAs.
> Sorry for any inconvenience, but do you think you can solve that?

I do not see any problem there, and indeed the local R2 is computed after 
the results have been returned from the cluster. You can tell whether the 
code block starting on line 261 in spgwr/R/gwr.R was entered by looking at 
the timings component of the output object: if there is no line beginning 
with "postprocess_localR2", the block was not entered. The conditions are:

((!fp.given || fit_are_data) && is.null(fittedGWRobject))

where the first is FALSE, the second TRUE and the third TRUE in your case, 
so the whole expression is TRUE. If the "pred" column in your output 
contains values that are not finite, the NA values may arise in this code 
block.
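
The two checks described above could be sketched roughly as follows. This is
only an illustration under stated assumptions: the helper names
has_localR2_step() and count_nonfinite() are hypothetical (they are not part
of spgwr), the timings component is assumed to be a matrix with row names or
a named list, and the mock objects stand in for a real gwr() result.

```r
# Hypothetical helper: TRUE if the timings component has an entry whose
# label begins with "postprocess_localR2". Assumes timings is either a
# matrix with row names or a named list/vector.
has_localR2_step <- function(timings) {
  labels <- if (is.matrix(timings)) rownames(timings) else names(timings)
  any(startsWith(as.character(labels), "postprocess_localR2"))
}

# Hypothetical helper: number of non-finite values (NA, NaN, Inf, -Inf)
# in each column of a data frame.
count_nonfinite <- function(df) {
  vapply(df, function(x) sum(!is.finite(x)), integer(1))
}

# Mock stand-ins for the timings component and the SDF data of a fit;
# the values are invented for illustration only.
mock_timings <- rbind(set_up = c(0.1, 0.1),
                      run_gwr = c(5.0, 6.0),
                      postprocess_localR2 = c(1.0, 1.0))
mock_data <- data.frame(pred = c(0.8, NA, Inf),
                        localR2 = c(NA_real_, NA_real_, NA_real_))

# The condition quoted above, with the values reported for this case.
fp.given <- TRUE           # fit.points was supplied, so !fp.given is FALSE
fit_are_data <- TRUE       # the fit points equal the data points
fittedGWRobject <- NULL    # no fitted GWR object was passed
enters_block <- (!fp.given || fit_are_data) && is.null(fittedGWRobject)

print(enters_block)
print(has_localR2_step(mock_timings))
print(count_nonfinite(mock_data))
```

On a real fit stored as gw, the corresponding calls would presumably be
has_localR2_step(gw$timings) and count_nonfinite(gw$SDF@data).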

If you cannot see what is going on, we need a smaller test data set that 
replicates the problem.

Roger

>
> Thanks in advance and all the best,
>
> Max
>
>
> On 05/07/2012 08:45 PM, Roger Bivand wrote:
>> On Mon, 7 May 2012, "Sproß, Johann" wrote:
>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Mag. J. Maximilian Sproß
>>> Institute of Geography, University of Innsbruck
>>> Innrain 52
>>> A-6020 INNSBRUCK
>>> 
>>> Tel. +43 (0)512 507 5413
>>> web: http://www.uibk.ac.at/geographie/projects/lidar/
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: Roger Bivand [mailto:Roger.Bivand at nhh.no]
>>> Sent: Mon 07.05.2012 14:48
>>> To: Maximilian Sproß
>>> Cc: r-sig-geo
>>> Subject: Re: [R-sig-Geo] Missing local R-squared and residuals in gwr 
>>> output
>>> 
>>> On Mon, 7 May 2012, Maximilian Sproß wrote:
>>> 
>>>> Dear Roger!
>>>> 
>>>> Thank you very much for your fast reply and work!
>>>> 
>>>> I'm not really an expert in HPC computing, but I will try to report as 
>>>> well as I can.
>>>> 
>>>> I updated spgwr and started a job on the cluster which normally takes 
>>>> 1.5 h. So far, it has run for 5 hours, which indicates that the 
>>>> parallelization no longer works efficiently. The function 
>>>> makeCluster(64, type="MPI") worked fine. Our cluster runs with openMPI.
>>> 
>>> Correct. I'll try to add back an option to use snow instead of parallel.
>>> 
>>> I tried out the new version, but it still seems to be using parallel.
>>> 
>>> code:
>>> 
>>> gwr_50 <- 
>>> gwr(hef@data$DIF~hef@data$ELEVATION+hef@data$SKY+hef@data$SLOPE+hef@data$SOLAR+factor(asp_fac), 
>>> data=hef, bandwidth=50, gweight=gwr.Gauss,fit.points=coords, 
>>> hatmatrix=FALSE, cl=cl)
>> 
>> Add use_snow=TRUE to the command to switch to snow.
>> 
>> Roger
>> 
>>> Loading required package: parallel
>>> 
>>> Attaching package: 'parallel'
>>> 
>>> The following object(s) are masked from 'package:snow':
>>>
>>>    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
>>>    clusterExport, clusterMap, clusterSplit, makeCluster, parApply,
>>>    parCapply, parLapply, parRapply, parSapply, splitIndices,
>>>    stopCluster
>>> 
>>> Max
>>> 
>>> 
>>> When it reaches R-forge, its revision number will be > 1252.
>>> 
>>> Roger
>>> 
>>>> 
>>>> In that context, I found the following in the CRAN Task View 
>>>> "High-Performance and Parallel Computing with R":
>>>> "Direct support in R is starting with release 2.14.0 which includes a 
>>>> new package parallel incorporating (slightly revised) copies of packages 
>>>> multicore and snow (*but excluding MPI, PVM and NWS clusters*)." Does 
>>>> the new parallel support still work in the openMPI environment?
>>>> 
>>>> regards,
>>>> 
>>>> Max
>>>> 
>>>> fyi:
>>>> 
>>>> sessionInfo()
>>>> R version 2.14.0 (2011-10-31)
>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>> 
>>>> locale:
>>>> [1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US
>>>> [4] LC_COLLATE=en_US     LC_MONETARY=en_US    LC_MESSAGES=en_US
>>>> [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C
>>>> [10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C
>>>> 
>>>> attached base packages:
>>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>>> [8] base
>>>> 
>>>> other attached packages:
>>>> [1] spgwr_0.6-15    spdep_0.5-45    coda_0.14-6     deldir_0.0-16
>>>> [5] maptools_0.8-10 foreign_0.8-46  nlme_3.1-102    MASS_7.3-16
>>>> [9] Matrix_1.0-1    lattice_0.20-0  boot_1.3-3      gstat_1.0-10
>>>> [13] spacetime_0.5-7 xts_0.8-2       zoo_1.7-6       sp_0.9-98
>>>> [17] snow_0.3-8      Rmpi_0.5-9
>>>> 
>>>> loaded via a namespace (and not attached):
>>>> [1] grid_2.14.0
>>>> 
>>>> 
>>>> On 05/05/2012 04:24 PM, Roger Bivand wrote:
>>>>> On Fri, 4 May 2012, Maximilian Sproß wrote:
>>>>> 
>>>>>> Dear r-sig-geo list!
>>>>>> 
>>>>>> I run gwr on a multi-node cluster (on 64 slots). In the gwr output (slot
>>>>>> "SDF"), the gwr residuals and the local R-squared are missing. When
>>>>>> fitting the same model on the local machine, these components are
>>>>>> included. Unfortunately, the calculation on the local machine takes
>>>>>> about 5 days instead of a few hours on the cluster.
>>>>>> 
>>>>>> Perhaps the problem arises from the argument "fit.points", which has
>>>>>> to be passed if the local coefficient estimates are to be made on a
>>>>>> multi-node cluster.
>>>>>> 
>>>>>> Does anyone have an idea how to solve that problem with the missing
>>>>>> local R-squared and residuals if the gwr is calculated on a cluster?
>>>>> 
>>>>> The understanding for use on a cluster was that the data points and the 
>>>>> fit points are different, so there is no observed dependent variable at 
>>>>> the fit point, hence no local R2. I've added logic in the code that 
>>>>> checks for equality between the fit and data points, and this for me 
>>>>> resolves the problem, but may break other things. I've committed to 
>>>>> R-forge, project rspatial, module spgwr. The source tarball and binary 
>>>>> packages should be available later this evening European time from:
>>>>> 
>>>>> https://r-forge.r-project.org/R/?group_id=1014
>>>>> 
>>>>> Could you please try it out, and report back? I should also migrate 
>>>>> spgwr from snow to parallel before I release it.
>>>>> 
>>>>> Best wishes,
>>>>> 
>>>>> Roger
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Thank you very much in advance!
>>>>>> 
>>>>>> Kind regards,
>>>>>> 
>>>>>> Max
>>>>>> 
>>>>>> 
>>>>>> selected R-code:
>>>>>> 
>>>>>> ### gwr on local machine:
>>>>>> 
>>>>>> gwr_50 <-
>>>>>> gwr(hef@data$DIF~hef@data$ELEVATION+hef@data$SKY+hef@data$SLOPE+hef@data$SOLAR, 
>>>>>> data=hef, bandwidth=50, gweight=gwr.Gauss)
>>>>>> 
>>>>>> 
>>>>>> # part of the  str(gwr_50) output...
>>>>>> 
>>>>>> 
>>>>>> List of 11
>>>>>>  $ SDF      :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
>>>>>>   .. ..@ data       :'data.frame':    286288 obs. of  9 variables:
>>>>>>   .. .. ..$ sum.w      : num [1:286288] 2009 2003 2091 2089 2086 ...
>>>>>>   .. .. ..$ (Intercept): num [1:286288] -28.7 -28.5 -29.9 -29.7 -29.5 ...
>>>>>>   .. .. ..$ elevation  : num [1:286288] 0.0139 0.0138 0.014 0.014 0.014 ...
>>>>>>   .. .. ..$ sky        : num [1:286288] -0.153 -0.155 -0.146 -0.148 -0.149 ...
>>>>>>   .. .. ..$ slope      : num [1:286288] -2.58 -2.61 -2.42 -2.45 -2.48 ...
>>>>>>   .. .. ..$ solar      : num [1:286288] -0.00139 -0.00136 -0.0015 -0.00147 -0.00144 ...
>>>>>>   .. .. ..$ gwr.e      : num [1:286288] -0.461 -0.683 -0.5987 -0.2692 0.0406 ...
>>>>>>   .. .. ..$ pred       : num [1:286288] 0.806 0.833 0.507 0.514 0.576 ...
>>>>>>   .. .. ..$ localR2    : num [1:286288] 0.621 0.618 0.638 0.635 0.632 ...
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ### gwr on cluster :
>>>>>> 
>>>>>> cl <- makeCluster(32, type="MPI")
>>>>>> 
>>>>>> coords <- coordinates(hef)
>>>>>> 
>>>>>> gw <-
>>>>>> gwr(hef@data$DIF~hef@data$ELEVATION+hef@data$SKY+hef@data$SLOPE+hef@data$SOLAR, 
>>>>>> data=hef, bandwidth=50, gweight=gwr.Gauss,fit.points=coords,
>>>>>> hatmatrix=FALSE, cl=cl)
>>>>>> 
>>>>>> # part of the str(gw) output...
>>>>>> 
>>>>>> List of 11
>>>>>>  $ SDF      :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
>>>>>>   .. ..@ data       :'data.frame':    286288 obs. of  6 variables:
>>>>>>   .. .. ..$ sum.w      : num [1:286288] 1 1 1 1 1 ...
>>>>>>   .. .. ..$ (Intercept): num [1:286288] 12541 1970 2057 -1505 -1030 ...
>>>>>>   .. .. ..$ elevation  : num [1:286288] -3.891 -0.602 -0.738 0.465 0.309 ...
>>>>>>   .. .. ..$ sky        : num [1:286288] -0.954 -0.425 3.714 0.159 0.152 ...
>>>>>>   .. .. ..$ slope      : num [1:286288] 62.19 NA -27.21 1.95 16.03 ...
>>>>>>   .. .. ..$ solar      : num [1:286288] NA NA NA NA 0.042 ...
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> -- 
>>> Roger Bivand
>>> Department of Economics, NHH Norwegian School of Economics,
>>> Helleveien 30, N-5045 Bergen, Norway.
>>> voice: +47 55 95 93 55; fax +47 55 95 95 43
>>> e-mail: Roger.Bivand at nhh.no
>>> 
>>> 
>> 
>
>

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no

