[R-sig-Geo] Missing local R-squared and residuals in gwr output

Roger Bivand Roger.Bivand at nhh.no
Thu May 10 11:10:16 CEST 2012


On Thu, 10 May 2012, Maximilian Sproß wrote:

> Dear Roger!
>
> You are right, there are no problems anymore. I ran some comparative 
> tests with a small subset of the dataset. The number of non-finite values 
> in the "pred" column depends on the bandwidth. With increasing bandwidth, 
> the NAs disappear.
>
> Unfortunately, I cannot run gwr.sel because the dataset is too large.
>
> By the way, all problems are solved and the use of the cluster is a really 
> nice feature to decrease processing time efficiently.

Thanks for checking and reporting back. I'll release to CRAN shortly.

Best wishes,

Roger
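
For anyone repeating the check described above, a minimal base-R sketch that counts the non-finite entries in the "pred" and "localR2" columns. The helper name count_nonfinite and the toy data frame are illustrative assumptions, not part of spgwr; on a real fit the data slot of the SDF component would be passed instead.

```r
# Hypothetical helper (not part of spgwr): count non-finite entries
# (NA, NaN, Inf, -Inf) in selected numeric columns of a data frame.
count_nonfinite <- function(df, cols = c("pred", "localR2")) {
  cols <- intersect(cols, names(df))
  vapply(df[cols], function(x) sum(!is.finite(x)), integer(1))
}

# Toy stand-in for the data slot of a gwr result (values are made up).
toy <- data.frame(
  pred    = c(0.81, NA, 0.51, Inf),
  localR2 = c(0.62, NaN, NA, NA)
)

counts <- count_nonfinite(toy)
print(counts)
```

Running this for fits at several bandwidths would show whether the count of non-finite values falls as the bandwidth grows.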

>
> Thank you very much for your help!
>
> Max
>
> On 05/09/2012 01:19 PM, Roger Bivand wrote:
>> On Wed, 9 May 2012, Maximilian Sproß wrote:
>> 
>>> Thank you Roger! The gwr on the MPI cluster works fine.
>>> 
>>> However, the output object now includes the three initially missing data 
>>> slots: "gwr.e", "pred" and "localR2". Unfortunately, the latter contains 
>>> only NAs.
>>> Sorry for any inconvenience, but do you think you can solve that?
>> 
>> I do not see any problem there, and indeed it is after the results have 
>> been returned from the cluster. You can tell whether you have been into the 
>> code block starting on line 261 in spgwr/R/gwr.R if there is no line 
>> beginning with "postprocess_localR2" in the timings component of the output 
>> object. The conditions are:
>> 
>> ((!fp.given || fit_are_data) && is.null(fittedGWRobject))
>> 
>> where the first is FALSE, the second TRUE and the third TRUE in your case. 
>> If the "pred" column in your output contains values that are not finite, 
>> this may happen in this code block.
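
As a quick check of the logic, the condition can be evaluated with the three values stated above. This is only a sketch: the variable names mirror those in the quoted condition, and the values are taken from the description rather than from a run of gwr.

```r
# Values as described for this case: the first term (!fp.given) is FALSE,
# fit_are_data is TRUE, and is.null(fittedGWRobject) is TRUE.
fp.given <- TRUE          # fit.points were supplied, so !fp.given is FALSE
fit_are_data <- TRUE      # the fit points coincide with the data points
fittedGWRobject <- NULL   # no previously fitted object was passed in

enter_block <- ((!fp.given || fit_are_data) && is.null(fittedGWRobject))
print(enter_block)
```

With these values the condition is TRUE, so the code block in question would be entered.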
>> 
>> If you cannot see what is going on, we need a smaller test data set that 
>> replicates the problem.
>> 
>> Roger
>> 
>>> 
>>> Thanks in advance and all the best,
>>> 
>>> Max
>>> 
>>> 
>>> On 05/07/2012 08:45 PM, Roger Bivand wrote:
>>>> On Mon, 7 May 2012, "Sproß, Johann" wrote:
>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Mag. J. Maximilian Sproß
>>>>> Institute of Geography, University of Innsbruck
>>>>> Innrain 52
>>>>> A-6020 INNSBRUCK
>>>>> 
>>>>> Tel. +43 (0)512 507 5413
>>>>> web: http://www.uibk.ac.at/geographie/projects/lidar/
>>>>> 
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Roger Bivand [mailto:Roger.Bivand at nhh.no]
>>>>> Sent: Mon 07.05.2012 14:48
>>>>> To: Maximilian Sproß
>>>>> Cc: r-sig-geo
>>>>> Subject: Re: [R-sig-Geo] Missing local R-squared and residuals in gwr 
>>>>> output
>>>>> 
>>>>> On Mon, 7 May 2012, Maximilian Sproß wrote:
>>>>> 
>>>>>> Dear Roger!
>>>>>> 
>>>>>> Thank you very much for your fast reply and work!
>>>>>> 
>>>>>> I'm not really an expert in HPC computing, but I will try to report as 
>>>>>> well as I can.
>>>>>> 
>>>>>> I updated spgwr and started a job on the cluster that normally takes 
>>>>>> 1.5 h. So far it has run for 5 hours, which indicates that the 
>>>>>> parallelization no longer works efficiently. The function 
>>>>>> makeCluster(64, type="MPI") worked fine. Our cluster runs openMPI.
>>>>> 
>>>>> Correct. I'll try to add back an option to use snow instead of parallel.
>>>>> 
>>>>> I tried out the new version, but it still seems to be using parallel.
>>>>> 
>>>>> code:
>>>>> 
>>>>> gwr_50 <- gwr(hef@data$DIF ~ hef@data$ELEVATION + hef@data$SKY +
>>>>>     hef@data$SLOPE + hef@data$SOLAR + factor(asp_fac),
>>>>>     data=hef, bandwidth=50, gweight=gwr.Gauss, fit.points=coords,
>>>>>     hatmatrix=FALSE, cl=cl)
>>>> 
>>>> Add use_snow=TRUE to the command to switch to snow.
>>>> 
>>>> Roger
>>>> 
>>>>> Loading required package: parallel
>>>>> 
>>>>> Attaching package: 'parallel'
>>>>> 
>>>>> The following object(s) are masked from 'package:snow':
>>>>>
>>>>>    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
>>>>>    clusterExport, clusterMap, clusterSplit, makeCluster, parApply,
>>>>>    parCapply, parLapply, parRapply, parSapply, splitIndices,
>>>>>    stopCluster
>>>>> 
>>>>> Max
>>>>> 
>>>>> 
>>>>> When it reaches R-forge, its revision number will be > 1252.
>>>>> 
>>>>> Roger
>>>>> 
>>>>>> 
>>>>>> In that context, I found the following in the CRAN Task View 
>>>>>> "High-Performance and Parallel Computing with R":
>>>>>> "Direct support in R is starting with release 2.14.0 which includes a 
>>>>>> new package parallel incorporating (slightly revised) copies of 
>>>>>> packages multicore and snow (*but excluding MPI, PVM and NWS 
>>>>>> clusters*)." Does the new parallel support still work in the openMPI 
>>>>>> environment?
>>>>>> 
>>>>>> regards,
>>>>>> 
>>>>>> Max
>>>>>> 
>>>>>> fyi:
>>>>>> 
>>>>>> sessionInfo()
>>>>>> R version 2.14.0 (2011-10-31)
>>>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>>> 
>>>>>> locale:
>>>>>> [1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US
>>>>>> [4] LC_COLLATE=en_US     LC_MONETARY=en_US    LC_MESSAGES=en_US
>>>>>> [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C
>>>>>> [10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C
>>>>>> 
>>>>>> attached base packages:
>>>>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>>>>> [8] base
>>>>>> 
>>>>>> other attached packages:
>>>>>> [1] spgwr_0.6-15    spdep_0.5-45    coda_0.14-6     deldir_0.0-16
>>>>>> [5] maptools_0.8-10 foreign_0.8-46  nlme_3.1-102    MASS_7.3-16
>>>>>> [9] Matrix_1.0-1    lattice_0.20-0  boot_1.3-3      gstat_1.0-10
>>>>>> [13] spacetime_0.5-7 xts_0.8-2       zoo_1.7-6       sp_0.9-98
>>>>>> [17] snow_0.3-8      Rmpi_0.5-9
>>>>>> 
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] grid_2.14.0
>>>>>> 
>>>>>> 
>>>>>> On 05/05/2012 04:24 PM, Roger Bivand wrote:
>>>>>>> On Fri, 4 May 2012, Maximilian Sproß wrote:
>>>>>>> 
>>>>>>>> Dear r-sig-geo list!
>>>>>>>> 
>>>>>>>> I run gwr on a multi-node cluster (on 64 slots). In the gwr output 
>>>>>>>> (slot "SDF"), the gwr residuals and the local R-squared are missing. 
>>>>>>>> When I fit the same model on the local machine, these components are 
>>>>>>>> included. Unfortunately, the calculation then takes about 5 days 
>>>>>>>> instead of a few hours on the cluster.
>>>>>>>> 
>>>>>>>> Perhaps the problem arises from the argument "fit.points", which has 
>>>>>>>> to be passed if the local coefficient estimates are to be computed on 
>>>>>>>> a multi-node cluster.
>>>>>>>> 
>>>>>>>> Does anyone have an idea how to solve the problem of the missing 
>>>>>>>> local R-squared and residuals when gwr is calculated on a cluster?
>>>>>>> 
>>>>>>> The understanding for use on a cluster was that the data points and 
>>>>>>> the fit points are different, so there is no observed dependent 
>>>>>>> variable at the fit point, hence no local R2. I've added logic in the 
>>>>>>> code that checks for equality between the fit and data points, and 
>>>>>>> this for me resolves the problem, but may break other things. I've 
>>>>>>> committed to R-forge, project rspatial, module spgwr. The source 
>>>>>>> tarball and binary packages should be available later this evening 
>>>>>>> European time from:
>>>>>>> 
>>>>>>> https://r-forge.r-project.org/R/?group_id=1014
>>>>>>> 
>>>>>>> Could you please try it out, and report back? I should also migrate 
>>>>>>> spgwr from snow to parallel before I release it.
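
To illustrate the idea of checking for equality between the fit and data points, here is one way such a test could look in base R. It is a sketch only, not the code committed to spgwr; the function name same_points and the coordinates are made up for the example.

```r
# Hypothetical check (not the spgwr implementation): are the fit points
# the same locations as the data points?
same_points <- function(fit_xy, data_xy) {
  is.matrix(fit_xy) && is.matrix(data_xy) &&
    identical(dim(fit_xy), dim(data_xy)) &&
    isTRUE(all.equal(unname(fit_xy), unname(data_xy)))
}

data_xy  <- cbind(x = c(0, 1, 2), y = c(0, 1, 0))     # made-up coordinates
fit_same <- data_xy                                   # fit at the data points
fit_grid <- cbind(x = c(0.5, 1.5), y = c(0.5, 0.5))   # fit at other locations

print(same_points(fit_same, data_xy))
print(same_points(fit_grid, data_xy))
```

Only when the two sets of points coincide is there an observed dependent variable at each fit point, which is what residuals and a local R2 require.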
>>>>>>> 
>>>>>>> Best wishes,
>>>>>>> 
>>>>>>> Roger
>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thank you very much in advance!
>>>>>>>> 
>>>>>>>> Kind regards,
>>>>>>>> 
>>>>>>>> Max
>>>>>>>> 
>>>>>>>> 
>>>>>>>> selected R-code:
>>>>>>>> 
>>>>>>>> ### gwr on local machine:
>>>>>>>> 
>>>>>>>> gwr_50 <- gwr(hef@data$DIF ~ hef@data$ELEVATION + hef@data$SKY +
>>>>>>>>     hef@data$SLOPE + hef@data$SOLAR,
>>>>>>>>     data=hef, bandwidth=50, gweight=gwr.Gauss)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> # part of the  str(gwr_50) output...
>>>>>>>> 
>>>>>>>> 
>>>>>>>> List of 11
>>>>>>>>  $ SDF      :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
>>>>>>>>   .. ..@ data       :'data.frame':    286288 obs. of  9 variables:
>>>>>>>>   .. .. ..$ sum.w      : num [1:286288] 2009 2003 2091 2089 2086 ...
>>>>>>>>   .. .. ..$ (Intercept): num [1:286288] -28.7 -28.5 -29.9 -29.7 -29.5 ...
>>>>>>>>   .. .. ..$ elevation  : num [1:286288] 0.0139 0.0138 0.014 0.014 0.014 ...
>>>>>>>>   .. .. ..$ sky        : num [1:286288] -0.153 -0.155 -0.146 -0.148 -0.149 ...
>>>>>>>>   .. .. ..$ slope      : num [1:286288] -2.58 -2.61 -2.42 -2.45 -2.48 ...
>>>>>>>>   .. .. ..$ solar      : num [1:286288] -0.00139 -0.00136 -0.0015 -0.00147 -0.00144 ...
>>>>>>>>   .. .. ..$ gwr.e      : num [1:286288] -0.461 -0.683 -0.5987 -0.2692 0.0406 ...
>>>>>>>>   .. .. ..$ pred       : num [1:286288] 0.806 0.833 0.507 0.514 0.576 ...
>>>>>>>>   .. .. ..$ localR2    : num [1:286288] 0.621 0.618 0.638 0.635 0.632 ...
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ### gwr on cluster :
>>>>>>>> 
>>>>>>>> cl <- makeCluster(32, type="MPI")
>>>>>>>> 
>>>>>>>> coords <- coordinates(hef)
>>>>>>>> 
>>>>>>>> gw <- gwr(hef@data$DIF ~ hef@data$ELEVATION + hef@data$SKY +
>>>>>>>>     hef@data$SLOPE + hef@data$SOLAR,
>>>>>>>>     data=hef, bandwidth=50, gweight=gwr.Gauss, fit.points=coords,
>>>>>>>>     hatmatrix=FALSE, cl=cl)
>>>>>>>> 
>>>>>>>> # part of the  str(gw) output...
>>>>>>>> 
>>>>>>>> List of 11
>>>>>>>>  $ SDF      :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
>>>>>>>>   .. ..@ data       :'data.frame':    286288 obs. of  6 variables:
>>>>>>>>   .. .. ..$ sum.w      : num [1:286288] 1 1 1 1 1 ...
>>>>>>>>   .. .. ..$ (Intercept): num [1:286288] 12541 1970 2057 -1505 -1030 ...
>>>>>>>>   .. .. ..$ elevation  : num [1:286288] -3.891 -0.602 -0.738 0.465 0.309 ...
>>>>>>>>   .. .. ..$ sky        : num [1:286288] -0.954 -0.425 3.714 0.159 0.152 ...
>>>>>>>>   .. .. ..$ slope      : num [1:286288] 62.19 NA -27.21 1.95 16.03 ...
>>>>>>>>   .. .. ..$ solar      : num [1:286288] NA NA NA NA 0.042 ...
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> -- 
>>>>> Roger Bivand
>>>>> Department of Economics, NHH Norwegian School of Economics,
>>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>>> voice: +47 55 95 93 55; fax +47 55 95 95 43
>>>>> e-mail: Roger.Bivand at nhh.no
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
>
>

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no

