[R-sig-Geo] Missing local R-squared and residuals in gwr output

Maximilian Sproß Maximilian.Spross at uibk.ac.at
Thu May 10 10:50:17 CEST 2012


Dear Roger!

You are right, there are no problems anymore. I ran some 
comparative tests with a small subset of the dataset. The number of 
non-finite values in the "pred" column depends on the 
bandwidth. As the bandwidth increases, the NAs disappear.
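
A minimal sketch of how such a count could be made, assuming the 
predictions are taken from the "pred" column of the SDF component (as in 
the str() output quoted further down). The helper name count_nonfinite 
and the two short vectors are hypothetical stand-ins with made-up 
values, not results from the real runs:

```r
# Hypothetical helper: count entries that are NA, NaN, Inf or -Inf.
count_nonfinite <- function(x) sum(!is.finite(x))

# Made-up stand-ins for the "pred" column of two fits
# (in a real session, e.g. gwr_50$SDF$pred for each bandwidth).
pred_small_bw <- c(0.806, NA, NaN, 0.514, Inf)
pred_large_bw <- c(0.806, 0.833, 0.507, 0.514, 0.576)

# One count per bandwidth, so the runs can be compared side by side.
nonfinite_by_bw <- c(small = count_nonfinite(pred_small_bw),
                     large = count_nonfinite(pred_large_bw))
print(nonfinite_by_bw)
```

is.finite() is used rather than is.na() so that Inf and -Inf are 
counted as well.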

Unfortunately, I cannot run gwr.sel because the dataset is too large.

Anyway, all problems are solved, and using the cluster is a 
really nice way to cut processing time.

Thank you very much for your help!

Max

On 05/09/2012 01:19 PM, Roger Bivand wrote:
> On Wed, 9 May 2012, Maximilian Sproß wrote:
>
>> Thank you Roger! The gwr on the MPI cluster works fine.
>>
>> However, the output object now includes the three initially missing 
>> data slots: "gwr.e", "pred" and "localR2". Unfortunately, the last 
>> one contains only NAs.
>> Sorry for any inconvenience, but do you think you can solve that?
>
> I do not see any problem there, and indeed it is after the results 
> have been returned from the cluster. You can tell whether you have 
> been into the code block starting on line 261 in spgwr/R/gwr.R if 
> there is no line beginning with "postprocess_localR2" in the timings 
> component of the output object. The conditions are:
>
> ((!fp.given || fit_are_data) && is.null(fittedGWRobject))
>
> where the first is FALSE, the second TRUE and the third TRUE in your 
> case. If the "pred" column in your output contains values that are not 
> finite, this may happen in this code block.
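
The condition and the timings check described above can be sketched as 
follows. The flag values are the ones stated for this run. The timings 
matrix is a made-up stand-in, and it is an assumption that the step 
names appear as row names of that component; in a real session one 
would inspect the timings component of the gwr output object instead:

```r
# Flags as described for this run.
fp.given <- TRUE          # fit.points were supplied
fit_are_data <- TRUE      # fit points coincide with the data points
fittedGWRobject <- NULL   # no previously fitted GWR object was passed

# The condition guarding the code block, written out as quoted.
enter_block <- (!fp.given || fit_are_data) && is.null(fittedGWRobject)

# Made-up stand-in for the timings component (values and the other row
# names are invented; only "postprocess_localR2" comes from the message).
timings <- matrix(c(0.1, 0.2,
                    5.0, 5.3,
                    0.4, 0.5),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(c("set_up", "run_gwr",
                                    "postprocess_localR2"),
                                  c("user.self", "elapsed")))

# Is there a line beginning with "postprocess_localR2"?
has_localR2_step <- any(grepl("^postprocess_localR2", rownames(timings)))

print(c(enter_block = enter_block, has_localR2_step = has_localR2_step))
```

With these flag values, !fp.given is FALSE but fit_are_data is TRUE, so 
the whole condition evaluates to TRUE.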
>
> If you cannot see what is going on, we need a smaller test data set 
> that replicates the problem.
>
> Roger
>
>>
>> Thanks in advance and all the best,
>>
>> Max
>>
>>
>> On 05/07/2012 08:45 PM, Roger Bivand wrote:
>>> On Mon, 7 May 2012, "Sproß, Johann" wrote:
>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> Mag. J. Maximilian Sproß
>>>> Institute of Geography, University of Innsbruck
>>>> Innrain 52
>>>> A-6020 INNSBRUCK
>>>>
>>>> Tel. +43 (0)512 507 5413
>>>> web: http://www.uibk.ac.at/geographie/projects/lidar/
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Roger Bivand [mailto:Roger.Bivand at nhh.no]
>>>> Sent: Mon 07.05.2012 14:48
>>>> To: Maximilian Sproß
>>>> Cc: r-sig-geo
>>>> Subject: Re: [R-sig-Geo] Missing local R-squared and residuals in 
>>>> gwr output
>>>>
>>>> On Mon, 7 May 2012, Maximilian Sproß wrote:
>>>>
>>>>> Dear Roger!
>>>>>
>>>>> Thank you very much for your fast reply and work!
>>>>>
>>>>> I'm not really an expert in HPC, but I will try to report as well
>>>>> as I can.
>>>>>
>>>>> I updated spgwr and started a job on the cluster that normally takes
>>>>> 1.5 h. So far, it has run for 5 hours, which indicates that the
>>>>> parallelization no longer works efficiently. The function
>>>>> makeCluster(64, type="MPI") worked fine.
>>>>> Our cluster runs openMPI.
>>>>
>>>> Correct. I'll try to add back an option to use snow instead of 
>>>> parallel.
>>>>
>>>> I tried out the new version, but it still seems to be using parallel.
>>>>
>>>> code:
>>>>
>>>> gwr_50 <- 
>>>> gwr(hef@data$DIF~hef@data$ELEVATION+hef@data$SKY+hef@data$SLOPE+hef@data$SOLAR+factor(asp_fac), 
>>>> data=hef, bandwidth=50, gweight=gwr.Gauss,fit.points=coords, 
>>>> hatmatrix=FALSE, cl=cl)
>>>
>>> Add use_snow=TRUE to the command to switch to snow.
>>>
>>> Roger
>>>
>>>> Loading required package: parallel
>>>>
>>>> Attaching package: 'parallel'
>>>>
>>>> The following object(s) are masked from 'package:snow':
>>>>
>>>>    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
>>>>    clusterExport, clusterMap, clusterSplit, makeCluster, parApply,
>>>>    parCapply, parLapply, parRapply, parSapply, splitIndices,
>>>>    stopCluster
>>>>
>>>> Max
>>>>
>>>>
>>>> When it reaches R-forge, its revision number will be > 1252.
>>>>
>>>> Roger
>>>>
>>>>>
>>>>> In that context, I found the following in the CRAN Task View
>>>>> "High-Performance and Parallel Computing with R":
>>>>> "Direct support in R is starting with release 2.14.0 which includes a
>>>>> new package parallel incorporating (slightly revised) copies of
>>>>> packages multicore and snow (*but excluding MPI, PVM and NWS
>>>>> clusters*)." Does the new parallel support still work in the openMPI
>>>>> environment?
>>>>>
>>>>> regards,
>>>>>
>>>>> Max
>>>>>
>>>>> fyi:
>>>>>
>>>>> sessionInfo()
>>>>> R version 2.14.0 (2011-10-31)
>>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US
>>>>> [4] LC_COLLATE=en_US     LC_MONETARY=en_US    LC_MESSAGES=en_US
>>>>> [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C
>>>>> [10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C
>>>>>
>>>>> attached base packages:
>>>>> [1] parallel  stats     graphics  grDevices utils     datasets  
>>>>> methods
>>>>> [8] base
>>>>>
>>>>> other attached packages:
>>>>> [1] spgwr_0.6-15    spdep_0.5-45    coda_0.14-6     deldir_0.0-16
>>>>> [5] maptools_0.8-10 foreign_0.8-46  nlme_3.1-102    MASS_7.3-16
>>>>> [9] Matrix_1.0-1    lattice_0.20-0  boot_1.3-3      gstat_1.0-10
>>>>> [13] spacetime_0.5-7 xts_0.8-2       zoo_1.7-6       sp_0.9-98
>>>>> [17] snow_0.3-8      Rmpi_0.5-9
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] grid_2.14.0
>>>>>
>>>>>
>>>>> On 05/05/2012 04:24 PM, Roger Bivand wrote:
>>>>>> On Fri, 4 May 2012, Maximilian Sproß wrote:
>>>>>>
>>>>>>> Dear r-sig-geo list!
>>>>>>>
>>>>>>> I run gwr on a multi-node cluster (on 64 slots). In the gwr output
>>>>>>> (slot "SDF"), the gwr residuals and the local R-squared are missing.
>>>>>>> When fitting the same model on the local machine, these components
>>>>>>> are included. Unfortunately, the calculation that way takes about
>>>>>>> 5 days instead of a few hours on the cluster.
>>>>>>>
>>>>>>> Perhaps the problem arises from the argument "fit.points", which has
>>>>>>> to be passed if the local coefficient estimates are to be computed
>>>>>>> on a multi-node cluster.
>>>>>>>
>>>>>>> Does anyone have an idea how to solve the problem of the missing
>>>>>>> local R-squared and residuals when gwr is run on a cluster?
>>>>>>
>>>>>> The understanding for use on a cluster was that the data points 
>>>>>> and the fit
>>>>>> points are different, so there is no observed dependent variable 
>>>>>> at the fit
>>>>>> point, hence no local R2. I've added logic in the code that 
>>>>>> checks for
>>>>>> equality between the fit and data points, and this for me 
>>>>>> resolves the
>>>>>> problem, but may break other things. I've committed to R-forge, 
>>>>>> project
>>>>>> rspatial, module spgwr. The source tarball and binary packages 
>>>>>> should be
>>>>>> available later this evening European time from:
>>>>>>
>>>>>> https://r-forge.r-project.org/R/?group_id=1014
>>>>>>
>>>>>> Could you please try it out, and report back? I should also 
>>>>>> migrate spgwr
>>>>>> from snow to parallel before I release it.
>>>>>>
>>>>>> Best wishes,
>>>>>>
>>>>>> Roger
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thank you very much in advance!
>>>>>>>
>>>>>>> Kind regards,
>>>>>>>
>>>>>>> Max
>>>>>>>
>>>>>>>
>>>>>>> selected R-code:
>>>>>>>
>>>>>>> ### gwr on local machine:
>>>>>>>
>>>>>>> gwr_50 <-
>>>>>>> gwr(hef@data$DIF~hef@data$ELEVATION+hef@data$SKY+hef@data$SLOPE+hef@data$SOLAR, 
>>>>>>> data=hef, bandwidth=50, gweight=gwr.Gauss)
>>>>>>>
>>>>>>>
>>>>>>> # part of the  str(gwr_50) output...
>>>>>>>
>>>>>>>
>>>>>>> List of 11
>>>>>>>  $ SDF      :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
>>>>>>>   .. ..@ data       :'data.frame':    286288 obs. of  9 variables:
>>>>>>>   .. .. ..$ sum.w      : num [1:286288] 2009 2003 2091 2089 2086 ...
>>>>>>>   .. .. ..$ (Intercept): num [1:286288] -28.7 -28.5 -29.9 -29.7 -29.5 ...
>>>>>>>   .. .. ..$ elevation  : num [1:286288] 0.0139 0.0138 0.014 0.014 0.014 ...
>>>>>>>   .. .. ..$ sky        : num [1:286288] -0.153 -0.155 -0.146 -0.148 -0.149 ...
>>>>>>>   .. .. ..$ slope      : num [1:286288] -2.58 -2.61 -2.42 -2.45 -2.48 ...
>>>>>>>   .. .. ..$ solar      : num [1:286288] -0.00139 -0.00136 -0.0015 -0.00147 -0.00144 ...
>>>>>>>   .. .. ..$ gwr.e      : num [1:286288] -0.461 -0.683 -0.5987 -0.2692 0.0406 ...
>>>>>>>   .. .. ..$ pred       : num [1:286288] 0.806 0.833 0.507 0.514 0.576 ...
>>>>>>>   .. .. ..$ localR2    : num [1:286288] 0.621 0.618 0.638 0.635 0.632 ...
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ### gwr on cluster :
>>>>>>>
>>>>>>> cl <- makeCluster(32, type="MPI")
>>>>>>>
>>>>>>> coords <- coordinates(hef)
>>>>>>>
>>>>>>> gw <-
>>>>>>> gwr(hef@data$DIF~hef@data$ELEVATION+hef@data$SKY+hef@data$SLOPE+hef@data$SOLAR, 
>>>>>>> data=hef, bandwidth=50, gweight=gwr.Gauss,fit.points=coords,
>>>>>>> hatmatrix=FALSE, cl=cl)
>>>>>>>
>>>>>>> # part of the  str(gw) output...
>>>>>>>
>>>>>>> List of 11
>>>>>>>  $ SDF      :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
>>>>>>>   .. ..@ data       :'data.frame':    286288 obs. of  6 variables:
>>>>>>>   .. .. ..$ sum.w      : num [1:286288] 1 1 1 1 1 ...
>>>>>>>   .. .. ..$ (Intercept): num [1:286288] 12541 1970 2057 -1505 -1030 ...
>>>>>>>   .. .. ..$ elevation  : num [1:286288] -3.891 -0.602 -0.738 0.465 0.309 ...
>>>>>>>   .. .. ..$ sky        : num [1:286288] -0.954 -0.425 3.714 0.159 0.152 ...
>>>>>>>   .. .. ..$ slope      : num [1:286288] 62.19 NA -27.21 1.95 16.03 ...
>>>>>>>   .. .. ..$ solar      : num [1:286288] NA NA NA NA 0.042 ...
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> R-sig-Geo mailing list
>>>>>>> R-sig-Geo at r-project.org
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> -- 
>>>> Roger Bivand
>>>> Department of Economics, NHH Norwegian School of Economics,
>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>> voice: +47 55 95 93 55; fax +47 55 95 95 43
>>>> e-mail: Roger.Bivand at nhh.no
>>>>
>>>>
>>>
>>
>>
>


