[R-sig-Geo] Logit spatial predictions outside range [0,1]

Marcelino de la Cruz marcelino.delacruz at upm.es
Mon Jul 21 14:53:57 CEST 2014


Hi Edoardo,

This happens because, as the help page for predict.glm() says:

'The default [prediction] is on the scale of the linear predictors; the
alternative "response" is on the scale of the response variable.'

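For a logit model the linear predictor is the log-odds, which can take any
real value, so values like -22 or 5.8 are expected. The predict() method for
Raster objects passes extra arguments on to predict.glm(), so you should use

pred = predict(predictors, logit, ext=ext, type="response")

to get your predictions on the [0,1] scale.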
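
To see the difference between the two scales, here is a minimal sketch in
base R. It assumes the built-in mtcars data (not the bradypus data, which
needs dismo) and only illustrates the glm() part: the response scale is the
inverse logit of the link scale, p = 1 / (1 + exp(-eta)), which is what
plogis() computes.

```r
# Minimal sketch, assuming only base R and the built-in 'mtcars' data
# (a stand-in for the bradypus presence/absence data in the question).
fit <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)

new <- data.frame(wt = c(1.5, 3.0, 5.5))

# Default type = "link": log-odds, unbounded
eta <- predict(fit, newdata = new)
# type = "response": probabilities in [0, 1]
p <- predict(fit, newdata = new, type = "response")

# The response scale is the inverse logit (plogis) of the link scale
all.equal(p, plogis(eta), check.attributes = FALSE)

# fitted() returns values on the response scale, which is why the
# fitted values looked fine while the default predict() output did not
all.equal(fitted(fit), predict(fit, type = "response"),
          check.attributes = FALSE)

# Back-transforming the range reported in the question
# (about 1.8e-10 and 0.997)
plogis(c(-22.41863, 5.837521))
```

If you already have the link-scale raster, applying plogis() to it (for
example with raster's calc(pred, plogis)) should give the same result as
predicting with type="response".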

Hope this helps.


Marcelino



On 21/07/2014 12:27, Edoardo Baldoni wrote:
> Hello R-sig-Geo,
>
>
> my name is Edoardo and this is the first time I am asking for help here.
>
> I am going through the "sdm" vignette of package "dismo" and, when using
> the logistic regression model of the presence/absence (background) data of
> bradypus (p. 53) for spatial prediction, I get values outside the range
> [0,1]. I think this should not happen, as the logit cumulative
> distribution function should limit them to the [0,1] range.
>
> Here is the code, with the data:
>
> ## This is the dataset I use. It is just a random sample of the original
> ## envtrain dataset of the "sdm" vignette
>
>> envtrain0
>      pa bio1 bio12 bio16 bio17 bio5 bio6 bio7 bio8 biome
> 735  0  265  2618   954   391  314  224   90  264     1
> 634  0  264  2590  1007   203  327  200  127  263     1
> 861  0  243  1783   916    29  335  132  202  247     1
> 70   1  261  2791  1077   148  310  218   92  254     1
> 48   1  257  3575  1295   359  316  202  114  258     1
> 511  0  220  1583   457   281  339  114  225  226     7
> 344  0  230  3352  1436   247  294  167  127  228     1
> 453  0  242   240   140    14  308  186  121  260    13
> 325  0  240  1541   447   302  291  186  104  249     1
> 193  0  232  1130   506    87  311  123  188  256     1
> 568  0  257  2269   774   246  317  188  128  259     1
> 664  0  190  1512   544   251  282   85  196  219     1
> 82   1  268  2403   723   407  327  207  120  268     1
> 271  0  180  1078   336   195  320   65  255  238     7
> 873  0  258  4238  1378   814  301  222   79  260     1
> 105  0   92   719   290    98  150   36  114   92     1
> 461  0  271  2112   849   254  330  224  106  268     1
> 37   1  276   372   215    25  330  222  108  278    13
> 822  0  110  1427   707   115  232   31  201   79     4
> 59   1  252  2471  1094   172  319  194  124  254     1
> 26   1  199  2005   657   256  262  143  119  196     1
> 115  0  215  1247   518   178  329   79  251  271     5
> 714  0  257  1903   714   162  325  182  143  260     1
> 781  0  265  2048   909   172  329  212  117  259     1
> 84   1  263  2921   868   607  318  205  113  262     1
> 604  0  119  1251   595   117  237   41  196   87     4
> 449  0  243  1307   540    86  310  148  162  259     7
> 384  0  233  2257   955   237  300  159  140  241     1
> 187  0  125  1282   616   115  230   50  180  101     4
> 278  0  254   819   521    12  314  199  115  255    13
> 766  0  178  1220   333   269  316   64  251  201     7
> 426  0  255  2207   984    52  350  159  191  253     1
> 153  0  253  1874   775   101  320  172  148  256     1
> 174  0  220  1410   492   168  304  113  191  241     7
> 214  0  238  1667   722   101  320  159  161  245     1
> 626  0  258  2116   959   203  333  206  127  252     7
> 779  0  224  1834   907    28  316  107  210  235     7
> 429  0  247   569   226    58  315  180  136  237    13
> 580  0  261  2507  1085    94  346  171  175  258     1
> 388  0  252  2408   886   278  319  181  138  255     1
>
> ## I then fit a logit model and use it for prediction
> ## with new data (predictors)
>
> library(dismo)  # also attaches raster, for stack() and extent()
> logit = glm(pa ~ bio1 + bio5 + bio6 + bio7 + bio8 + bio12 + bio16 + bio17,
>             family = binomial(link = 'logit'), data = envtrain0)
> files = list.files(file.path(system.file(package = 'dismo'), 'ex'),
>                    pattern = 'grd', full.names = TRUE)
> predictors = stack(files)
> ext = extent(-90,-32,-33,23)
> pred = predict(predictors, logit, ext=ext)
> pred
>
> ## The output says that the predicted values are in the range
> ## (-22.41863, 5.837521)
>
>> pred
> class       : RasterLayer
> dimensions  : 112, 116, 12992  (nrow, ncol, ncell)
> resolution  : 0.5, 0.5  (x, y)
> extent      : -90, -32, -33, 23  (xmin, xmax, ymin, ymax)
> coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
> data source : in memory
> names       : layer
> values      : -22.41863, 5.837521  (min, max)
>
> I was expecting values in the range [0,1], as a logit model should give.
> Indeed, the fitted values of the model all lie between 0 and 1.
>
> Why are the predicted values in the range (-22.41863, 5.837521) here?
> Can you tell me where I am making a mistake?
> Thanks
>
> Regards,
>
>
> Edoardo
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>


