[R] Testing for strength of fit using R

Steve Murray smurray444 at hotmail.com
Thu Nov 26 15:48:14 CET 2009


Dear all,

I am trying to validate a model by comparing simulated output values against observed values. I have produced a simple x-y scatter plot with a 1:1 line, so that the closer the points fall to this line, the better the 'fit' between the modelled data and the observed data.
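A plot of this kind can be sketched roughly as follows. This is only an illustration, not the exact code used: it assumes observations on the x-axis and simulations on the y-axis, and it uses just the first ten value pairs visible in the str() output further down (the names obs, sim and lims are made up for the example).

```r
# First ten Observation/Simulation values copied from the str() output;
# the remaining rows are not shown in the post.
obs <- c(24, 8.9, 14.7, 26.7, 42.4, 31.7, 30.8, 7.5, 14, 22)
sim <- c(33.9, 7.8, 9.74, 7.6, 11.8, 10.7, 12, 28.1, 1.7, 1.7)

# Use one common range for both axes so the 1:1 line runs corner to corner.
lims <- range(c(obs, sim))

# Assumption: observations on x, simulations on y.
plot(obs, sim, xlim = lims, ylim = lims,
     xlab = "Observation", ylab = "Simulation")
abline(a = 0, b = 1, lty = 2)  # 1:1 line: intercept 0, slope 1
```

Points above the dashed line are cases where the simulated value exceeds the observed one, and points below it are the reverse.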

I am now attempting to quantify the strength of this fit by using a statistical test in R. I am no statistics guru, but from my limited understanding, I suspect that I need to use the Chi Squared test (I am more than happy to be corrected on this though!).

However, this results in the following:


> chisq.test(data$Simulation,data$Observation)

    Pearson's Chi-squared test

data:  data$Simulation and data$Observation 
X-squared = 567, df = 550, p-value = 0.2989

Warning message:
In chisq.test(data$Simulation, data$Observation) :
  Chi-squared approximation may be incorrect


The ?chisq.test documentation says that the arguments should be vectors or matrices, so I tried the following, but I still receive a warning message (and different results):

> chisq.test(as.matrix(data[,4:5]))

    Pearson's Chi-squared test

data:  as.matrix(data[, 4:5]) 
X-squared = 130.8284, df = 26, p-value = 6.095e-16

Warning message:
In chisq.test(as.matrix(data[, 4:5])) :
  Chi-squared approximation may be incorrect
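A possible reason the two calls disagree is that chisq.test() reads its input as categorical counts in both cases. Given two vectors, it coerces them to factors and cross-tabulates them, so each distinct number becomes its own category. Given a matrix, it treats the rows and columns as a table of counts. The reported degrees of freedom are consistent with this: 26 = (27 - 1) * (2 - 1) for the 27 x 2 matrix, and 550 = 22 * 25 would fit 23 and 26 distinct values in the two vectors. The sketch below reproduces the behaviour on the first ten value pairs from the str() output further down; obs, sim, tab, res_vec and res_mat are names chosen for the example.

```r
# First ten value pairs from the str() output (illustrative subset only).
obs <- c(24, 8.9, 14.7, 26.7, 42.4, 31.7, 30.8, 7.5, 14, 22)
sim <- c(33.9, 7.8, 9.74, 7.6, 11.8, 10.7, 12, 28.1, 1.7, 1.7)

# Two vectors: both are coerced to factors and cross-tabulated,
# so every distinct number is treated as a separate category.
tab <- table(sim, obs)
dim(tab)            # 9 x 10, because 1.7 occurs twice in sim
res_vec <- suppressWarnings(chisq.test(sim, obs))
res_vec$parameter   # df = (9 - 1) * (10 - 1) = 72

# Matrix: the numbers themselves are read as counts in a 10 x 2 table.
res_mat <- suppressWarnings(chisq.test(cbind(obs, sim)))
res_mat$parameter   # df = (10 - 1) * (2 - 1) = 9
```

In neither form does the test measure how close each simulated value is to its paired observation.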



What am I doing wrong and how can I successfully measure how well the simulated values fit the observed values?


If it's of any help, here is how my data are structured. Note that I am only using columns 4 and 5 (Observation and Simulation).

> str(data)
'data.frame':    27 obs. of  5 variables:
 $ Location        : Factor w/ 27 levels "Australia","Brazil",..: 8 2 13 19 22 14 16 23 6 7 ...
 $ Vegetation      : Factor w/ 21 levels "Beech","Broadleaf evergreen laurel",..: 17 21 2 16 15 16 9 16 3 4 ...
 $ Vegetation.Class: Factor w/ 4 levels "Boreal and Temperate Evergreen",..: 3 3 4 1 1 1 4 1 4 1 ...
 $ Observation     : num  24 8.9 14.7 26.7 42.4 31.7 30.8 7.5 14 22 ...
 $ Simulation      : num  33.9 7.8 9.74 7.6 11.8 10.7 12 28.1 1.7 1.7 ...
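For paired numeric values like these, summary measures of the differences are one commonly used way to describe agreement. The following is a minimal sketch under assumptions, not a definitive answer: it uses only the ten value pairs visible above, and the names (obs, sim, err, bias, mae, rmse, r) and the choice of measures are illustrative.

```r
# The ten value pairs visible in the str() output (illustrative subset only).
obs <- c(24, 8.9, 14.7, 26.7, 42.4, 31.7, 30.8, 7.5, 14, 22)
sim <- c(33.9, 7.8, 9.74, 7.6, 11.8, 10.7, 12, 28.1, 1.7, 1.7)

err  <- sim - obs            # signed error for each pair
bias <- mean(err)            # mean error: negative means under-prediction
mae  <- mean(abs(err))       # mean absolute error
rmse <- sqrt(mean(err^2))    # root mean squared error (penalises large misses)
r    <- cor(obs, sim)        # Pearson correlation: linear association only

round(c(bias = bias, mae = mae, rmse = rmse, r = r), 3)
```

Bias, MAE and RMSE are in the same units as the data and measure distance from the 1:1 line. The correlation does not: it can be high even when the simulated values are consistently offset from the observations.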


I hope someone is able to point me in the right direction.

Many thanks,

Steve



 		 	   		  


