[R]: global and local variables

allan clark allan at stats.uct.ac.za
Tue Dec 9 13:52:44 CET 2003


   Hi

   Thanx for those who responded to my problem. In my previous email I
   tried to ask a general question and probably never explained myself
   correctly.  I wanted to prevent sending this long email. My apologies.

   This is my actual problem.

   I have a regression problem. I am writing some R code in order to
   calculate some collinearity diagnostics. The diagnostics all rely on a
   function named preprocess. I've written the different diagnostics as
   separate functions so that they may be evaluated separately if
   required.

   The two functions are named mci and vif. (I will be writing some
   others later)

   mci calculates the mixed condition index as well as the condition
   indices of a given X matrix while
   vif calculates the variance inflation factors of the X matrix.

   Another function named colldiag has been written. This function will
   calculate all of the collinearity diagnostics by simply calling the
   separate functions defined previously.

   I've attached the code of the different functions as well as a data
   file (say a2) below.

   The functions mci and vif work perfectly.

   i.e.

   > mci(a2)
   [1] "DATA MATRIX CENTERED AND SCALED"
   [1] "CENTERED AND SCALED MATRIX = $data"
   [1] "MEANS OF XDATA = $means"
   [1] "STDS OF XDATA = $stds"
   [1] "THE CONDITION NUMBER AND THE CONDITION INDICES"
   $CN
   [1] 27.34412

   $CI
   [1]  1.000000  1.615690 27.344123

   $MCI
     Principal.Component Singular.Values Condition.Index
   1                   1       1.4720680        1.000000
   2                   2       0.9111078        1.615690
   3                   3       0.0538349       27.344123

   > vif(a2)
   [1] "DATA MATRIX CENTERED AND SCALED"
   [1] "CENTERED AND SCALED MATRIX = $data"
   [1] "MEANS OF XDATA = $means"
   [1] "STDS OF XDATA = $stds"
   [1] "THE VARIANCE INFLATION FACTORS"
   $vif
         x1       x2       x3
   169.3542 175.6667   1.6875

   The output from colldiag is as follows:

   > colldiag(a2)
   [1] "DATA MATRIX CENTERED AND SCALED"
   [1] "CENTERED AND SCALED MATRIX = $data"
   [1] "MEANS OF XDATA = $means"
   [1] "STDS OF XDATA = $stds"
   [1] "THE CONDITION NUMBER AND THE CONDITION INDICES"
   $CN
   [1] 27.34412

   $CI
   [1]  1.000000  1.615690 27.344123

   $MCI
     Principal.Component Singular.Values Condition.Index
   1                   1       1.4720680        1.000000
   2                   2       0.9111078        1.615690
   3                   3       0.0538349       27.344123

   [1] "DATA MATRIX CENTERED AND SCALED"
   [1] "CENTERED AND SCALED MATRIX = $data"
   [1] "MEANS OF XDATA = $means"
   [1] "STDS OF XDATA = $stds"
   [1] "THE VARIANCE INFLATION FACTORS"
   $vif
         x1       x2       x3
   169.3542 175.6667   1.6875


    

   Once you check the colldiag code below you will see that it calls mci
   and vif. In both of these functions they call preprocess. This is
   unnecessary. How can I write the code such that R only calls
   preprocess once?

   ONCE AGAIN I APOLOGIZE FOR THE LENGTH OF THIS EMAIL!!!


   Cheers
   Allan





   The data file:

      x1 x2 x3
   1  20 -4  5
   2  21 -4  4
   3  22 -3  3
   4  23 -2  2
   5  24 -1  1
   6  25  0  2
   7  26  1  3
   8  27  2  4
   9  28  3  5
   10 29  4  6
   11 20 -4  5
   12 21 -4  4
   13 22 -3  3
   14 23 -2  2
   15 24 -1  1
   16 25  0  2
   17 26  1  3
   18 27  2  4
   19 28  3  5
   20 29  4  6

   preprocess<-function (xdata,center=1,scale=1)
   {
   if(center==1 && scale==1)
   {
   means<-apply(xdata,2,mean)
   stds<-apply(xdata,2, function(x) sqrt(var(x)))
   scalefactor<-((nrow(xdata)-1)^.5)*stds
   data.centsca<-sweep(sweep(xdata,2,means,"-"),2,scalefactor,"/")
   print("DATA MATRIX CENTERED AND SCALED")
   print("CENTERED AND SCALED MATRIX = $data")
   print("MEANS OF XDATA = $means")
   print("STDS OF XDATA = $stds")
   list(data=data.centsca,means=means,stds=stds,prep=1)
   }

   else if(center==1 && scale==0)
   {
   means<-apply(xdata,2,mean)
   data.cen<-sweep(xdata,2,means,"-")
   print("DATA MATRIX CENTERED")
   list(data=data.cen,means=means,prep=1)
   }

   else if(center==0 && scale==1)
   {
   stds<-apply(xdata,2, function(x) sqrt(var(x)))
   scalefactor<-((nrow(xdata)-1)^.5)*stds
   data.sca<-sweep(xdata,2,scalefactor,"/")
   print("DATA MATRIX SCALED")
   list(data=data.sca,stds=stds,prep=1)
   }

   else
   {
   print("YOU HAVE TO SPECIFY WHETHER YOU WANT TO SCALE OR CENTER THE
   MATRIX")
   print("THE preprocess FUNCTION HAS THREE ARGUMENTS. i.e.
   preprocess(xdata,center,scale)")
   print("xdata IS THE MATRIX TO BE TRANSFORMED")
   print("TO CENTER SPECIFY center=1")
   print("TO SCALE SPECIFY scale=1")
   }

   # A matrix is standardised as follows:
   # X*(i,j) = ( X(i,j)- XBAR(j) ) / (   sqrt(n-1)* STD(j)   )

   }

   mci<-function (xdata)
   {
   a<-preprocess(xdata)
   b<-svd(a$data)
   cn<-(b$d)[1]/(b$d)[ncol(a$data)]
   ci<-(b$d)[1]/(b$d)[1:ncol(a$data)]

   #paste("THE CONDITION NUMBER = ",cn)

   #the following produces a table in order to output the mci values
   Principal.Component<-1:ncol(a$data)
   Singular.Values<-b$d
   Condition.Index<-ci
   mcitable<-data.frame(Principal.Component,Singular.Values,Condition.Ind
   ex)

   print("THE CONDITION NUMBER AND THE CONDITION INDICES")
   d<-list(CN=cn,CI=ci,MCI=mcitable)
   print(d)
   }

   vif<-function (xdata)
   {
   a<-preprocess(xdata)
   vif<-diag(solve(cor(a$data)))

   print("THE VARIANCE INFLATION FACTORS")
   b<-list(vif=vif)
   b
   }

   colldiag<-function (xdata)
   {
   mci(xdata)
   vif(xdata)
   }



More information about the R-help mailing list