[R]: global and local variables
allan clark
allan at stats.uct.ac.za
Tue Dec 9 13:52:44 CET 2003
Hi
Thanx for those who responded to my problem. In my previous email I
tried to ask a general question and probably never explained myself
correctly. I wanted to prevent sending this long email. My apologies.
This is my actual problem.
I have a regression problem. I am writing some R code in order to
calculate some collinearity diagnostics. The diagnostics all rely on a
function named preprocess. I've written the different diagnostics as
separate functions so that they may be evaluated separately if
required.
The two functions are named mci and vif. (I will be writing some
others later)
mci calculates the mixed condition index as well as the condition
indices of a given X matrix while
vif calculates the variance inflation factors of the X matrix.
Another function named colldiag has been written. This function will
calculate all of the collinearity diagnostics by simply calling the
separate functions defined previously.
I've attached the code of the different functions as well as a data
file (say a2) below.
The functions mci and vif work perfectly.
i.e.
> mci(a2)
[1] "DATA MATRIX CENTERED AND SCALED"
[1] "CENTERED AND SCALED MATRIX = $data"
[1] "MEANS OF XDATA = $means"
[1] "STDS OF XDATA = $stds"
[1] "THE CONDITION NUMBER AND THE CONDITION INDICES"
$CN
[1] 27.34412
$CI
[1] 1.000000 1.615690 27.344123
$MCI
Principal.Component Singular.Values Condition.Index
1 1 1.4720680 1.000000
2 2 0.9111078 1.615690
3 3 0.0538349 27.344123
> vif(a2)
[1] "DATA MATRIX CENTERED AND SCALED"
[1] "CENTERED AND SCALED MATRIX = $data"
[1] "MEANS OF XDATA = $means"
[1] "STDS OF XDATA = $stds"
[1] "THE VARIANCE INFLATION FACTORS"
$vif
x1 x2 x3
169.3542 175.6667 1.6875
The output from colldiag is as follows:
> colldiag(a2)
[1] "DATA MATRIX CENTERED AND SCALED"
[1] "CENTERED AND SCALED MATRIX = $data"
[1] "MEANS OF XDATA = $means"
[1] "STDS OF XDATA = $stds"
[1] "THE CONDITION NUMBER AND THE CONDITION INDICES"
$CN
[1] 27.34412
$CI
[1] 1.000000 1.615690 27.344123
$MCI
Principal.Component Singular.Values Condition.Index
1 1 1.4720680 1.000000
2 2 0.9111078 1.615690
3 3 0.0538349 27.344123
[1] "DATA MATRIX CENTERED AND SCALED"
[1] "CENTERED AND SCALED MATRIX = $data"
[1] "MEANS OF XDATA = $means"
[1] "STDS OF XDATA = $stds"
[1] "THE VARIANCE INFLATION FACTORS"
$vif
x1 x2 x3
169.3542 175.6667 1.6875
Once you check the colldiag code below you will see that it calls mci
and vif. In both of these functions they call preprocess. This is
unnecessary. How can I write the code such that R only calls
preprocess once?
ONCE AGAIN I APOLOGIZE FOR THE LENGTH OF THIS EMAIL!!!
Cheers
Allan
The data file:
x1 x2 x3
1 20 -4 5
2 21 -4 4
3 22 -3 3
4 23 -2 2
5 24 -1 1
6 25 0 2
7 26 1 3
8 27 2 4
9 28 3 5
10 29 4 6
11 20 -4 5
12 21 -4 4
13 22 -3 3
14 23 -2 2
15 24 -1 1
16 25 0 2
17 26 1 3
18 27 2 4
19 28 3 5
20 29 4 6
preprocess<-function (xdata,center=1,scale=1)
{
if(center==1 && scale==1)
{
means<-apply(xdata,2,mean)
stds<-apply(xdata,2, function(x) sqrt(var(x)))
scalefactor<-((nrow(xdata)-1)^.5)*stds
data.centsca<-sweep(sweep(xdata,2,means,"-"),2,scalefactor,"/")
print("DATA MATRIX CENTERED AND SCALED")
print("CENTERED AND SCALED MATRIX = $data")
print("MEANS OF XDATA = $means")
print("STDS OF XDATA = $stds")
list(data=data.centsca,means=means,stds=stds,prep=1)
}
else if(center==1 && scale==0)
{
means<-apply(xdata,2,mean)
data.cen<-sweep(xdata,2,means,"-")
print("DATA MATRIX CENTERED")
list(data=data.cen,means=means,prep=1)
}
else if(center==0 && scale==1)
{
stds<-apply(xdata,2, function(x) sqrt(var(x)))
scalefactor<-((nrow(xdata)-1)^.5)*stds
data.sca<-sweep(xdata,2,scalefactor,"/")
print("DATA MATRIX SCALED")
list(data=data.sca,stds=stds,prep=1)
}
else
{
print("YOU HAVE TO SPECIFY WHETHER YOU WANT TO SCALE OR CENTER THE
MATRIX")
print("THE preprocess FUNCTION HAS THREE ARGUMENTS. i.e.
preprocess(xdata,center,scale)")
print("xdata IS THE MATRIX TO BE TRANSFORMED")
print("TO CENTER SPECIFY center=1")
print("TO SCALE SPECIFY scale=1")
}
# A matrix is standardised as follows:
# X*(i,j) = ( X(i,j)- XBAR(j) ) / ( sqrt(n-1)* STD(j) )
}
mci<-function (xdata)
{
a<-preprocess(xdata)
b<-svd(a$data)
cn<-(b$d)[1]/(b$d)[ncol(a$data)]
ci<-(b$d)[1]/(b$d)[1:ncol(a$data)]
#paste("THE CONDITION NUMBER = ",cn)
#the following produces a table in order to output the mci values
Principal.Component<-1:ncol(a$data)
Singular.Values<-b$d
Condition.Index<-ci
mcitable<-data.frame(Principal.Component,Singular.Values,Condition.Ind
ex)
print("THE CONDITION NUMBER AND THE CONDITION INDICES")
d<-list(CN=cn,CI=ci,MCI=mcitable)
print(d)
}
vif<-function (xdata)
{
a<-preprocess(xdata)
vif<-diag(solve(cor(a$data)))
print("THE VARIANCE INFLATION FACTORS")
b<-list(vif=vif)
b
}
colldiag<-function (xdata)
{
mci(xdata)
vif(xdata)
}
More information about the R-help
mailing list