[R] generalized linear regression - function glm - dismissed predictors - more information about simulated data

Christine SINOQUET christine.sinoquet at univ-nantes.fr
Thu Nov 18 19:31:41 CET 2010


Hello,

a) Thanks a lot for the answer relative to the dismissed coefficients. 
However, the result below has been obtained in the R console and there 
was no warning (please, see the full display below (***) if you wish).


"yi ~ X$V1 + X$V2 + X$V3 + X$V4 + X$V5 + X$V6 + X$V7 + X$V8 + X$V9 + X$V10"
                Estimate   Std. Error       t value Pr(>|t|)
(Intercept)  0.018062134 5.624517e-17  3.211322e+14        0
X$V1        -0.011104627 3.084989e-17 -3.599567e+14        0
X$V2        -0.018536614 3.241635e-17 -5.718291e+14        0
X$V3         0.107341157 4.884358e-17  2.197651e+15        0
X$V4         0.003258493 3.286878e-17  9.913643e+13        0
X$V6        -0.051599957 4.203840e-17 -1.227448e+15        0
X$V8        -0.057207683 3.049835e-17 -1.875763e+15        0
X$V10        0.134643229 3.849911e-17  3.497308e+15        0

b) Would it be wise to check if the predictors are colinear, prior to 
running the glm function ? How could it be performed, using the R language?

In additoon, in the present case, I would like to check for colinearity, 
because I am puzzled by the absence of warning inthe R console.

c) I now answer the question relative to the very low std errors and the 
way I generated the simulated data :

I ran the following code :

*****************
drawRegressionCoefficientsAndUpdateY <- function (target,
                                                   marks,
                                                   nbInd){
# X (individuals x marks)
# y (nbTargets x individuals)


  l <- length(marks)
  sigma <- runif(1,0.03,0.08)
  # Values close to 0 are excluded from c0 simulation interval so that 
the effect of the predictors are detectable.
  c01    <- runif(1,-2,-0.5)
  c02    <- runif(1,0.5,2)
  i <- runif(1,1:2)
  if (i==1) c0 <- c01 else c0 <- c02

  coefficients <- c()
  for(r in 1:l){
   coefficient <- runif(1,0.2+sigma,1+sigma)
   coefficients <- c(coefficients, coefficient)
  }

  // TILL THAT POINT, I RELIED ON THE PROCESS DESCRIBED IN A THESIS.
  // THERE WAS NO INDICATION RELATIVE TO THE "ADJUSTMENT" OF EPSILON.
  epsilon <- rnorm(1, mean = 0, sd = 0.002)
  for (ind in 1:nbInd){
   y[target,ind]<<- c0 + coefficients %*% X[ind,SNPs] + epsilon
  }

Thank you in advance for your kind help.

C.S.
------------------------------------------------
------------------------------------------------

"yi ~ X$V1 + X$V2 + X$V3 + X$V4 + X$V5 + X$V6 + X$V7 + X$V8 + X$V9 + X$V10"
                Estimate   Std. Error       t value Pr(>|t|)
(Intercept)  0.018062134 5.624517e-17  3.211322e+14        0
X$V1        -0.011104627 3.084989e-17 -3.599567e+14        0
X$V2        -0.018536614 3.241635e-17 -5.718291e+14        0
X$V3         0.107341157 4.884358e-17  2.197651e+15        0
X$V4         0.003258493 3.286878e-17  9.913643e+13        0
X$V6        -0.051599957 4.203840e-17 -1.227448e+15        0
X$V8        -0.057207683 3.049835e-17 -1.875763e+15        0
X$V10        0.134643229 3.849911e-17  3.497308e+15        0


I am sure to have regressed the right number of variables, since I check
that the formula is correct:
"yi ~ X$V1 + X$V2 + X$V3 + X$V4 + X$V5 + X$V6 + X$V7 + X$V8 + X$V9 + X$V10"

Could somebody explain to me
1) why there are mismatches between the "true" coefficients for
predictors 4 and 6
and
2) why there is no information edited for predictors 5, 7 and 9 ?

Thanks in advance for your kind help.

C.S.
------------------------------------------------
------------------------------------------------
(***)
Full console display ;

 > inputoutputdir="/home/sinoquet/recherche/mcmc/output/sim_sat_02_10_10/"
 >  inputfilesnps="snps.txt"
 >  inputfilepheno="pheno.txt"
 >
 >  X <<- 
read.table(file=paste(inputoutputdir,inputfilesnps,sep=""),h=F) # data.frame
 >  #genotype file (individuals x SNPs); code: 0/1/2 : number of minor 
alleles
 >
 >  y <<- 
read.table(file=paste(inputoutputdir,inputfilepheno,sep=""),h=F) # 
data.frame
 >
 >  #fit <- glm(yi ~ X$V1 + X$V2 + X$V3 + X$V4 + X$V5 + X$V6 + X$V7 + 
X$V8 + X$V9 + X$V10, family = gaussian)
 >  #coefficients( summary(fit))
 >
 >  #*****************************
 >  # BEGIN check real solution:
 >  target <- 1
 >  inputfilepredictors="predictor_descript.txt"
 >  f0 <- file(paste(inputoutputdir,inputfilepredictors,sep=""), open = 
"r")
 >  # format of file f0:
 >  #target <predictors>
 >  #1 1 2 3 4 5 6 7 8 9 10
 >  line <- readLines(f0, n = 1) # header
 >  line <- readLines(f0, n = 1)
 >  v <- unlist(strsplit(line, split=" "))
 >  predictorsForThisTarget <- v[-1] # dismiss target number
 >  print(paste("!",v,"!",sep=" "))
  [1] "! 1 !"  "! 1 !"  "! 2 !"  "! 3 !"  "! 4 !"  "! 5 !"  "! 6 !"  "! 
7 !"
  [9] "! 8 !"  "! 9 !"  "! 10 !"
 >
 >  nbPredictors <- length(predictorsForThisTarget)
 >  ch<-paste("X$V",predictorsForThisTarget[1],sep="")
 >  for (i in 2:nbPredictors){
+   item<-paste("X$V",predictorsForThisTarget[i],sep="")
+   ch<-paste(ch,item,sep=" + ")
+  }
 >
 >  formulaString <- paste("yi",ch,sep=" ~ ")
 >  print(formulaString)
[1] "yi ~ X$V1 + X$V2 + X$V3 + X$V4 + X$V5 + X$V6 + X$V7 + X$V8 + X$V9 + 
X$V10"
 >  formula <- as.formula(formulaString)
 >  yi <- unlist(y[target,])
 >  fit <- glm(formula, family = gaussian)
 >  print("*************BEGIN SOLUTION****************")
[1] "*************BEGIN SOLUTION****************"
 >  print(coefficients(summary(fit)))
                 Estimate   Std. Error       t value Pr(>|t|)
(Intercept)  0.018062134 5.624517e-17  3.211322e+14        0
X$V1        -0.011104627 3.084989e-17 -3.599567e+14        0
X$V2        -0.018536614 3.241635e-17 -5.718291e+14        0
X$V3         0.107341157 4.884358e-17  2.197651e+15        0
X$V4         0.003258493 3.286878e-17  9.913643e+13        0
X$V6        -0.051599957 4.203840e-17 -1.227448e+15        0
X$V8        -0.057207683 3.049835e-17 -1.875763e+15        0
X$V10        0.134643229 3.849911e-17  3.497308e+15        0
 >  print("*************END SOLUTION****************")
[1] "*************END SOLUTION****************"
 >  # END check real solution:
 >  #*****************************



More information about the R-help mailing list