[R] generalized linear regression - function glm - dismissed predictors - more information about simulated data

Thu Nov 18 20:23:32 CET 2010

On Nov 18, 2010, at 1:31 PM, Christine SINOQUET wrote:

> Hello,
>
> a) Thanks a lot for the answer relative to the dismissed  
> coefficients. However, the result below has been obtained in the R  
> console and there was no warning (please, see the full display below  
> (***) if you wish).

I guess my memory on that expectation was faulty.

>
>
> "yi ~ X$V1 + X$V2 + X$V3 + X$V4 + X$V5 + X$V6 + X$V7 + X$V8 + X$V9 +  
> X$V10"
>               Estimate   Std. Error       t value Pr(>|t|)
> (Intercept)  0.018062134 5.624517e-17  3.211322e+14        0
> X$V1        -0.011104627 3.084989e-17 -3.599567e+14        0
> X$V2        -0.018536614 3.241635e-17 -5.718291e+14        0
> X$V3         0.107341157 4.884358e-17  2.197651e+15        0
> X$V4         0.003258493 3.286878e-17  9.913643e+13        0
> X$V6        -0.051599957 4.203840e-17 -1.227448e+15        0
> X$V8        -0.057207683 3.049835e-17 -1.875763e+15        0
> X$V10        0.134643229 3.849911e-17  3.497308e+15        0
>
> b) Would it be wise to check if the predictors are colinear, prior  
> to running the glm function ? How could it be performed, using the R  
> language?

require(Matrix)
rr <- rankMatrix(as.matrix(X))
rr < length(X)

-- 
David.

>
> In additoon, in the present case, I would like to check for  
> colinearity, because I am puzzled by the absence of warning inthe R  
> console.
>
> c) I now answer the question relative to the very low std errors and  
> the way I generated the simulated data :
>
> I ran the following code :
>
> *****************
> drawRegressionCoefficientsAndUpdateY <- function (target,
>                                                  marks,
>                                                  nbInd){
> # X (individuals x marks)
> # y (nbTargets x individuals)
>
>
> l <- length(marks)
> sigma <- runif(1,0.03,0.08)
> # Values close to 0 are excluded from c0 simulation interval so that  
> the effect of the predictors are detectable.
> c01    <- runif(1,-2,-0.5)
> c02    <- runif(1,0.5,2)
> i <- runif(1,1:2)
> if (i==1) c0 <- c01 else c0 <- c02
>
> coefficients <- c()
> for(r in 1:l){
>  coefficient <- runif(1,0.2+sigma,1+sigma)
>  coefficients <- c(coefficients, coefficient)
> }
>
> // TILL THAT POINT, I RELIED ON THE PROCESS DESCRIBED IN A THESIS.
> // THERE WAS NO INDICATION RELATIVE TO THE "ADJUSTMENT" OF EPSILON.
> epsilon <- rnorm(1, mean = 0, sd = 0.002)
> for (ind in 1:nbInd){
>  y[target,ind]<<- c0 + coefficients %*% X[ind,SNPs] + epsilon
> }
>
> Thank you in advance for your kind help.
>
> C.S.
> ------------------------------------------------
> ------------------------------------------------
>
> "yi ~ X$V1 + X$V2 + X$V3 + X$V4 + X$V5 + X$V6 + X$V7 + X$V8 + X$V9 +  
> X$V10"
>               Estimate   Std. Error       t value Pr(>|t|)
> (Intercept)  0.018062134 5.624517e-17  3.211322e+14        0
> X$V1        -0.011104627 3.084989e-17 -3.599567e+14        0
> X$V2        -0.018536614 3.241635e-17 -5.718291e+14        0
> X$V3         0.107341157 4.884358e-17  2.197651e+15        0
> X$V4         0.003258493 3.286878e-17  9.913643e+13        0
> X$V6        -0.051599957 4.203840e-17 -1.227448e+15        0
> X$V8        -0.057207683 3.049835e-17 -1.875763e+15        0
> X$V10        0.134643229 3.849911e-17  3.497308e+15        0
>
>
> I am sure to have regressed the right number of variables, since I  
> check
> that the formula is correct:
> "yi ~ X$V1 + X$V2 + X$V3 + X$V4 + X$V5 + X$V6 + X$V7 + X$V8 + X$V9 +  
> X$V10"
>
> Could somebody explain to me
> 1) why there are mismatches between the "true" coefficients for
> predictors 4 and 6
> and
> 2) why there is no information edited for predictors 5, 7 and 9 ?
>
> Thanks in advance for your kind help.
>
> C.S.
> ------------------------------------------------
> ------------------------------------------------
> (***)
> Full console display ;
>
> > inputoutputdir="/home/sinoquet/recherche/mcmc/output/ 
> sim_sat_02_10_10/"
> >  inputfilesnps="snps.txt"
> >  inputfilepheno="pheno.txt"
> >
> >  X <<-  
> read.table(file=paste(inputoutputdir,inputfilesnps,sep=""),h=F) #  
> data.frame
> >  #genotype file (individuals x SNPs); code: 0/1/2 : number of  
> minor alleles
> >
> >  y <<-  
> read.table(file=paste(inputoutputdir,inputfilepheno,sep=""),h=F) #  
> data.frame
> >
> >  #fit <- glm(yi ~ X$V1 + X$V2 + X$V3 + X$V4 + X$V5 + X$V6 + X$V7 +  
> X$V8 + X$V9 + X$V10, family = gaussian)
> >  #coefficients( summary(fit))
> >
> >  #*****************************
> >  # BEGIN check real solution:
> >  target <- 1
> >  inputfilepredictors="predictor_descript.txt"
> >  f0 <- file(paste(inputoutputdir,inputfilepredictors,sep=""), open  
> = "r")
> >  # format of file f0:
> >  #target <predictors>
> >  #1 1 2 3 4 5 6 7 8 9 10
> >  line <- readLines(f0, n = 1) # header
> >  line <- readLines(f0, n = 1)
> >  v <- unlist(strsplit(line, split=" "))
> >  predictorsForThisTarget <- v[-1] # dismiss target number
> >  print(paste("!",v,"!",sep=" "))
> [1] "! 1 !"  "! 1 !"  "! 2 !"  "! 3 !"  "! 4 !"  "! 5 !"  "! 6 !"   
> "! 7 !"
> [9] "! 8 !"  "! 9 !"  "! 10 !"
> >
> >  nbPredictors <- length(predictorsForThisTarget)
> >  ch<-paste("X$V",predictorsForThisTarget[1],sep="")
> >  for (i in 2:nbPredictors){
> +   item<-paste("X$V",predictorsForThisTarget[i],sep="")
> +   ch<-paste(ch,item,sep=" + ")
> +  }
> >
> >  formulaString <- paste("yi",ch,sep=" ~ ")
> >  print(formulaString)
> [1] "yi ~ X$V1 + X$V2 + X$V3 + X$V4 + X$V5 + X$V6 + X$V7 + X$V8 + X 
> $V9 + X$V10"
> >  formula <- as.formula(formulaString)
> >  yi <- unlist(y[target,])
> >  fit <- glm(formula, family = gaussian)
> >  print("*************BEGIN SOLUTION****************")
> [1] "*************BEGIN SOLUTION****************"
> >  print(coefficients(summary(fit)))
>                Estimate   Std. Error       t value Pr(>|t|)
> (Intercept)  0.018062134 5.624517e-17  3.211322e+14        0
> X$V1        -0.011104627 3.084989e-17 -3.599567e+14        0
> X$V2        -0.018536614 3.241635e-17 -5.718291e+14        0
> X$V3         0.107341157 4.884358e-17  2.197651e+15        0
> X$V4         0.003258493 3.286878e-17  9.913643e+13        0
> X$V6        -0.051599957 4.203840e-17 -1.227448e+15        0
> X$V8        -0.057207683 3.049835e-17 -1.875763e+15        0
> X$V10        0.134643229 3.849911e-17  3.497308e+15        0
> >  print("*************END SOLUTION****************")
> [1] "*************END SOLUTION****************"
> >  # END check real solution:
> >  #*****************************
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT