[R] generalized linear regression - function glm - dismissed predictors - more information about simulated data
David Winsemius
dwinsemius at comcast.net
Thu Nov 18 20:23:32 CET 2010
On Nov 18, 2010, at 1:31 PM, Christine SINOQUET wrote:
> Hello,
>
> a) Thanks a lot for the answer relative to the dismissed
> coefficients. However, the result below has been obtained in the R
> console and there was no warning (please, see the full display below
> (***) if you wish).
I guess my memory on that expectation was faulty.
>
>
> "yi ~ X$V1 + X$V2 + X$V3 + X$V4 + X$V5 + X$V6 + X$V7 + X$V8 + X$V9 +
> X$V10"
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 0.018062134 5.624517e-17 3.211322e+14 0
> X$V1 -0.011104627 3.084989e-17 -3.599567e+14 0
> X$V2 -0.018536614 3.241635e-17 -5.718291e+14 0
> X$V3 0.107341157 4.884358e-17 2.197651e+15 0
> X$V4 0.003258493 3.286878e-17 9.913643e+13 0
> X$V6 -0.051599957 4.203840e-17 -1.227448e+15 0
> X$V8 -0.057207683 3.049835e-17 -1.875763e+15 0
> X$V10 0.134643229 3.849911e-17 3.497308e+15 0
>
> b) Would it be wise to check if the predictors are colinear, prior
> to running the glm function ? How could it be performed, using the R
> language?
require(Matrix)
rr <- rankMatrix(as.matrix(X))
rr < length(X)
--
David.
>
> In additoon, in the present case, I would like to check for
> colinearity, because I am puzzled by the absence of warning inthe R
> console.
>
> c) I now answer the question relative to the very low std errors and
> the way I generated the simulated data :
>
> I ran the following code :
>
> *****************
> drawRegressionCoefficientsAndUpdateY <- function (target,
> marks,
> nbInd){
> # X (individuals x marks)
> # y (nbTargets x individuals)
>
>
> l <- length(marks)
> sigma <- runif(1,0.03,0.08)
> # Values close to 0 are excluded from c0 simulation interval so that
> the effect of the predictors are detectable.
> c01 <- runif(1,-2,-0.5)
> c02 <- runif(1,0.5,2)
> i <- runif(1,1:2)
> if (i==1) c0 <- c01 else c0 <- c02
>
> coefficients <- c()
> for(r in 1:l){
> coefficient <- runif(1,0.2+sigma,1+sigma)
> coefficients <- c(coefficients, coefficient)
> }
>
> // TILL THAT POINT, I RELIED ON THE PROCESS DESCRIBED IN A THESIS.
> // THERE WAS NO INDICATION RELATIVE TO THE "ADJUSTMENT" OF EPSILON.
> epsilon <- rnorm(1, mean = 0, sd = 0.002)
> for (ind in 1:nbInd){
> y[target,ind]<<- c0 + coefficients %*% X[ind,SNPs] + epsilon
> }
>
> Thank you in advance for your kind help.
>
> C.S.
> ------------------------------------------------
> ------------------------------------------------
>
> "yi ~ X$V1 + X$V2 + X$V3 + X$V4 + X$V5 + X$V6 + X$V7 + X$V8 + X$V9 +
> X$V10"
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 0.018062134 5.624517e-17 3.211322e+14 0
> X$V1 -0.011104627 3.084989e-17 -3.599567e+14 0
> X$V2 -0.018536614 3.241635e-17 -5.718291e+14 0
> X$V3 0.107341157 4.884358e-17 2.197651e+15 0
> X$V4 0.003258493 3.286878e-17 9.913643e+13 0
> X$V6 -0.051599957 4.203840e-17 -1.227448e+15 0
> X$V8 -0.057207683 3.049835e-17 -1.875763e+15 0
> X$V10 0.134643229 3.849911e-17 3.497308e+15 0
>
>
> I am sure to have regressed the right number of variables, since I
> check
> that the formula is correct:
> "yi ~ X$V1 + X$V2 + X$V3 + X$V4 + X$V5 + X$V6 + X$V7 + X$V8 + X$V9 +
> X$V10"
>
> Could somebody explain to me
> 1) why there are mismatches between the "true" coefficients for
> predictors 4 and 6
> and
> 2) why there is no information edited for predictors 5, 7 and 9 ?
>
> Thanks in advance for your kind help.
>
> C.S.
> ------------------------------------------------
> ------------------------------------------------
> (***)
> Full console display ;
>
> > inputoutputdir="/home/sinoquet/recherche/mcmc/output/
> sim_sat_02_10_10/"
> > inputfilesnps="snps.txt"
> > inputfilepheno="pheno.txt"
> >
> > X <<-
> read.table(file=paste(inputoutputdir,inputfilesnps,sep=""),h=F) #
> data.frame
> > #genotype file (individuals x SNPs); code: 0/1/2 : number of
> minor alleles
> >
> > y <<-
> read.table(file=paste(inputoutputdir,inputfilepheno,sep=""),h=F) #
> data.frame
> >
> > #fit <- glm(yi ~ X$V1 + X$V2 + X$V3 + X$V4 + X$V5 + X$V6 + X$V7 +
> X$V8 + X$V9 + X$V10, family = gaussian)
> > #coefficients( summary(fit))
> >
> > #*****************************
> > # BEGIN check real solution:
> > target <- 1
> > inputfilepredictors="predictor_descript.txt"
> > f0 <- file(paste(inputoutputdir,inputfilepredictors,sep=""), open
> = "r")
> > # format of file f0:
> > #target <predictors>
> > #1 1 2 3 4 5 6 7 8 9 10
> > line <- readLines(f0, n = 1) # header
> > line <- readLines(f0, n = 1)
> > v <- unlist(strsplit(line, split=" "))
> > predictorsForThisTarget <- v[-1] # dismiss target number
> > print(paste("!",v,"!",sep=" "))
> [1] "! 1 !" "! 1 !" "! 2 !" "! 3 !" "! 4 !" "! 5 !" "! 6 !"
> "! 7 !"
> [9] "! 8 !" "! 9 !" "! 10 !"
> >
> > nbPredictors <- length(predictorsForThisTarget)
> > ch<-paste("X$V",predictorsForThisTarget[1],sep="")
> > for (i in 2:nbPredictors){
> + item<-paste("X$V",predictorsForThisTarget[i],sep="")
> + ch<-paste(ch,item,sep=" + ")
> + }
> >
> > formulaString <- paste("yi",ch,sep=" ~ ")
> > print(formulaString)
> [1] "yi ~ X$V1 + X$V2 + X$V3 + X$V4 + X$V5 + X$V6 + X$V7 + X$V8 + X
> $V9 + X$V10"
> > formula <- as.formula(formulaString)
> > yi <- unlist(y[target,])
> > fit <- glm(formula, family = gaussian)
> > print("*************BEGIN SOLUTION****************")
> [1] "*************BEGIN SOLUTION****************"
> > print(coefficients(summary(fit)))
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 0.018062134 5.624517e-17 3.211322e+14 0
> X$V1 -0.011104627 3.084989e-17 -3.599567e+14 0
> X$V2 -0.018536614 3.241635e-17 -5.718291e+14 0
> X$V3 0.107341157 4.884358e-17 2.197651e+15 0
> X$V4 0.003258493 3.286878e-17 9.913643e+13 0
> X$V6 -0.051599957 4.203840e-17 -1.227448e+15 0
> X$V8 -0.057207683 3.049835e-17 -1.875763e+15 0
> X$V10 0.134643229 3.849911e-17 3.497308e+15 0
> > print("*************END SOLUTION****************")
> [1] "*************END SOLUTION****************"
> > # END check real solution:
> > #*****************************
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT
More information about the R-help
mailing list