[R] question related to multiple regression

Ben Bolker bbolker at gmail.com
Mon Oct 11 15:05:50 CEST 2010


SNN <s.nancy1 <at> yahoo.com> writes:

> I am conducting an association analysis of genotype and a phenotype such as
> cholesterol level as an outcome and the genotype as a regressor using
> multiple linear regression. There are 3 possibilities for the genotype AA,
> AG, GG. There are 5 people with the AA genotype, 100 with the AG genotype
> and 900 with the GG genotype. I coded GG genotype as 1, AG as 2 and AA as 3
> and the p-value for the genotype is significant. 
> Should I believe this p-value or not? My concern is that there are not many
> samples with the AA genotype and could that have affected the significance
> of the genotype in the model?

  Make sure that R is treating genotype as a factor, not as a continuous
covariate. For that reason it's better *not* to recode genotypes as
integers, which increases the chance of this type of confusion. If the
1/2/3 coding is treated as numeric, the model assumes that the expected
cholesterol level is linearly related to the number of "A" alleles,
i.e.

b_0 for GG
b_0+d for AG
b_0+2*d for AA

Unless you really have reason to believe this, it seems like a fairly
strong assumption to make ...
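
A minimal sketch of the difference between the two codings, using
simulated data. The group sizes come from the question; the cholesterol
values and the names chol, dat and geno_num are made up for illustration
and are not from the original poster's analysis:

```r
# Simulated data: group sizes follow the question, but the cholesterol
# values are random noise invented purely for illustration.
set.seed(1)
genotype <- factor(rep(c("GG", "AG", "AA"), times = c(900, 100, 5)),
                   levels = c("GG", "AG", "AA"))
chol <- rnorm(length(genotype), mean = 200, sd = 30)
dat <- data.frame(chol, genotype)

# Numeric coding (GG = 1, AG = 2, AA = 3): one intercept and one slope d,
# i.e. the linear-in-number-of-A-alleles assumption.
dat$geno_num <- as.integer(dat$genotype)
fit_linear <- lm(chol ~ geno_num, data = dat)

# Factor coding: a separate mean for each genotype (two contrasts vs. GG).
fit_factor <- lm(chol ~ genotype, data = dat)

str(dat$genotype)              # confirm that R sees a factor
summary(fit_factor)            # AG and AA effects relative to GG
anova(fit_linear, fit_factor)  # compare the linear fit to the factor fit
```

The linear model is nested within the factor model, so the anova()
comparison gives a rough check of whether the single-slope assumption
is adequate for the data at hand.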


