[R] Trust p-values or not ?

John Sorkin jsorkin at grecc.umaryland.edu
Sun Oct 7 21:35:33 CEST 2007


(1) Not necessarily. Linear regression has a number of assumptions. I suggest you get a basic statistics
textbook and do some reading. A brief summary of the assumptions includes:
(a) The relation between outcome and predictor variables lies along a line (or plane for a regression with
multiple predictor variables) or some surface that can be modeled using a linear function.
(b) The predictor variables are independent of one another (at a minimum, not so highly correlated
that their individual effects cannot be separated).
(c) The residuals from the regression are normally distributed.
(d) The variance of the residuals is constant throughout the range of the independent variables.
(e) The predictor variables are measured without error.
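
A few of these assumptions can be checked numerically as well as by eye. The sketch below uses simulated data (the names dat, x1, x2, x3 and fit are made up for illustration and are not the poster's data) and only base R: a correlation matrix and hand-computed variance inflation factors for (b), and a Shapiro-Wilk test of the residuals for (c). Treat it as a starting point, not a complete diagnostic procedure.

```r
set.seed(1)                              # simulated data, for illustration only
n  <- 130
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)      # deliberately correlated with x1
x3 <- rnorm(n)
y  <- 1 + 2 * x1 - x3 + rnorm(n)
dat <- data.frame(y, x1, x2, x3)

fit <- lm(y ~ x1 + x2 + x3, data = dat)

# (b) How strongly are the predictors related to one another?
preds <- c("x1", "x2", "x3")
print(round(cor(dat[, preds]), 2))

# Variance inflation factor for each predictor, computed by hand:
# regress it on the other predictors and take 1 / (1 - R^2).
vif <- sapply(preds, function(v) {
  others <- setdiff(preds, v)
  r2 <- summary(lm(reformulate(others, response = v), data = dat))$r.squared
  1 / (1 - r2)
})
print(round(vif, 2))                     # large values flag collinear predictors

# (c) Shapiro-Wilk test of normality applied to the residuals
print(shapiro.test(residuals(fit)))
```

Here x1 and x2 should show much larger inflation factors than x3, because x2 was built from x1.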

Even if the above assumptions are violated, you can still get a significant F statistic, significance for some
or all of your predictor variables, etc. If the assumptions are violated, the meaning of the results you
obtain from your regression analysis can be questionable, if not outright incorrect. There are a number of
tests that you can perform to make sure your model conforms to (or at least does not wildly violate) the basic
assumptions. Some commonly performed tests, like examining the pattern of residuals, can be done in R
by simply plotting the fit you obtain, i.e.

plot(fit1)             # This produces a number of helpful graphs that will help you evaluate your model.
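
To see why a small p-value alone is not enough, here is a minimal sketch on simulated data (not the poster's data; fit1 and fit2 here are stand-ins created just for this example). The true relation is curved, a straight line is fitted anyway, and the slope still comes out highly significant. The diagnostic plots, and a comparison against a model with a squared term, are what reveal the problem.

```r
set.seed(42)                             # simulated data, for illustration only
x <- seq(0, 10, length.out = 130)
y <- x^2 + rnorm(130, sd = 5)            # the true relation is curved, not linear

fit1 <- lm(y ~ x)                        # straight-line fit, violating assumption (a)
print(summary(fit1)$coefficients)        # the slope is still highly significant

# The four default diagnostic plots on one page; the residuals-vs-fitted
# panel should show a clear U-shaped pattern for this fit.
par(mfrow = c(2, 2))
plot(fit1)
par(mfrow = c(1, 1))

# Adding a squared term and comparing the two nested models
fit2 <- lm(y ~ x + I(x^2))
print(anova(fit1, fit2))
```

When run from a script rather than an interactive session, R writes the plots to a file (Rplots.pdf by default) instead of opening a window.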

Fortunately, linear regression is fairly robust to minor violations of several of the assumptions noted above.
However, in order to fully evaluate the appropriateness of your model, you will need to read a textbook, speak
to people with more experience than you, and play, play, play with data.

(2) The more predictor variables you have, the more observations you need. Although there is no absolute
rule, many people like to have a minimum of five to ten observations per independent variable. I like to have
at least ten. Given that you have eight independent variables, you would, by my criteria, need at least
80 observations. You have 130, so you should be OK, assuming that your observations are independent of
one another.
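
The arithmetic behind this rule of thumb, written out in R. The variable names are made up for illustration, and the factor of ten is only my personal preference, not a formal requirement:

```r
n_obs  <- 130                            # observations in the poster's dataset
n_pred <- 8                              # independent variables in the model

obs_per_pred <- n_obs / n_pred           # 16.25 observations per predictor
needed       <- 10 * n_pred              # 80 under a ten-per-predictor rule
enough       <- n_obs >= needed          # TRUE

print(c(obs_per_pred = obs_per_pred, needed = needed))
print(enough)
```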

Sorry I can't be of more help; statistics cannot be learned in a single e-mail message. The fact that you
are asking important questions about what you are doing reflects well on you. I suspect that in a year or
so you will be answering, rather than asking, questions posted on the R Listserv mailing list!


John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC,
University of Maryland School of Medicine Claude D. Pepper OAIC,
University of Maryland Clinical Nutrition Research Unit, and
Baltimore VA Center Stroke of Excellence

University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
Baltimore, MD 21201-1524

(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
jsorkin at grecc.umaryland.edu
>>> "rocker turtle" <rockingturtle at gmail.com> 10/07/07 2:32 PM >>>

First of all kudos to the creators/contributors of R! This is a great
package and I am finding it very useful in my research, will love to
contribute any modules and dataset which I develop to the project.

While doing multiple regression I arrived at the following peculiar
Out of 8 variables only 4 have  <0.04 p-values (of t-statistic), rest all
have p-values between 0.1 and 1.0 and the coeff of Regression is coming
around ~0.8 (adjusted ~0.78). The F-statistic is
around 30 and its own p-value is ~0. Also I am constrained with a dataset of
130 datapoints.

Being new to statistics I would really appreciate if someone can help me
understand these values.
1) Does the above test values indicate a statistically sound and significant
model ?
2) Is a dataset of 130 enough to run linear regression with ~7-10 variables
? If not what is approximately a good size.

Thanks in advance.


