[R-sig-ME] P value for a large number of degrees of freedom in lmer

Jonathan Baron baron at psych.upenn.edu
Thu Nov 25 01:10:35 CET 2010

I am not going to belabor this point anymore - hence I don't plan to
reply to further comments, although I will read them - but the idea of
a well-controlled experiment is an idealization, and sometimes we come
very very close to achieving it.

My PhD thesis had a psychophysical experiment in which each of five
subjects - I was one of them - made judgments for 10 1-hour sessions
about which of three events came first, and then a second guess.  The
main question was whether signal-detection theory could explain the
second-guess results.  That was the null hypothesis, and it was
rejected, for each subject.  Each subject made several thousand
judgments.  The experiment was designed so that the null hypothesis
would be true if the theory behind it were true.  This is typical of
experiments in psychophysics.

I have done many other experiments that I thought were well
controlled, but never with so many observations.

The more general point is that I think there is a distinction between
the logical structure of experiments and the structure of
observational studies.  Null hypothesis testing is almost always
appropriate for the former and almost never appropriate for the
latter, except as a short-hand for descriptive statistics.

I think that I am unusual among psychologists only for admitting the
inappropriateness of null-hypothesis testing for observational
studies.

On 11/24/10 13:25, Rolf Turner wrote:
> On 24/11/2010, at 1:09 PM, Jonathan Baron wrote:
> > For the record, I have to register my disagreement.  In the
> > experimental sciences, the name of the game is to design a
> > well-controlled experiment, which means that the null hypothesis will
> > be true if the alternative hypothesis is false.  People who say what
> > is below, which includes almost everyone who responded to this post,
> > have something else in mind.  What they say is true in most
> > disciplines.  But when I hear this sort of thing, it is like someone
> > is telling me that my research career as an EXPERIMENTAL psychologist
> > has been some sort of delusion.
> > 
> > If you have a very large sample and you are doing a correlational
> > study, yes, everything will be significant.  But if you do the kind of
> > experiment we struggle to design, with perfect control conditions, you
> > won't get significant results (except by chance) if your hypothesis is
> > wrong.
> > 
> 	I'll bet you don't work with samples of size 200,000. :-)
> 	Also I'll bet that you don't ***really*** care if the
> 	difference between mu_T and mu_C is bigger than 0.000001 mm,
> 	say, whereas you might care if the difference were bigger than
> 	10 mm.
> 	Also there's no such thing as ``perfect'' anything, let alone
> 	control conditions.
> 		cheers,
> 			Rolf Turner
> > Jon
> > 
> > On 11/24/10 07:59, Rolf Turner wrote:
> >> 
> >> It is well known amongst statisticians that having a large enough data set will
> >> result in the rejection of *any* null hypothesis, i.e. will result in a small
> >> p-value.  There is no ``bias'' involved.
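
The arithmetic behind the large-sample point in the exchange above can be
sketched as follows. This is a minimal illustration, not anything posted in
the thread: it assumes a two-sample z-test with known, equal standard
deviations and an observed mean difference equal to a fixed value, and the
function names are hypothetical. With a tiny nonzero difference the p-value
shrinks as n grows; with a difference of exactly zero, which is the case a
well-controlled experiment aims for when the theory is true, it stays at 1
for every n.

```python
import math


def two_sample_z_p_value(diff, sd, n_per_group):
    """Two-sided p-value for a two-sample z-test.

    Assumes (for illustration only) a known common standard deviation `sd`,
    equal group sizes, and an observed mean difference of exactly `diff`.
    """
    se = sd * math.sqrt(2.0 / n_per_group)  # standard error of the difference
    z = diff / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2.0))


def p_values_by_sample_size(diff, sd, sizes):
    """Map each per-group sample size to its p-value for the same difference."""
    return {n: two_sample_z_p_value(diff, sd, n) for n in sizes}


if __name__ == "__main__":
    sizes = [100, 10_000, 200_000, 10_000_000]
    # A difference of one hundredth of a standard deviation (arbitrary choice).
    for n, p in p_values_by_sample_size(0.01, 1.0, sizes).items():
        print(f"n per group = {n:>10,d}   p = {p:.3g}")
    # An exactly true null: the p-value does not move with n.
    for n, p in p_values_by_sample_size(0.0, 1.0, sizes).items():
        print(f"n per group = {n:>10,d}   p = {p:.3g}")
```

The sketch says nothing about which situation a given study is in; it only
shows that sample size turns any fixed nonzero difference, however small,
into a small p-value, while leaving an exactly true null alone.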

On 11/24/10 15:13, John Maindonald wrote:
> I need to redraft the final sentence of the first paragraph,
> to read: "The consequence is that effects that are well within
> the bounds of statistical variation may, according to the
> usual rituals, appear statistically significant."
> ----------------------------------------------------------------------------
> There are other considerations, which may often be more
> serious.  In any observational dataset, there is almost
> bound to be structure.  This arises in different areas in 
> different ways, but some of the possibilities are:
> 1) a time element
> 2) a space element
> 3) a location or culture or group or family element
> 4) an effect from collection instrument or person.
> So the correlation structure is not iid or even i, something
> we might be expected to know about on this list.  The
> correlations will often be positive.  Even after multi-level
> or spatial models have been used to take out what is
> thought to be the structure, there will often be structure 
> left.  The consequence is that effects that are well within
> the bounds of statistical variation may, according to the
> usual rituals, appear statistically significant.
> There are other problems.  Some variables may be measured
> very inaccurately.  When such a variable is used on its own, this
> reduces the chances of finding a significant effect, catastrophically
> if the error is of the same order of magnitude as the SD of that variable.
> If other accurately measured explanatory variables are included
> in the same analysis, they may appear falsely significant.  This
> sort of issue has been extensively canvassed in connection
> with the use of food frequency questionnaire (FFQ) measuring
> instruments in large-scale studies of the effect of diet on disease.
> See for example:
> Schatzkin, A.; Kipnis, V.; Carroll, R.; Midthune, D.; Subar, A.; Bingham, S.;
> Schoeller, D.; Troiano, R.; and Freedman, L., 2003. A comparison of a food frequency
> questionnaire with a 24-hour recall for use in an epidemiological cohort study:
> results from the biomarker-based Observing Protein and Energy Nutrition (OPEN)
> study. International Journal of Epidemiology, 32:1054-1062.
> Here was an instrument that many thought adequately accurate.
> These problems may of course affect all observational studies.
> Deficiencies in the data and in the modeling (because some
> structure is not accounted for) become more likely to show up
> as the modeling becomes more sensitive to smallish, but 
> perhaps still consequential effects.
> In modest sized experiments, careful design can largely
> avoid such problems.  In experiments where the number
> of subjects is very large, the same sorts of problems will
> almost inevitably appear.  Minor deviations from the
> protocol become almost impossible to avoid.
> John Maindonald             email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
> http://www.maths.anu.edu.au/~johnm
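
The measurement-error point in the quoted message can be made concrete with
a small population-level calculation. This is a sketch under assumptions
that are not in the thread: the outcome depends only on a true variable x;
x is observed as w = x + u with independent error u; a second, accurately
measured variable z is correlated with x but has no effect of its own; and
x and z both have unit variance. The function names are hypothetical. The
coefficients come from solving the two-predictor normal equations for the
regression of the outcome on w and z.

```python
def reliability(error_var):
    """Share of the variance of w = x + u that is due to x, with var(x) = 1."""
    return 1.0 / (1.0 + error_var)


def population_coefficients(beta, r, error_var):
    """Large-sample least-squares coefficients of y on (w, z).

    Illustrative setup (an assumption, not taken from the thread):
      y = beta * x + noise        only x affects y
      w = x + u                   x observed with error, var(u) = error_var
      var(x) = var(z) = 1, corr(x, z) = r, u independent of x, z and noise
    Returns (coefficient on w, coefficient on z).
    """
    # Covariance matrix of (w, z) is [[1 + error_var, r], [r, 1]];
    # covariances with y are (beta, beta * r).
    det = (1.0 + error_var) - r * r
    coef_w = beta * (1.0 - r * r) / det
    coef_z = beta * r * error_var / det
    return coef_w, coef_z


if __name__ == "__main__":
    # Error SD equal to the SD of x, and a moderate correlation between x and z.
    for error_var in (0.0, 0.25, 1.0):
        coef_w, coef_z = population_coefficients(beta=1.0, r=0.5, error_var=error_var)
        print(f"error variance {error_var:4.2f}: reliability {reliability(error_var):.2f}, "
              f"coef on w {coef_w:.3f}, coef on z {coef_z:.3f}")
```

With no measurement error the coefficient on z is zero, as it should be. As
the error variance grows, the coefficient on w is pulled toward zero and z
picks up part of the effect of x, which a large enough sample will then
report as significant.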
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
