[R-sig-DB] Re: database example

Douglas Bates b@te@ @end|ng |rom @t@t@w|@c@edu
Thu May 16 16:03:42 CEST 2002


Paul Murrell <p.murrell using auckland.ac.nz> writes:

> > > I am preparing a new computing course for stage II stats students and I
> > > will be doing some stuff on databases.
> > >
> > > I am on the lookout for example databases that I would be able to
> > > provide for the students to practice querying.  I have a machine set up
> > > for the job with MySQL installed, but I'm a bit light on actual
> > > databases(!)

> > I showed examples of the use of relational databases in a short
> > course that I taught in Sweden and in a course here.  They are
> > particularly useful in multilevel modeling because R wants the
> > data as a single data frame but it is most useful to have the data
> > stored in multiple tables according to the level of grouping.
> > 
> > We already have an example of that in the nlme package with the
> > Mathematics Achievement used by Raudenbush and Bryk (2002).  The data
> > are called MathAchieve and MathAchSchool.  Some slides describing the
> > use of MySQL or PostgreSQL for these data are under the "Multilevel
> > Models" link at http://www.stat.wisc.edu/~st850-1
> 
> That is fantastic.  Thank you!  May I use the examples freely in my
> teaching?
> 

Certainly.

> > It turns out that the data set as provided by Bryk and Raudenbush did
> > not have the mean socioeconomic score calculated correctly.  John Fox,
> > in his online appendix on Linear Mixed Models to his forthcoming book
> > "An R and S-PLUS Companion to Applied Regression" shows how you could
> > do that calculation in R.  In SQL the calculation would be
> > 
> > SELECT avg(ses) FROM student GROUP BY school ORDER BY school;
> > 
> > To put that into a new table I think it would be most effective to
> > create the table from both the student and school tables.  (SQL is not
> > very good at incorporating new values, calculated within SQL, into
> > existing tables.)
> > 
> > If you check John's appendix you will see that social scientists
> > typically use within-group centered values of covariates like ses,
> > which can be achieved with a SELECT statement joining the tables.
> > When I do this example in PostgreSQL I create a view then use
> > db.copy.table to copy the view.  With MySQL you would need to use an
> > explicit select statement.
> > 
> > CREATE VIEW df AS
> >    SELECT c.School, c.MEANSES, c.Sector, t.SES - c.MEANSES as cses, t.MathAch
> >       FROM school as c, student as t
> >       WHERE c.School = t.School
> > 
> > Another interesting multilevel data set, used by Rodriguez and
> > Goldman, is available at http://data.princeton.edu/multilevel/
> > That page has links both to the real data and to simulated data sets.  All
> > have a three-level structure of child within family within community.
> 
> This also looks very useful.  Thanks again!
> 
> > Finally I was involved in a large project on the analysis of data from
> > the Texas Assessment of Academic Skills (TAAS) which consisted of 10.5
> > million test scores on 3.5 million students in thousands of schools
> > (campuses) within hundreds of school districts.
> > 
> > The enclosed file, installation.txt, describes the installation of the
> > database and some cleaning up of the records.  Cleaning the database
> > was a major part of the analysis.
> 
> Again superb!  Thank you so much Doug!  With this, and other responses
> from the mailing list, I have a plenty of examples for my course to
> justify the use of databases (by statisticians) and to demonstrate their
> use (by statisticians).  This is such a huge help for me.



More information about the R-sig-DB mailing list