[R-sig-DB] request for examples

ripley using stats.ox.ac.uk
Mon May 13 10:18:57 CEST 2002


On Mon, 13 May 2002, Paul Murrell wrote:

> Hi
>
> I hope you don't mind this "cold call", but this seems like a really
> good place to contact people with interest/experience/expertise in stats
> and databases ...
>
> I am busy producing a course on statistical computing for stage II
> students (to be delivered in the second half of this year).
>
> I will be teaching them about some database issues: advantages of
> databases as a way to store information, how to design databases
> properly, how to retrieve information using SQL.
>
> What I am seriously lacking are some killer examples.
>
> Would anyone be able to help me with any of the following ...
>
> (i)  killer examples where a database is clearly a superior method of
> storing information than, say, plain text files or spreadsheets or
> statistical-package-specific formats

That's true of almost all data mining applications.  Think about
a supermarket chain collecting information on all transactions at tills.

Reasons include scale (as above), integrity of data coming from multiple
sources (also as above) and security (most organizations' financial data
is in databases). Related to scale is efficiency: lots of preprocessing
(building indices, etc.) is what makes online queries possible.
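
As a minimal sketch of the indexing point, assuming SQLite (via Python's
standard sqlite3 module) stands in for whatever DBMS a supermarket chain
would really use: the table, column and index names below are hypothetical
and the till data is synthetic. It asks the database for its query plan
before and after an index is built on the column being filtered.

```python
import sqlite3

# Hypothetical till-transaction table; names and values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tills (store_id INTEGER, item TEXT, amount REAL)")
rows = [(i % 50, "item%d" % (i % 7), 1.0 + (i % 10)) for i in range(10000)]
conn.executemany("INSERT INTO tills VALUES (?, ?, ?)", rows)

query = "SELECT COUNT(*), SUM(amount) FROM tills WHERE store_id = ?"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    cursor = conn.execute("EXPLAIN QUERY PLAN " + sql, (7,))
    return " ".join(row[-1] for row in cursor)

plan_before = plan(query)   # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_tills_store ON tills (store_id)")
plan_after = plan(query)    # the planner can now search via the index

count, total = conn.execute(query, (7,)).fetchone()
print(plan_before)
print(plan_after)
print(count, total)
```

The exact wording of the plan text varies between SQLite versions, so only
the presence of the index name in the second plan should be relied on.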

Another good example is an online transaction system, such as airlines'
booking systems and those behind banks' ATM networks.  Or, since I have
just been browsing one, large discussion forums, web search engines ....

From memory, Hand, Mannila and Smyth (2001), Principles of Data Mining,
MIT Press, is a good source on the statistics/databases interaction.

> (ii)  an actual real-life statistical database that could be copied to a
> local server for the students to practise accessing

On what DBMS?

> (iii)  killer examples where an important data source is stored in a
> database therefore requiring something like SQL knowledge to get access
> to the information.

(Many) pharmaceutical companies have their gene chip data stored in
databases.  We had to set up Oracle Lite and get help (thanks Fei) to get
some results out last year.  Insurance companies have all their claims
data in databases, and MSc summer projects have had 25% of their time
taken up extracting the data.

Yet another one was work on university admissions data: both the local and
the national data were held in a database, and about 70,000 records were
extracted to a spreadsheet.  (And that was just one year's data.)
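
The extraction step described above can be sketched roughly as follows.
This is only an illustration under stated assumptions: SQLite (through
Python's standard sqlite3 module) stands in for the real DBMS, which is
not named here, and the applicants table, its columns and its rows are
all hypothetical. One year's records are selected with SQL and written
out as CSV text that a spreadsheet could open.

```python
import csv
import io
import sqlite3

# Hypothetical admissions table; schema and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE applicants ("
    "id INTEGER PRIMARY KEY, year INTEGER, course TEXT, offer INTEGER)"
)
conn.executemany(
    "INSERT INTO applicants (year, course, offer) VALUES (?, ?, ?)",
    [(2000, "Maths", 1), (2001, "Maths", 0),
     (2001, "Statistics", 1), (2001, "Maths", 1)],
)

# Pull out a single year's records, as in the one-year extract.
cur = conn.execute(
    "SELECT id, course, offer FROM applicants WHERE year = ? ORDER BY id",
    (2001,),
)

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([col[0] for col in cur.description])  # header from column names
writer.writerows(cur)                                 # one CSV line per record
csv_text = buf.getvalue()
print(csv_text)
```

Writing to a real file instead of an in-memory buffer only needs
open("extract.csv", "w", newline="") in place of io.StringIO().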

Brian

-- 
Brian D. Ripley,                  ripley using stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



