[R-sig-teaching] Creating data

John Fox jfox at mcmaster.ca
Wed Mar 25 00:09:56 CET 2009


Dear Scott,

I have a strong preference for using real data. That said, one strategy for
manufacturing data is to simulate from a statistical model, either the same
model that will be used to analyze the data (in which case the analysis model
is the "true model") or a different one. For example, if you want some terms
in the model to be "non-significant," you'll get that with high probability
if those terms are omitted from the model used to generate the data.
Similarly, you can generate outliers by sampling the errors from a
heavy-tailed distribution or a suitable mixture distribution.
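
Here is a minimal sketch in R of that strategy, applied to the English-test
example from the original question. Everything specific in it is an assumption
made up for illustration: the country and department labels, the cell size of
20, the effect sizes, and the 5% contamination rate. The generating model has
a country effect but no department effect and no interaction, so those terms
should usually come out non-significant when the full model is fitted. The
errors come from a two-component normal mixture, which produces occasional
outliers.

```r
set.seed(2009)  # arbitrary seed, only to make the sketch reproducible

# Hypothetical design: 3 countries x 3 departments, 20 students per cell
design <- expand.grid(
  country    = c("A", "B", "C"),
  department = c("Ling", "Hist", "Chem"),
  student    = 1:20
)
design$country    <- factor(design$country)
design$department <- factor(design$department)
n <- nrow(design)

# Generating model: country matters; department and the interaction
# are deliberately left out (their true effects are zero)
country_effect <- c(A = 0, B = 5, C = -3)   # made-up effect sizes
mu <- 70 + unname(country_effect[as.character(design$country)])

# Errors from a mixture: mostly N(0, 8^2), occasionally N(0, 30^2)
contaminated <- rbinom(n, size = 1, prob = 0.05) == 1
error <- rnorm(n, mean = 0, sd = ifelse(contaminated, 30, 8))
# A heavy-tailed alternative would be: error <- 8 * rt(n, df = 3)

design$score <- mu + error

# Analysis model: richer than the generating model
fit <- lm(score ~ country * department, data = design)
anova(fit)
```

Note that the scores are not truncated to a plausible range (say 0 to 100),
so some rounding or clipping may be wanted before handing the data to
students. Changing the seed gives a fresh data set from the same model.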

I hope this helps,
 John 


> -----Original Message-----
> From: r-sig-teaching-bounces at r-project.org
> [mailto:r-sig-teaching-bounces at r-project.org] On Behalf Of Scott F. Kiesling
> Sent: March-24-09 4:04 PM
> To: r-sig-teaching at r-project.org
> Subject: [R-sig-teaching] Creating data
> 
> Hi everyone-
> 
> I'm currently teaching a graduate course in statistics for linguistics
> using R. I have used up most of the 'authentic' data I have been able
> to collect for homework and demonstrations. I can think of plenty more
> possible data sets, but I am finding the creation of them challenging,
> and my creations are often somewhat unrealistic (generally, too
> 'neat' and obvious).
> 
> So, I was wondering if anyone had any tips on creating 'realistic'
> data sets, or links/books that describe it.
> 
> For a simple example, let's say I want to create a dataset with
> students from different countries and academic departments who took an
> English test. I want to make some differences (significant and not)
> and possibly even interactions among the scores by country and
> department. I have been doing this through various iterations of
> sample() and rnorm(), and jitter() to get some randomness, but things
> are still coming out pretty neatly.  Is this the right (or a good)
> method? Advice?
> 
> Thanks in advance-
> 
> SFK
> 
> --
> Scott F. Kiesling, PhD
> 
> Associate Professor
> Department Chair
> 
> Department of Linguistics
> University of Pittsburgh, 2816 CL
> Pittsburgh, PA 15260
> http://www.linguistics.pitt.edu
> Office: +1 412-624-5916
> 
> _______________________________________________
> R-sig-teaching at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-teaching



