[R-sig-teaching] Creating data

Christophe Genolini cgenolin at u-paris10.fr
Tue Mar 24 21:15:14 CET 2009


Hi Scott

When I create artificial data, I try to "copy" some real data. So I measure 
the real data with as many parameters as I can (mean, var, cov, but 
also the percentage of NAs and outliers), then I generate the artificial 
data from those measurements. It is also possible to generate several sets 
that I finally mix (like one set for men, one set for women). Then I remove 
the variable "gender", I merge the two sets and I shuffle the resulting set.
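
A minimal sketch of these steps in R, for a single variable only. The data 
frame `real` is a made-up stand-in for a real data set, the helper names 
`measure` and `simulate` are hypothetical, and the 2% outlier rate and the 
"4 standard deviations" shift are arbitrary assumptions, not measured values.

```r
set.seed(42)  # fixed only so the example is reproducible

# Stand-in for a real data set (hypothetical; replace with your own data)
real <- data.frame(
  score  = c(rnorm(60, mean = 70, sd = 8), rnorm(40, mean = 62, sd = 12)),
  gender = rep(c("F", "M"), times = c(60, 40))
)
real$score[sample(nrow(real), 5)] <- NA  # a few missing values

# 1. Measure the real data, group by group
measure <- function(x) {
  list(n    = length(x),
       mean = mean(x, na.rm = TRUE),
       sd   = sd(x, na.rm = TRUE),
       pNA  = mean(is.na(x)))           # proportion of missing values
}
params <- lapply(split(real$score, real$gender), measure)

# 2. Generate one artificial set per group from those measurements
simulate <- function(p, pOutlier = 0.02) {  # pOutlier is an assumed rate
  x <- rnorm(p$n, mean = p$mean, sd = p$sd)
  out <- runif(p$n) < pOutlier
  # push the chosen points 4 sd up or down to act as outliers
  x[out] <- x[out] + sample(c(-1, 1), sum(out), replace = TRUE) * 4 * p$sd
  x[runif(p$n) < p$pNA] <- NA           # reproduce the share of NAs
  x
}
sets <- lapply(params, simulate)

# 3. Merge the sets, drop the group label, and shuffle the rows
fake <- data.frame(score = unlist(sets, use.names = FALSE))
fake <- fake[sample(nrow(fake)), , drop = FALSE]
rownames(fake) <- NULL
```

With several correlated variables, the same idea should carry over by 
measuring a mean vector and a covariance matrix per group and drawing from 
something like MASS::mvrnorm() instead of rnorm().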


Christophe
> Hi everyone-
>
> I'm currently teaching a graduate course in statistics for linguistics
> using R. I have used up most of the 'authentic' data I have been able
> to collect for homework and demonstrations. I can think of plenty more
> possible data sets, but I am finding the creation of them challenging,
> and my creations are often somewhat unrealistic (generally, too
> 'neat' and obvious). 
>
> So, I was wondering if anyone had any tips on creating 'realistic'
> data sets, or links/books that describe it.
>
> For a simple example, let's say I want to create a dataset with
> students from different countries and academic departments who took an
> English test. I want to make some differences (significant and not)
> and possibly even interactions among the scores by country and
> department. I have been doing this through various iterations of
> sample() and rnorm(), and jitter() to get some randomness, but things
> are still coming out pretty neatly.  Is this the right (or a good)
> method? Advice?
>
> Thanks in advance-
>
> SFK
>
>



