[R] simple generation of artificial data with defined features
Christos Hatzis
christos.hatzis at nuverabio.com
Fri Aug 22 19:56:16 CEST 2008
On the general question of how to create a dataset that matches the
frequencies in a table, the function as.data.frame can be useful. Given
an object of class 'table', it returns a data frame with one row per cell
of the table and the cell counts in a column named Freq.
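As a small illustration of what as.data.frame returns for a table, here
is a minimal sketch. The two observer vectors and their values are made
up for this example and are not taken from any real study.

```r
# Hypothetical ratings from two observers on six cases (made-up values)
obs1 <- c("pos", "pos", "neg", "neg", "pos", "neg")
obs2 <- c("pos", "neg", "neg", "neg", "pos", "pos")

# Cross-tabulate; the result is an object of class "table"
tab <- table(obs1 = obs1, obs2 = obs2)

# One row per cell of the table, with the cell count in column Freq
freq <- as.data.frame(tab)
print(freq)
```

The classification columns come back as factors, and the row order
follows the column-major order of the table cells.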
Consider for example table 6.1 of Fleiss et al (3rd Ed):
> birth.weight <- c(10,15,40,135)
> attr(birth.weight, "class") <- "table"
> attr(birth.weight, "dim") <- c(2,2)
> attr(birth.weight, "dimnames") <- list(c("A", "Ab"), c("B", "Bb"))
> birth.weight
    B  Bb
A  10  40
Ab 15 135
> summary(birth.weight)
Number of cases in table: 200
Number of factors: 2
Test for independence of all factors:
Chisq = 3.429, df = 1, p-value = 0.06408
>
> bw.dt <- as.data.frame(birth.weight)
Observations (rows) in this data frame can then be replicated according to
their corresponding frequencies, and the Freq column dropped, to yield the
expanded dataset that conforms to the original table.
> bw.dt.exp <- bw.dt[rep(1:nrow(bw.dt), bw.dt$Freq), -ncol(bw.dt)]
> dim(bw.dt.exp)
[1] 200 2
> table(bw.dt.exp)
      Var2
Var1    B  Bb
  A    10  40
  Ab   15 135
The above approach is not restricted to 2x2 tables, and it should be
straightforward to generate datasets that conform to arbitrary n x m
frequency tables.
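The generalization could be sketched roughly as follows. The helper name
expand_table and the 2 x 3 counts are assumptions made for illustration;
the vote counts are the ones quoted in the original question, divided by
1000 as suggested there. Note the drop = FALSE, which is needed so that a
one-way table still yields a data frame rather than a bare vector.

```r
# Expand a frequency table of any dimension into one row per observation.
# expand_table is a hypothetical helper name, not part of base R.
expand_table <- function(tab) {
  df  <- as.data.frame(tab)                       # one row per cell, counts in last column (Freq)
  idx <- rep(seq_len(nrow(df)), times = df$Freq)  # repeat each row index Freq times
  out <- df[idx, -ncol(df), drop = FALSE]         # drop Freq; keep a data frame for 1-D tables
  rownames(out) <- NULL
  out
}

# Illustrative 2 x 3 table (made-up counts)
tab <- as.table(matrix(c(5, 3, 2, 7, 4, 6), nrow = 2,
                       dimnames = list(rater1 = c("yes", "no"),
                                       rater2 = c("low", "mid", "high"))))
dat <- expand_table(tab)
stopifnot(nrow(dat) == sum(tab), all(table(dat) == tab))

# One-way table: election counts from the original question, divided by 1000
votes <- c(SPD = 16194665, CDU = 13136740, CSU = 3494309,
           Gruene = 3838326, FDP = 4648144, PDS = 4118194)
voters <- expand_table(as.table(round(votes / 1000)))

# Percentages among these six parties only, so they will not match
# the "percent" column of the question exactly
round(100 * prop.table(table(voters)), 1)
```

For the election data this reproduces only the marginal frequencies of a
single classification; it does not create a second rating per voter.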
-Christos Hatzis
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Greg Snow
> Sent: Friday, August 22, 2008 12:41 PM
> To: drflxms; r-help at r-project.org
> Subject: Re: [R] simple generation of artificial data with
> defined features
>
> I don't think that the election data is the right data to
> demonstrate Kappa; you need subjects that are classified by 2
> or more different raters/methods. The election data could be
> considered classifying the voters into which party they voted
> for, but you only have 1 rater. Maybe if you had some survey
> data that showed which party each voter voted for in 2 or
> more elections, then that may be a good example dataset.
> Otherwise you may want to stick with the sample datasets.
>
> There are other packages that compute Kappa values as well (I
> don't know if others calculate this particular version), but
> some of those take the summary data as input rather than the
> raw data, which may be easier if you just have the summary tables.
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> (801) 408-8111
>
>
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of drflxms
> > Sent: Friday, August 22, 2008 6:12 AM
> > To: r-help at r-project.org
> > Subject: [R] simple generation of artificial data with defined
> > features
> >
> > Dear R-colleagues,
> >
> > I am quite a newbie to R fighting my stupidity to solve a probably
> > quite simple problem of generating artificial data with defined
> > features.
> >
> > I am conducting a study of inter-observer agreement in
> > child bronchoscopy. One of the most important measures is Kappa
> > according to Fleiss, which is conveniently available in R through
> > the irr package.
> > Unfortunately, medical doctors like me don't really understand much
> > of statistics. Therefore I'd like to give the reader an easily
> > understandable example of Fleiss' Kappa in the Methods part. To
> > achieve this, I obtained a table with the results of the German
> > election from 2005:
> >
> > party    number of votes   percent
> >
> > SPD             16194665      34,2
> > CDU             13136740      27,8
> > CSU              3494309       7,4
> > Gruene           3838326       8,1
> > FDP              4648144       9,8
> > PDS              4118194       8,7
> >
> > I want to show the agreement of voters measured by Fleiss-Kappa. To
> > calculate this with the kappam.fleiss-function of irr, I need a
> > data.frame like this:
> >
> >         (id of 1st voter)   (id of 2nd voter)
> >
> > party   spd                 cdu
> >
> > Of course I don't plan to calculate this with the millions of cases
> > mentioned in the table above (I am working on a small laptop). A
> > division by 1000 would be more than perfect for this example. The
> > exact format of the table is generally not so important, as I could
> > reshape nearly every format with the help of the reshape-package.
> >
> > Unfortunately I could not figure out how to create such a
> > fictitious/artificial dataset as described above. Any data.frame
> > that keeps at least the percentages would be nice. String IDs of
> > parties could of course be substituted by numbers (which would be
> > even better for the function kappam.fleiss in irr!).
> >
> > I would appreciate any kind of help very much indeed.
> > Greetings from Munich,
> >
> > Felix Mueller-Sarnowski
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>