[R] simple generation of artificial data with defined features

drflxms drflxms at googlemail.com
Sun Aug 24 13:01:27 CEST 2008


Hello all,

besides thanking you again for your help, I'd like to present the
final solution to my problem and the results of the kappa calculation:

> election.2005 <- c(16194,13136,3494,3838,4648,4118)
#data obtained via the GENESIS database of the "Statistisches Bundesamt",
#www.destatis.de
#simply cut off the last 3 digits because of the limited computing power of my laptop
> attr(election.2005, "class") <- "table"
> attr(election.2005, "dim") <- c(1,6)
> attr(election.2005, "dimnames") <- list(c("votes"), c(1,2,3,4,5,6))
#used numbers instead of names of parties for easier handling later on
#1=spd,2=cdu,3=csu,4=gruene,5=fdp,6=pds
> head(election.2005)
      [,1]  [,2] [,3] [,4] [,5] [,6]
[1,] 16194 13136 3494 3838 4648 4118
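#el.dt is the table as a data frame (columns Var1, Var2, Freq),
#presumably created like this:
> el.dt <- as.data.frame(election.2005)
#replicate rows according to frequency-table:
> el.dt.exp <- el.dt[rep(1:nrow(el.dt), el.dt$Freq), -ncol(el.dt)]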
> el.dt.exp$id <- seq_len(nrow(el.dt.exp)) #add voter id
> el.dt.exp$year <- 2005 #add column with year of election
# remove a column we don't need:
> el.dt.exp<-subset(el.dt.exp, select=-c(Var1))
> dim(el.dt.exp)
[1] 45428     3
> head(el.dt.exp)
    Var2 id year
1      1  1 2005
1.1    1  2 2005
1.2    1  3 2005
1.3    1  4 2005
1.4    1  5 2005
1.5    1  6 2005
> el.dt.exp<-as.data.frame(el.dt.exp, row.names=seq(1:nrow(el.dt.exp)))
# get rid of the unusual numbering of rows
> head(el.dt.exp)
  Var2 id year
1    1  1 2005
2    1  2 2005
3    1  3 2005
4    1  4 2005
5    1  5 2005
6    1  6 2005
> summary(el.dt.exp)
 Var2            id             year    
 1:16194   Min.   :    1   Min.   :2005 
 2:13136   1st Qu.:11358   1st Qu.:2005 
 3: 3494   Median :22715   Median :2005 
 4: 3838   Mean   :22715   Mean   :2005 
 5: 4648   3rd Qu.:34071   3rd Qu.:2005 
 6: 4118   Max.   :45428   Max.   :2005 
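
The expansion step above (one row per counted vote) can be tried out on a
small example first. This is only a sketch in base R: the counts and the
names `counts`, `tab`, `freq` and `expanded` are made up for illustration
and are not the election data.

```r
# Toy frequency table (hypothetical counts, not the election data)
counts <- c(a = 3, b = 2, c = 1)
tab <- as.table(matrix(counts, nrow = 1,
                       dimnames = list("votes", names(counts))))

# as.data.frame() on a table gives one row per cell: Var1, Var2, Freq
freq <- as.data.frame(tab)

# Repeat each row index Freq times, keeping only the category column
expanded <- freq[rep(seq_len(nrow(freq)), freq$Freq), "Var2", drop = FALSE]

rownames(expanded) <- NULL               # plain 1..n row names
expanded$id <- seq_len(nrow(expanded))   # one id per simulated voter

table(expanded$Var2)                     # should reproduce the toy counts
```

Setting the row names to NULL avoids the 1.1, 1.2, ... numbering that
appears after indexing with repeated row indices.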

Var2 is not numeric (summary() above shows it as a factor with the levels
1 to 6), which is inconvenient for further processing.
I changed its type to numeric in the data editor, using fix(el.dt.exp).
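
The same conversion can probably be done without the data editor, which
makes it reproducible in a script. A minimal sketch in base R, with a
made-up factor `f` standing in for the Var2 column:

```r
# Hypothetical factor whose labels are numbers stored as text
f <- factor(c("1", "2", "6", "2"))

# Pitfall: as.numeric() on a factor returns the internal level codes
codes <- as.numeric(f)                 # 1 2 3 2

# Going through character gives back the numbers in the labels
values <- as.numeric(as.character(f))  # 1 2 6 2
```

For a column whose levels are exactly 1 to 6 both calls give the same
numbers, but the detour via as.character() is the safer habit.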

#create the dataframe for the calculation of kappa
> library(reshape)
> el.dt.exp.molten<-melt(el.dt.exp, id=c(2,3), na.rm=FALSE)
> kappa.frame<-cast(el.dt.exp.molten, year ~ id)
> dim(kappa.frame)
[1]     1 45429
#calculate kappa
> library(irr)
> kappam.fleiss(kappa.frame, exact=FALSE, detail=TRUE)
 Fleiss' Kappa for m Raters

 Subjects = 1
   Raters = 45428
    Kappa = -2.2e-05

        z = -1.35
  p-value = 0.176

   Kappa      z p.value
1  0.000 -0.707   0.479
2  0.000 -0.707   0.479
3  0.000 -0.707   0.479
4  0.000 -0.707   0.479
5  0.000 -0.707   0.479
6  0.000 -0.707   0.479

What a surprise! So Greg was absolutely right that this is probably not
a good example for kappa. But it is still a very interesting one, if you ask me!

My theory: kappa doesn't simply express agreement. As far as I learned
from the Handbook of Inter-Rater Reliability (Gwet, Kilem 2001; STATAXIS
Publishing Company; www.stataxis.com), kappa tries to measure how
different an observed agreement is from the agreement that would arise
by chance.
So in this case this probably means that the results of the 2005
election are not significantly different from results that could have
arisen by chance.
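
The reported value can be checked by hand. The following is a minimal
sketch in base R, assuming the usual Fleiss definition
kappa = (P_obs - P_exp) / (1 - P_exp); the variable names are only
illustrative. With a single subject, the chance agreement is estimated
from the same votes as the observed agreement, and the formula then
reduces algebraically to -1/(n - 1), whatever the vote shares are. So the
near-zero kappa may say more about the one-subject layout than about the
election itself.

```r
# Vote counts in thousands, copied from the election.2005 vector
votes <- c(16194, 13136, 3494, 3838, 4648, 4118)
n <- sum(votes)   # number of "raters" (voters)

# Observed agreement: share of rater pairs that chose the same party
p_obs <- sum(votes * (votes - 1)) / (n * (n - 1))

# Chance agreement, estimated from the same party proportions
p_exp <- sum((votes / n)^2)

kappa_by_hand <- (p_obs - p_exp) / (1 - p_exp)

kappa_by_hand   # about -2.2e-05, as in the kappam.fleiss() output
-1 / (n - 1)    # the same value from the reduced formula
```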

Anyway, I personally learned a very interesting lesson about kappa and R.
Thank you all for your professional and quick help to a newbie!
Greetings from Munich,

Felix


