[R] Plotting Comparisons with Missing Data

Sun Mar 7 06:40:01 CET 2010

On 2010-03-05 18:00, Alastair wrote:
>
> Hi,
>
> I'm new to R and I've run into a problem that I'm not really sure how to
> express properly in the language. I've got a data table that I've read from
> a file containing some simple information about the performance of 4
> algorithms. The columns are the name of the algorithm, the problem instance
> and the resulting score on that problem (if it wasn't solved I mark that
> with NA).
>
> solver instance result
> A prob1 40
> B prob1 NA
> C prob1 39
> D prob1 35
> A prob2 100
> B prob2 50
> C prob2 NA
> D prob2 NA
> A prob3 75
> B prob3 80
> C prob3 60
> D prob3 70
> A prob4 80
> B prob4 NA
> C prob4 85
> D prob4 75
>
> I've managed to read in the data as follows:
> data<- table.read("./test.txt", header = TRUE, colClasses =
> c("factor","factor","numeric"), na.strings = c("NA"))
> and I've got a nice barchart via lattice
> library(lattice)
> barchart(result ~ instance, group = solver, data = data)
>
> What I want to try and calculate (and plot somehow) is
> a) What percentage of the instances each solver can solve
> and b) What percentage of the instances a solver returns a better score than
> solver A for that particular problem.
>
> These don't seem like particularly ambitious requirements, but I still don't
> really know where to start. Any pointers would be most appreciated.

(I doubt that table.read(....) worked for you; the function is read.table().)
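
A minimal sketch of the read step with read.table(), for anyone who wants to reproduce the rest. The posted data are inlined through the text argument instead of being read from ./test.txt; that substitution and the name txt are assumptions for illustration only, and text = needs a reasonably recent version of R.

```r
# The data from the original post, inlined so the example runs on its own
txt <- "solver instance result
A prob1 40
B prob1 NA
C prob1 39
D prob1 35
A prob2 100
B prob2 50
C prob2 NA
D prob2 NA
A prob3 75
B prob3 80
C prob3 60
D prob3 70
A prob4 80
B prob4 NA
C prob4 85
D prob4 75"

# Same arguments as in the post, with the function name corrected
data <- read.table(text = txt, header = TRUE,
                   colClasses = c("factor", "factor", "numeric"),
                   na.strings = "NA")
str(data)
```

With the real file, replacing text = txt by "./test.txt" as the first argument should give the same data frame.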

For (a), use tapply() to count the instances each solver solved (divide
by the number of instances to get a percentage):

  with(data, tapply(result, solver, function(x) sum(!is.na(x))))

# A B C D
# 4 2 3 3
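
Since the question asked for percentages, the counts can be turned into percentages by taking the mean of the non-missing indicator instead of its sum. A small sketch, assuming the data frame from the post is rebuilt by hand (the name solved_pct is made up for illustration):

```r
# Rebuild the posted data so the example is self-contained
data <- data.frame(
  solver   = factor(rep(c("A", "B", "C", "D"), times = 4)),
  instance = factor(rep(c("prob1", "prob2", "prob3", "prob4"), each = 4)),
  result   = c(40, NA, 39, 35,  100, 50, NA, NA,
               75, 80, 60, 70,  80, NA, 85, 75)
)

# mean() of a logical vector is the proportion of TRUE values,
# so this is the percentage of instances with a non-missing score
solved_pct <- with(data, tapply(result, solver,
                                function(x) 100 * mean(!is.na(x))))
solved_pct

#   A   B   C   D
# 100  50  75  75
```

The result is a named vector, so it can be handed directly to barplot() or to lattice's barchart() for plotting.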

For (b), you could either use reshape() to transform to wide format
or generate an appropriate matrix; in either case, follow with apply():

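# instance-by-solver matrix of scores (one value per cell)
m <- with(data, tapply(result, list(instance, solver), function(x) x))
m

# for each solver other than A (column 1), count the instances where
# its score exceeds A's; comparisons involving NA are dropped
apply(m[,-1], 2, function(x) sum(x > m[, 1], na.rm=TRUE))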

# B C D
# 1 1 0
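
The reshape() route mentioned above could look roughly like the following. This is only a sketch under two assumptions that the original post does not settle: a higher score counts as better, and the percentage is taken over all instances, including those where either solver returned NA. The names wide, others, beats_A and pct are made up for illustration.

```r
# Rebuild the posted data so the example is self-contained
data <- data.frame(
  solver   = factor(rep(c("A", "B", "C", "D"), times = 4)),
  instance = factor(rep(c("prob1", "prob2", "prob3", "prob4"), each = 4)),
  result   = c(40, NA, 39, 35,  100, 50, NA, NA,
               75, 80, 60, 70,  80, NA, 85, 75)
)

# One row per instance, one column per solver:
# instance, result.A, result.B, result.C, result.D
wide <- reshape(data, idvar = "instance", timevar = "solver",
                direction = "wide")

# Count, per solver, the instances where its score exceeds A's
# (assumption: higher is better); NA comparisons are dropped
others  <- wide[, c("result.B", "result.C", "result.D")]
beats_A <- sapply(others, function(x) sum(x > wide$result.A, na.rm = TRUE))

# Percentage over all instances (assumption: unsolved ones stay in the denominator)
pct <- 100 * beats_A / nrow(wide)
pct
```

If only the instances solved by both solvers should count, the denominator would have to be computed per solver instead of using nrow(wide).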

  -Peter Ehlers

>
> Thanks,
> Alastair

-- 
Peter Ehlers
University of Calgary