[R] problems with data frames, factors and lists

Wed May 21 17:33:07 CEST 2008

I have a function that creates a list based on some clustered data:

mix <- function(Y, pid) {
  hc = gethc(Y, pid)
  maxheight = max(hc$height)
  noingrp = processhc(hc)
  # twogrp is "one" when noingrp$two equals 1, otherwise "more"
  twoisone = "one"
  if (noingrp$two != 1)
    twoisone = "more"
  out = list(pid = pid, one = noingrp$one, two = noingrp$two,
             diff = maxheight, noseqs = length(hc$labels),
             twogrp = twoisone)
  return(out)
}

Example result:

> mix(tsus_same, 77)
$pid
[1] 77

$one
[1] 9

$two
[1] 2

$diff
[1] 8.577195

$noseqs
[1] 11

$twogrp
[1] "more"

>

I then call this function from another function that simply applies it
to a lot of data:

doset <- function(sameset) {
  pids = unique(c(sameset$APID, sameset$BPID))
  # start empty so that rbind() has something to append to
  oputframe = NULL
  for (f in pids) {
    oputframe = data.frame(rbind(oputframe, mix(sameset, f)))
  }
  return(oputframe)
}

All values except $twogrp are numbers. There are two possible values
for $twogrp, "one" and "more". The first one is more common and gets
added to the data frame first, so the column becomes a factor whose
only level is "one". The result is that I cannot add the rows where
$twogrp is "more" without getting:

38: In `[<-.factor`(`*tmp*`, ri, value = "more") :
  invalid factor level, NAs generated
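
The same warning can be reproduced without any of my functions. This is
only a minimal sketch with made-up values, assuming the column has
already become a factor that knows the single level "one":

```r
# A factor that has only ever seen the value "one"
f <- factor(c("one", "one"))
levels(f)

# "more" is not an existing level, so R warns about an invalid
# factor level and stores NA instead of the new value
f[2] <- "more"
is.na(f[2])
```

The levels of the factor stay unchanged after the failed assignment.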

Now, this is a pain in the neck. How can I merge these lists into the
data frame and still have the value $twogrp as a factor?
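
One thing that might work is to declare both levels up front and bind
the rows only at the end. This is just a sketch under that assumption,
not a tested solution for the real data: fake_mix() is a hypothetical
stand-in for mix() that returns a shortened list with made-up values.

```r
# Hypothetical stand-in for mix(); the real function returns more fields
fake_mix <- function(pid, two) {
  list(pid = pid, two = two, twogrp = if (two != 1) "more" else "one")
}

results <- list(fake_mix(1, 1), fake_mix(2, 1), fake_mix(3, 2))

# Turn each list into a one-row data frame and give twogrp both
# levels explicitly, whichever value this particular row holds
rows <- lapply(results, function(x) {
  row <- as.data.frame(x, stringsAsFactors = FALSE)
  row$twogrp <- factor(row$twogrp, levels = c("one", "more"))
  row
})

# Bind all rows in one step; every piece shares the same levels
oputframe <- do.call(rbind, rows)
str(oputframe)
```

Because every one-row data frame carries the same two levels, the
combined twogrp column should stay a factor with no NAs.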

Thanks, and I hope my code makes some sense!

Karin
-- 
Karin Lagesen, PhD student 
karin.lagesen at medisin.uio.no
http://folk.uio.no/karinlag