[Bioc-devel] question on annotations and data.frames

Stefan McKinnon Høj-Edwards Stefan.Hoj-Edwards at agrsci.dk
Thu Aug 25 09:29:17 CEST 2011


Hi Andreas,

> Dear mailing list,
> let's suppose I have a data.frame full of annotation data. In the first
> column I have identifiers present in the data. In all other columns I have
> another annotation for this identifier column.
> However, on some rows there is no 1:1 mapping but let's say a 1:3 mapping
> or something like this:
>
> ID     Symbol
> 1     Bla
> 2     Foo
> 3     XYZ // xyz /// xyz01
> 4     abc
>
> I want to "stretch" the line which has multiple annotation tags. I know I
> can split it with "strsplit", this gets me here:
>
> ID     Symbol
> 1     Bla
> 2     Foo
> 3     "XYZ" "xyz" "xyz01"
> 4     abc
>
> But how can I get to this:
>
> ID     Symbol
> 1     Bla
> 2     Foo
> 3     XYZ
> 3     xyz
> 3     xyz01
> 4     abc

I have a solution, too. Mine relies on the idiom "do.call(rbind, lst)" (or cbind), which efficiently rbinds/cbinds all elements of the list `lst` in a single call. For this to work, the pieces to be bound must be collected in a list.
First we define `splitit`, the function that does the actual splitting and returns a (sub-)matrix for each row. Then we use `apply` to call `splitit` on each row of the data; because the sub-matrices differ in size, `apply` returns the desired list, which we can put into the do.call idiom.

splitit <- function(row) {
    # Split the symbols (res is a plain character vector)
    res <- strsplit(row[2], ' // ', fixed=TRUE)[[1]]
    # Bind the ID to the symbol vector (the ID is recycled), clean up the result and return.
    # If the original data contains several columns, just add them here.
    res <- cbind(row[1], res)
    dimnames(res) <- NULL
    return(res)
}
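
To see the do.call idiom in isolation, here is a minimal sketch with hand-built matrices standing in for what `splitit` returns per row. The name `pieces` and its contents are only illustrative and are not part of the solution itself.

```r
# Hand-built stand-ins for the per-row output of splitit:
# one two-column character matrix per input row.
pieces <- list(
    matrix(c("1", "Bla"), ncol = 2),
    matrix(c("3", "3", "3", "XYZ", "xyz", "xyz01"), ncol = 2),
    matrix(c("4", "abc"), ncol = 2)
)

# do.call passes every list element as a separate argument, so this is
# the same as rbind(pieces[[1]], pieces[[2]], pieces[[3]]).
stacked <- do.call(rbind, pieces)
print(stacked)
```

The result is a single character matrix with one row per ID/symbol pair; wrap it in `as.data.frame` if a data.frame is needed.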

# Test data from Heidi Dvinge and James W. MacDonald
test <- data.frame(A=c("x", "y", "z"), B=c("X", "Y1 // Y2 // Y3", "Z"),  stringsAsFactors=FALSE)
df <- data.frame(ID = 1:4, Symbol = I(c("Bla","Foo","XYZ // xyz // xyz01", "abc")))

do.call(rbind, apply(test, 1, splitit))
do.call(rbind, apply(df, 1, splitit))
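
Two caveats may be worth guarding against. `apply` only returns a list when the per-row results differ in length; if no row contains a separator, it simplifies the results to a matrix and do.call will not get the list it expects. `apply` also converts the data.frame to a character matrix first, which may alter how numeric IDs are formatted. In addition, the question's example mixes " // " and " /// ", while the fixed pattern ' // ' only matches the former. The following is a sketch of one possible variant, not a replacement for the code above: it keeps the do.call(rbind, ...) idiom but swaps `apply` for `lapply` over row indices and splits on a regular expression. The names `annot` and `split_row` are hypothetical, and the column names `ID` and `Symbol` are assumed from the question.

```r
# Illustrative data: the example from the question, including the " /// " separator.
annot <- data.frame(ID = 1:4,
                    Symbol = c("Bla", "Foo", "XYZ // xyz /// xyz01", "abc"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: expand row i of d into one row per symbol.
split_row <- function(i, d) {
    # Regular expression: a space, two or three slashes, a space.
    symbols <- strsplit(as.character(d$Symbol[i]), " ///? ")[[1]]
    # An empty Symbol splits to character(0); keep the row with NA instead.
    if (length(symbols) == 0) symbols <- NA_character_
    # The single ID is recycled to match the number of symbols.
    data.frame(ID = d$ID[i], Symbol = symbols, stringsAsFactors = FALSE)
}

# lapply always returns a list, so do.call(rbind, ...) gets what it expects.
stretched <- do.call(rbind, lapply(seq_len(nrow(annot)), split_row, d = annot))
print(stretched)
```

Because each piece is a data.frame rather than a character matrix, the ID column keeps its original type. For large tables, binding many small data.frames can be slow, so the matrix-based version may still be preferable.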


Kind regards,
Stefan McKinnon Høj-Edwards     Dept. of Genetics and Biotechnology
PhD student                     Faculty of Agricultural Sciences
stefan.hoj-edwards at agrsci.dk    Aarhus University
Tel.: +45 8999 1291             Blichers Allé 20, Postboks 50
Web: www.iysik.com              DK-8830 Tjele
                                Tel.: +45 8999 1900
                                Web: www.agrsci.au.dk


