[Bioc-devel] question on annotations and data.frames
James W. MacDonald
jmacdon at med.umich.edu
Tue Aug 23 15:04:07 CEST 2011
Hi Andreas,
On 8/23/2011 8:12 AM, Andreas Heider wrote:
> Dear mailing list,
> suppose I have a data.frame full of annotation data. The first
> column holds the identifiers present in the data; each of the other
> columns holds another annotation for these identifiers.
> However, some rows do not have a 1:1 mapping but, say, a 1:3 mapping,
> like this:
>
> ID Symbol
> 1 Bla
> 2 Foo
> 3 XYZ // xyz // xyz01
> 4 abc
>
> I want to "stretch" the row that has multiple annotation tags. I know I
> can split it with strsplit(), which gets me here:
>
> ID Symbol
> 1 Bla
> 2 Foo
> 3 "XYZ" "xyz" "xyz01"
> 4 abc
>
> But how can I get to this:
>
> ID Symbol
> 1 Bla
> 2 Foo
> 3 XYZ
> 3 xyz
> 3 xyz01
> 4 abc
>
>
> Your help will be really appreciated!
Here is one way:
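> df <- data.frame(ID = 1:4,
+                  Symbol = I(c("Bla", "Foo", "XYZ // xyz // xyz01", "abc")))
> ## collect the Symbol entry for each ID
> lst <- tapply(1:nrow(df), df$ID, function(x) df[x, 2])
> ## split each entry on " // "; [[1]] unwraps the one-element list strsplit() returns
> lst <- lapply(lst, function(x) strsplit(x, " // ")[[1]])
> ## repeat each ID once per symbol, then flatten the symbols into one vector
> newdf <- data.frame(ID = rep(df[, 1], sapply(lst, length)),
+                     Symbol = unlist(lst))
> newdf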
   ID Symbol
1   1    Bla
2   2    Foo
31  3    XYZ
32  3    xyz
33  3  xyz01
4   4    abc
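If the data.frame has further annotation columns that should be carried along, the same rep()/unlist() idea can be wrapped in a small function. This is only a sketch: stretch_column() is a hypothetical name, the " // " separator is assumed from the example above, and lengths() needs R >= 3.2.0. It works row by row, so it does not require the IDs to be unique, and it turns an empty entry into NA instead of dropping the row.

```r
## Hypothetical helper (not from the original thread): expand one column whose
## entries hold several values joined by `sep`, repeating all other columns
## once per value.
stretch_column <- function(df, col, sep = " // ") {
    ## split every entry; fixed = TRUE treats sep as a literal string
    parts <- strsplit(as.character(df[[col]]), sep, fixed = TRUE)
    ## an empty string splits to character(0); keep such rows, as NA
    parts <- lapply(parts, function(p) if (length(p) == 0) NA_character_ else p)
    ## each original row index, repeated once per piece it was split into
    idx <- rep(seq_len(nrow(df)), lengths(parts))
    out <- df[idx, , drop = FALSE]
    out[[col]] <- unlist(parts)
    rownames(out) <- NULL
    out
}

df <- data.frame(ID = 1:4,
                 Symbol = c("Bla", "Foo", "XYZ // xyz // xyz01", "abc"),
                 stringsAsFactors = FALSE)
stretch_column(df, "Symbol")
```

Indexing whole rows with rep(seq_len(nrow(df)), ...), rather than repeating only the ID column, means any extra annotation columns are duplicated alongside the ID automatically.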
Best,
Jim
>
> Thanks in advance, Andreas
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues