[Bioc-devel] question on annotations and data.frames

Marc Carlson mcarlson at fhcrc.org
Thu Sep 15 02:25:29 CEST 2011


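Sorry for the delay; this one appears to have slipped past me. I am
pretty sure that Jim intended to put something more like this in his
example, where each ID is repeated once per split symbol
(length(unlist(x))) rather than once per list element: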

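df <- data.frame(ID = 1:4,
                 Symbol = I(c("Bla", "Foo", "XYZ // xyz // xyz01", "abc")))
## one entry per ID, holding that row's Symbol string
lst <- tapply(1:nrow(df), df$ID, function(x) df[x, 2])
## strsplit() returns a list, so each entry becomes a list around a character vector
lst <- lapply(lst, function(x) strsplit(x, " // "))
## repeat each ID once per split symbol: length(unlist(x)), not length(x), which is always 1
newdf <- data.frame(ID = rep(df[, 1], sapply(lst, function(x) length(unlist(x)))),
                    Symbol = unlist(lst))
newdf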
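Why the count has to be length(unlist(x)): strsplit() always returns a list with one element per input string, so after the lapply() step every element of lst has length 1, however many symbols it holds. A quick base-R check (the counts_wrong/counts_right names are just for illustration):

```r
# strsplit() wraps its result in a list: one element per input string
parts <- strsplit("XYZ // xyz // xyz01", " // ")
length(parts)          # 1: the outer list has a single element
length(unlist(parts))  # 3: the symbols inside it

# Counting with length() therefore repeats the multi-symbol ID only once
counts_wrong <- c(1, 1, length(parts), 1)
counts_right <- c(1, 1, length(unlist(parts)), 1)
rep(1:4, counts_wrong)  # 1 2 3 4
rep(1:4, counts_right)  # 1 2 3 3 3 4
```

With a count of 1 per ID, rep() yields 4 IDs against the 6 symbols from unlist(lst), and data.frame() stops with an "arguments imply differing number of rows" error; that is presumably why the second approach did not work for Andreas.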


   Marc



On 08/24/2011 06:00 AM, Andreas Heider wrote:
> Thank you,
> the second approach does not work for me (not sure why). The first one
> works, but is not straightforward.
> Is there an easier way to do it? I tried to use the "reshape" R package but
> without success.
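Regarding an easier way: a shorter base-R sketch (not from the thread) splits the whole column with one vectorized strsplit() call and counts the pieces per row with lengths(), which only exists in R 3.2.0 and later. Repeating row indices with rep() also carries along every other annotation column; the Name column below is a hypothetical stand-in for those.

```r
df <- data.frame(ID = 1:4,
                 Symbol = c("Bla", "Foo", "XYZ // xyz // xyz01", "abc"),
                 Name = c("n1", "n2", "n3", "n4"),  # hypothetical extra annotation column
                 stringsAsFactors = FALSE)          # strsplit() needs character, not factor

# Split every Symbol at once; fixed = TRUE matches " // " literally, not as a regex
parts <- strsplit(df$Symbol, " // ", fixed = TRUE)

# Repeat each row once per symbol it contains, keeping all other columns
newdf <- df[rep(seq_len(nrow(df)), lengths(parts)), ]
newdf$Symbol <- unlist(parts)
rownames(newdf) <- NULL  # drop the "3.1", "3.2" row names created by duplicated rows
newdf
```

Indexing rows instead of rebuilding the data.frame column by column means additional annotation columns need no extra code; only the column being split is overwritten.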
>
> Greets, Andreas
>
> 2011/8/23 James W. MacDonald<jmacdon at med.umich.edu>
>
>> Hi Andreas,
>>
>>
>>
>> On 8/23/2011 8:12 AM, Andreas Heider wrote:
>>
>>> Dear mailing list,
>>> suppose I have a data.frame full of annotation data. The first column
>>> holds the identifiers present in the data, and each of the other columns
>>> holds another annotation for those identifiers.
>>> However, some rows have no 1:1 mapping but, say, a 1:3 mapping,
>>> like this:
>>>
>>> ID     Symbol
>>> 1     Bla
>>> 2     Foo
>>> 3     XYZ // xyz /// xyz01
>>> 4     abc
>>>
>>> I want to "stretch" the row that has multiple annotation tags. I know I
>>> can split it with strsplit(), which gets me here:
>>>
>>> ID     Symbol
>>> 1     Bla
>>> 2     Foo
>>> 3     "XYZ" "xyz" "xyz01"
>>> 4     abc
>>>
>>> But how can I get to this:
>>>
>>> ID     Symbol
>>> 1     Bla
>>> 2     Foo
>>> 3     XYZ
>>> 3     xyz
>>> 3     xyz01
>>> 4     abc
>>>
>>>
>>> Your help will be really appreciated!
>>>
>> Here is one way:
>>
>>> df<- data.frame(ID = 1:4,
>> Symbol = I(c("Bla","Foo","XYZ // xyz // xyz01", "abc")))
>>
>>> lst<- tapply(1:nrow(df), df$ID, function(x) df[x,2])
>>> lst<- lapply(lst, function(x) strsplit(x, " // "))
>>> newdf<- data.frame(ID = rep(df[,1], sapply(lst, length)), Symbol =
>> unlist(lst))
>>> newdf
>>    ID Symbol
>> 1   1    Bla
>> 2   2    Foo
>> 31  3    XYZ
>> 32  3    xyz
>> 33  3  xyz01
>> 4   4    abc
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>
>>
>>> Thanks in advance, Andreas
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> Douglas Lab
>> University of Michigan
>> Department of Human Genetics
>> 5912 Buhl
>> 1241 E. Catherine St.
>> Ann Arbor MI 48109-5618
>> 734-615-7826
>> ************************************************************
>> Electronic Mail is not secure, may not be read every day, and should not be
>> used for urgent or sensitive issues
>>


