[R] apply a function down each column

Laetitia Schmid laetitia at gmt.su.se
Tue Jan 12 15:24:23 CET 2010


Dear Peter,
thank you for the suggestion.
Unfortunately, the star did not help. Did it work for you? To me it still seems incomplete somehow.
Laetitia

________________________________________
From: Peter Ehlers [ehlers at ucalgary.ca]
Sent: Tuesday, January 12, 2010 09:54 AM
To: Laetitia Schmid
Cc: Steve Lianoglou; r-help at r-project.org
Subject: Re: [R] apply a function down each column

See inline below.

Laetitia Schmid wrote:
> Dear Steve,
> my solution looks like it would work, but it does not.
> I attached a text file with an extract of my data. Maybe you can try it
> yourself. I want to compare C1 with M1, C2 with M2, C3 with M3, and so on,
> for each column.
> I do not really know what the problem is. R complains about a syntax error.
> The function I am applying counts the common strings between the two.
> Greg Hirson helped me to write it.
>
> lettermatch <- function(a, b) {
>    tb <- merge(as.data.frame(table(strsplit(a, ""))),
> as.data.frame(table(strsplit(b, ""))), by="Var1")
>    sum(apply(tb[-1], 1, min))
> }
>
> For example for the second column I tried:
>
> for (x in 1:(nrow(dat)-1)) {
> a <- as.character(dat[(2x-1),1])

Shouldn't that be 2*x-1??

  -Peter Ehlers

> b <- as.character(dat[(2x),1])
>  lettermatch(a,b)
> }
>
> or
>
>  a <- as.character(dat[seq(1, nrow(dat), by=2),2])
>  b <- as.character(dat[seq(2, nrow(dat), by=2), 2])
>  all.results <- lettermatch(a,b)
>
> With "dat<-read.delim("data_lgs.txt",stringsAsFactors=FALSE)" I can
> leave out the "as.character" calls in the code above.
>
> Laetitia
>
> Individuals    Seq1    Seq2    Seq3    Seq4
> C1    GGGG    AATT    CCGG    CTTT
> M1    GGGG    AAAA    GGGG    GGGG
> C2    GGGG    AATT    CCGG    CTTT
> M2    AGGG    AACT    CCGG    CGTT
> C3    AGGG    AACT    CCGG    CGTT
> M3    AGGG    AACT    CCGG    CGTT
> C4    GGGG    AATT    CCGG    CCTT
> M4    GGGG    AAAT    CGGG    CTTT
> C5    AGGG    ACTT    CCCG    CTTT
> M5    AGGG    CTTT    CCCC    CCTT
> C6    AGGG    CTTT    CCCC    CCTT
> M6    AAAG    CCTT    CCCC    CTTT
> C7    AAAG    ACCC    CCCG    GTTT
> M7    AAGG    AACC    CCGG    TTTT
> C8    GGGG    AATT    CCGG    CCTT
> M8    GGGG    AATT    CCGG    CCTT
> C9    GGGG    AAAA    GGGG    TTTT
> M9    GGGG    AAAA    GGGG    TTTT
> C11    AGGG    AAAC    CGGG    GGTT
> M11    GGGG    AATT    CCGG    CCTT
>
>
>
> On 11.01.2010 at 15:18, Steve Lianoglou wrote:
>
>> Hi,
>>
>> On Mon, Jan 11, 2010 at 8:41 AM, Laetitia Schmid <laetitia at gmt.su.se>
>> wrote:
>>> Hello World,
>>> I have a function that makes pairwise comparisons between two
>>> strings. I would like to apply this function to my data (which
>>> consists of columns with different strings) in the way that it
>>> compares the first with the second entry, and then the third with the
>>> fourth, and then the fifth with the sixth, and so on down each column...
>>> So (2x-1) and (2x) would be the different entries to be compared!
>>>
>>> dat= my data:
>>>
>>> for the first column: compare dat[(2x-1),1] with dat[(2x),1] and x
>>> would be 1:i, i=length(dat[,1])
>>>
>>> I think the best way to do that is a loop:
>>>
>>> a <- as.character(dat[(2x-1),1])
>>> b <- as.character(dat[(2x),1])
>>>
>>> for (i in 1:length(dat[,1]) my_function(a, b))
>>>
>>> Can somebody help me to apply a function with a loop in the way I
>>> want to a column?
>>
>> It seems as if you've got it already, haven't you?
>>
>> for (x in 1:(nrow(dat)-1)) {
>>  a <- dat[(2x-1),1]
>>  b <- dat[(2x), 1]
>>  my_function(a,b)
>> }
>>
>>> Is there a specification of "tapply" for that?
>>
>> I don't think so, but depending on what you want to do, the size of
>> your data, and the amount of RAM you have, it might be faster to
>> compare everything "at once" (assuming `my_function` can be
>> vectorized), for instance:
>>
>> a <- dat[seq(1, nrow(dat), by=2),1]
>> b <- dat[seq(2, nrow(dat), by=2), 1]
>> all.results <- my_function(a,b)
>>
>> Also, as an aside, I see you keep calling "as.character" on your data
>> when you extract it from your data.frame. Is your data being converted
>> to factors? You can look to set stringsAsFactors=FALSE if this is the
>> case and you are reading in data using read.table/delim/etc (see:
>> ?read.table)
>>
>> Hope that helps,
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>> | Memorial Sloan-Kettering Cancer Center
>> | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>

--
Peter Ehlers
University of Calgary
403.202.3921
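
A note on the loop discussed above. Even with the star added, the quoted loop probably still misbehaves for reasons other than the syntax error: the index runs to nrow(dat)-1, so 2*x goes past the last row; column 1 holds the Individuals labels rather than a sequence; and a for loop neither prints nor keeps the value of lettermatch(a,b). The "all at once" variant is doubtful as well, since lettermatch as written expects one string per argument, not a vector of strings. What follows is only a minimal sketch under those assumptions, not the posters' final solution. The rewritten lettermatch (tabulate instead of merge), the four-row extract of the posted data, and the names odd, even, res, n_pairs and loop_res are chosen here for illustration.

```r
# Count the letters two strings share (e.g. "AATT" and "AACT" share A, A, T).
# Rewritten for illustration; expects exactly one string per argument.
lettermatch <- function(a, b) {
  ca <- strsplit(a, "")[[1]]
  cb <- strsplit(b, "")[[1]]
  lev <- union(ca, cb)
  # Per-letter counts over a common set of letters, then the smaller of each.
  na <- tabulate(match(ca, lev), nbins = length(lev))
  nb <- tabulate(match(cb, lev), nbins = length(lev))
  sum(pmin(na, nb))
}

# First four rows and two sequence columns of the extract posted in the thread.
dat <- data.frame(
  Individuals = c("C1", "M1", "C2", "M2"),
  Seq1 = c("GGGG", "GGGG", "GGGG", "AGGG"),
  Seq2 = c("AATT", "AAAA", "AATT", "AACT"),
  stringsAsFactors = FALSE
)

# Rows 1, 3, 5, ... are the C individuals; rows 2, 4, 6, ... the M individuals.
odd  <- seq(1, nrow(dat), by = 2)
even <- seq(2, nrow(dat), by = 2)

# For every sequence column (dropping Individuals), compare each C/M pair.
res <- sapply(dat[-1], function(col) {
  mapply(lettermatch, col[odd], col[even], USE.NAMES = FALSE)
})
rownames(res) <- paste(dat$Individuals[odd], dat$Individuals[even], sep = "-")
print(res)

# The same for one column (Seq1) as an explicit loop, using 2 * x,
# stopping at nrow(dat) / 2 pairs, and storing each result.
n_pairs <- nrow(dat) %/% 2
loop_res <- integer(n_pairs)
for (x in seq_len(n_pairs)) {
  loop_res[x] <- lettermatch(dat[2 * x - 1, "Seq1"], dat[2 * x, "Seq1"])
}
print(loop_res)
```

With the full file read via read.delim(..., stringsAsFactors=FALSE), the same code should give one row per C/M pair and one column per Seq column, provided the rows really alternate C, M, C, M as in the posted extract.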


