[R] apply a function down each column
Laetitia Schmid
laetitia at gmt.su.se
Tue Jan 12 15:24:23 CET 2010
Dear Peter,
thank you for the suggestion.
Unfortunately the star did not help. Did it work for you? For me it seems incomplete somehow.
Laetitia
________________________________________
From: Peter Ehlers [ehlers at ucalgary.ca]
Sent: Tuesday, January 12, 2010 09:54 AM
To: Laetitia Schmid
Cc: Steve Lianoglou; r-help at r-project.org
Subject: Re: [R] apply a function down each column
See inline below.
Laetitia Schmid wrote:
> Dear Steve,
> my solution looks like it would work, but it does not.
> I attached a text file with an extract of my data. Maybe you can try it
> yourself. I want to compare C1 with M1, C2 with M2, C3 with M3, and so
> on, for each column.
> I do not really know what the problem is. R complains about a syntax error.
> The function I am applying counts the letters that the two strings have in common.
> Greg Hirson helped me to write it.
>
> lettermatch <- function(a, b) {
> tb <- merge(as.data.frame(table(strsplit(a, ""))),
> as.data.frame(table(strsplit(b, ""))), by="Var1")
> sum(apply(tb[-1], 1, min))
> }
>
> For example for the second column I tried:
>
> for (x in 1:(nrow(dat)-1)) {
> a <- as.character(dat[(2x-1),1])
Shouldn't that be 2*x-1??
-Peter Ehlers
> b <- as.character(dat[(2x),1])
> lettermatch(a,b)
> }
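A minimal sketch of the loop above with the multiplication written out as 2 * x. Two further changes are assumptions, not something stated in the thread: the loop runs over nrow(dat) %/% 2 pairs rather than nrow(dat) - 1, since each iteration uses two rows, and the values are stored in a vector, because a for loop does not print or keep the result of its body. The column is addressed by name (Seq1), and the data frame holds only the first three pairs of the posted extract.

```r
# Letter-matching function as posted in the thread.
lettermatch <- function(a, b) {
  tb <- merge(as.data.frame(table(strsplit(a, ""))),
              as.data.frame(table(strsplit(b, ""))), by = "Var1")
  sum(apply(tb[-1], 1, min))
}

# First three C/M pairs of Seq1 from the posted extract (not the full file).
dat <- data.frame(
  Individuals = c("C1", "M1", "C2", "M2", "C3", "M3"),
  Seq1 = c("GGGG", "GGGG", "GGGG", "AGGG", "AGGG", "AGGG"),
  stringsAsFactors = FALSE
)

n.pairs <- nrow(dat) %/% 2        # one iteration per C/M pair
results <- numeric(n.pairs)       # values are lost unless they are stored
for (x in seq_len(n.pairs)) {
  a <- dat[2 * x - 1, "Seq1"]     # odd rows: C1, C2, ...
  b <- dat[2 * x, "Seq1"]         # even rows: M1, M2, ...
  results[x] <- lettermatch(a, b)
}
results
```

Writing 2x without the star is what triggers the syntax error; R has no implicit multiplication.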
>
> or
>
> a <- as.character(dat[seq(1, nrow(dat), by=2),2])
> b <- as.character(dat[seq(2, nrow(dat), by=2), 2])
> all.results <- lettermatch(a,b)
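The vectorised call above is unlikely to give per-pair counts, because lettermatch is written for one string at a time: strsplit() returns a list with one element per string, and table() treats the elements of such a list as factors to cross-classify instead of counting letters string by string. A hedged sketch of one way around this, assuming the function is left unchanged, is to let mapply() call it once per pair. The vector seq1 below is a stand-in holding the first six Seq1 entries of the posted extract.

```r
# Letter-matching function as posted in the thread.
lettermatch <- function(a, b) {
  tb <- merge(as.data.frame(table(strsplit(a, ""))),
              as.data.frame(table(strsplit(b, ""))), by = "Var1")
  sum(apply(tb[-1], 1, min))
}

# Stand-in data: first six Seq1 entries (C1, M1, C2, M2, C3, M3).
seq1 <- c("GGGG", "GGGG", "GGGG", "AGGG", "AGGG", "AGGG")

a <- seq1[seq(1, length(seq1), by = 2)]   # C rows
b <- seq1[seq(2, length(seq1), by = 2)]   # M rows

# mapply() evaluates lettermatch(a[i], b[i]) for each i and collects the results.
all.results <- mapply(lettermatch, a, b, USE.NAMES = FALSE)
all.results
```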
>
> With "dat<-read.delim("data_lgs.txt",stringsAsFactors=FALSE)" I can
> leave out the "as.character" calls in the code above.
>
> Laetitia
>
> Individuals Seq1 Seq2 Seq3 Seq4
> C1 GGGG AATT CCGG CTTT
> M1 GGGG AAAA GGGG GGGG
> C2 GGGG AATT CCGG CTTT
> M2 AGGG AACT CCGG CGTT
> C3 AGGG AACT CCGG CGTT
> M3 AGGG AACT CCGG CGTT
> C4 GGGG AATT CCGG CCTT
> M4 GGGG AAAT CGGG CTTT
> C5 AGGG ACTT CCCG CTTT
> M5 AGGG CTTT CCCC CCTT
> C6 AGGG CTTT CCCC CCTT
> M6 AAAG CCTT CCCC CTTT
> C7 AAAG ACCC CCCG GTTT
> M7 AAGG AACC CCGG TTTT
> C8 GGGG AATT CCGG CCTT
> M8 GGGG AATT CCGG CCTT
> C9 GGGG AAAA GGGG TTTT
> M9 GGGG AAAA GGGG TTTT
> C11 AGGG AAAC CGGG GGTT
> M11 GGGG AATT CCGG CCTT
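Since the goal is to run the comparison down each column, the following sketch applies it to all sequence columns at once. It is only an illustration under stated assumptions: the data are read from an inline string holding four rows of the extract (C2 to M3) instead of data_lgs.txt, whitespace is assumed as the separator, and the names odd, even and scores are made up for the example.

```r
# Letter-matching function as posted in the thread.
lettermatch <- function(a, b) {
  tb <- merge(as.data.frame(table(strsplit(a, ""))),
              as.data.frame(table(strsplit(b, ""))), by = "Var1")
  sum(apply(tb[-1], 1, min))
}

# Four rows of the posted extract, read as character data.
dat <- read.table(text = "Individuals Seq1 Seq2 Seq3 Seq4
C2 GGGG AATT CCGG CTTT
M2 AGGG AACT CCGG CGTT
C3 AGGG AACT CCGG CGTT
M3 AGGG AACT CCGG CGTT", header = TRUE, stringsAsFactors = FALSE)

odd  <- seq(1, nrow(dat), by = 2)   # rows holding C individuals
even <- seq(2, nrow(dat), by = 2)   # rows holding M individuals

# For each sequence column, compare every C row with the M row below it.
scores <- sapply(dat[-1], function(column) {
  mapply(lettermatch, column[odd], column[even], USE.NAMES = FALSE)
})
rownames(scores) <- paste(dat$Individuals[odd], dat$Individuals[even], sep = "/")
scores
```

The result is a matrix with one row per C/M pair and one column per sequence column.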
>
>
>
> On 11.01.2010 at 15:18, Steve Lianoglou wrote:
>
>> Hi,
>>
>> On Mon, Jan 11, 2010 at 8:41 AM, Laetitia Schmid <laetitia at gmt.su.se>
>> wrote:
>>> Hello World,
>>> I have a function that makes pairwise comparisons between two
>>> strings. I would like to apply this function to my data (which
>>> consists of columns with different strings) in the way that it
>>> compares the first with the second entry, and then the third with the
>>> fourth, and then the fifth with the sixth, and so on down each column...
>>> So (2x-1) and (2x) would be the different entries to be compared!
>>>
>>> dat= my data:
>>>
>>> for the first column: compare dat[(2x-1),1] with dat[(2x),1] and x
>>> would be 1:i, i=length(dat[,1])
>>>
>>> I think the best way to do that is a loop:
>>>
>>> a <- as.character(dat[(2x-1),1])
>>> b <- as.character(dat[(2x),1])
>>>
>>> for (i in 1:length(dat[,1])) my_function(a, b)
>>>
>>> Can somebody help me to apply a function with a loop in the way I
>>> want to a column?
>>
>> It seems as if you've got it already, haven't you?
>>
>> for (x in 1:(nrow(dat)-1)) {
>> a <- dat[(2x-1),1]
>> b <- dat[(2x), 1]
>> my_function(a,b)
>> }
>>
>>> Is there a specification of "tapply" for that?
>>
>> I don't think so, but depending on what you want to do, the size of
>> your data, and the amount of RAM you have, it might be faster to
>> compare everything "at once" (assuming `my_function` can be
>> vectorized), for instance:
>>
>> a <- dat[seq(1, nrow(dat), by=2),1]
>> b <- dat[seq(2, nrow(dat), by=2), 1]
>> all.results <- my_function(a,b)
>>
>> Also, as an aside, I see you keep calling "as.character" on your data
>> when you extract it from your data.frame. Is your data being converted
>> to factors? You can look to set stringsAsFactors=FALSE if this is the
>> case and you are reading in data using read.table/delim/etc (see:
>> ?read.table)
>>
>> Hope that helps,
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>> | Memorial Sloan-Kettering Cancer Center
>> | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
--
Peter Ehlers
University of Calgary
403.202.3921