[R] apply a function down each column
Laetitia Schmid
laetitia.schmid at gmx.ch
Wed Jan 13 23:43:57 CET 2010
Thank you very much! It now works perfectly. I even extended it so that
it can be applied to the whole dataset:
data <- read.delim("mhc_data.txt", stringsAsFactors=FALSE)

# Number of letters two strings share, each letter counted
# min(count in a, count in b) times
lettermatch <- function(a, b) {
  tb <- merge(as.data.frame(table(strsplit(a, ""))),
              as.data.frame(table(strsplit(b, ""))), by="Var1")
  sum(apply(tb[-1], 1, min))
}

npairs <- nrow(data)/2     # rows come in pairs: C1/M1, C2/M2, ...
output <- matrix(ncol=(ncol(data)-1), nrow=npairs)
sim <- rep(0, npairs)
for (y in 2:ncol(data)) {  # column 1 holds the individual IDs
  for (x in 1:npairs) {
    a <- data[(2*x-1), y]  # odd rows
    b <- data[(2*x), y]    # even rows
    sim[x] <- lettermatch(a, b)
  }
  output[, y-1] <- sim
}
colnames(output) <- names(data)[-1]
rownames(output) <- 1:npairs
output
Laetitia
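For comparison, here is a minimal sketch of the same whole-dataset comparison without the explicit double loop. It assumes, as above, that column 1 holds the individual IDs and that the rows alternate C/M pairs (an even number of rows); the small data frame is typed in from the first two pairs of the extract further down the thread instead of being read from mhc_data.txt. sapply() walks over the sequence columns and mapply() pairs odd rows with even rows within each column; the names odd and even are illustrative.

```r
# lettermatch() as posted in this thread: letters shared by two strings,
# each letter counted min(count in a, count in b) times
lettermatch <- function(a, b) {
  tb <- merge(as.data.frame(table(strsplit(a, ""))),
              as.data.frame(table(strsplit(b, ""))), by = "Var1")
  sum(apply(tb[-1], 1, min))
}

# Assumption: first two C/M pairs of the extract, standing in for mhc_data.txt
data <- data.frame(
  Individuals = c("C1", "M1", "C2", "M2"),
  Seq1 = c("GGGG", "GGGG", "GGGG", "AGGG"),
  Seq2 = c("AATT", "AAAA", "AATT", "AACT"),
  Seq3 = c("CCGG", "GGGG", "CCGG", "CCGG"),
  Seq4 = c("CTTT", "GGGG", "CTTT", "CGTT"),
  stringsAsFactors = FALSE
)

odd  <- seq(1, nrow(data), by = 2)   # C rows
even <- seq(2, nrow(data), by = 2)   # M rows

# One column per sequence, one row per C/M pair
output <- sapply(data[-1], function(col)
  mapply(lettermatch, col[odd], col[even], USE.NAMES = FALSE))
rownames(output) <- seq_along(odd)
output
```

The result has the same layout as the loop version: pair 1 gives 4, 2, 2, 0 and pair 2 gives 3, 3, 4, 3 across Seq1 to Seq4.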
On 12.01.2010 at 18:31, Peter Ehlers wrote:
> Laetitia,
>
> I was just responding to your comment that "R complains
> about a syntax error". But I realize now that "2x" would
> probably cause an "unexpected symbol" error.
>
> Here's what I get when I run your loop; what do you get?
>
>> for (x in 1:(nrow(dat)-1)) {
> + a <- as.character(dat[(2x-1),1])
> Error: unexpected symbol in:
> "for (x in 1:(nrow(dat)-1)) {
> a <- as.character(dat[(2x"
>> b <- as.character(dat[(2x),1])
> Error: unexpected symbol in " b <- as.character(dat[(2x"
>> lettermatch(a,b)
> Error in strsplit(a, "") : object 'a' not found
>> }
> Error: unexpected '}' in "}"
>>
>
> and here's what I get when I fix the obvious syntax
> error:
>
>> for (x in 1:(nrow(dat)-1)) {
> + a <- as.character(dat[(2*x-1),1])
> + b <- as.character(dat[(2*x),1])
> + lettermatch(a,b)
> + }
> Error in fix.by(by.x, x) : 'by' must specify valid column(s)
>>
>
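> That leaves two problems:
> 1) you're looking at the wrong column: dat[,1] holds the
> individual IDs, so the first sequence column is dat[,2], etc.
> 2) that error message indicates that your index variable (x)
> reaches invalid values: with x in 1:(nrow(dat)-1), 2*x exceeds
> nrow(dat) as soon as x > nrow(dat)/2.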
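A quick way to see point 2 is to print the indices the loop actually generates. The sketch below uses a hypothetical four-row stand-in for dat (two C/M pairs, IDs in column 1); the names bad_x and good_x are illustrative. Indexing a data frame past its last row returns NA rather than an error, and that NA is what later reaches lettermatch(). Looping over seq_len(nrow(dat) %/% 2) keeps both indices in range.

```r
# Hypothetical stand-in for dat: two C/M pairs, IDs in column 1
dat <- data.frame(Individuals = c("C1", "M1", "C2", "M2"),
                  Seq1 = c("GGGG", "GGGG", "GGGG", "AGGG"),
                  stringsAsFactors = FALSE)

bad_x <- 1:(nrow(dat) - 1)   # 1 2 3, as in the original loop
2 * bad_x                    # 2 4 6: row 6 does not exist
dat[6, 2]                    # NA, not a string lettermatch() can compare

good_x <- seq_len(nrow(dat) %/% 2)       # 1 2: one x per C/M pair
rbind(odd = 2 * good_x - 1, even = 2 * good_x)
```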
>
> Try this:
>
> for (x in 1:(nrow(dat)/2)) {
>   a <- dat[(2*x-1),2] # odd rows
>   b <- dat[(2*x),2]   # even rows
>   print(lettermatch(a,b))
> }
>
> You don't need the as.character() if you have character data.
> Always do a str(dat) before you do any analysis.
>
> -Peter Ehlers
>
> Laetitia Schmid wrote:
>> Dear Peter,
>> thank you for the suggestion.
>> Unfortunately the asterisk did not help. Did it work for you? To me
>> the code still seems incomplete somehow.
>> Laetitia
>>
>> ________________________________________
>> From: Peter Ehlers [ehlers at ucalgary.ca]
>> Sent: Tuesday, January 12, 2010 09:54 AM
>> To: Laetitia Schmid
>> Cc: Steve Lianoglou; r-help at r-project.org
>> Subject: Re: [R] apply a function down each column
>>
>> See inline below.
>>
>> Laetitia Schmid wrote:
>>> Dear Steve,
>>> my solution looks as if it should work, but it does not.
>>> I have attached a text file with an extract of my data. Maybe you
>>> can try it yourself. I want to compare C1 with M1, C2 with M2, C3
>>> with M3, and so on, for each column.
>>> I do not really know what the problem is. R complains about a
>>> syntax error.
>>> The function I am applying counts the letters the two strings have
>>> in common. Greg Hirson helped me to write it.
>>>
>>> lettermatch <- function(a, b) {
>>> tb <- merge(as.data.frame(table(strsplit(a, ""))),
>>> as.data.frame(table(strsplit(b, ""))), by="Var1")
>>> sum(apply(tb[-1], 1, min))
>>> }
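To make "counts the letters in common" concrete: lettermatch() tabulates the letters of each string, keeps only letters present in both (the merge() on Var1), and adds up the smaller of the two counts for each shared letter, i.e. the size of the multiset intersection. The pairs below come from the extract and were worked by hand. lettermatch2() is a hypothetical cross-check written for this note, not code from the thread.

```r
# lettermatch() exactly as posted in this thread
lettermatch <- function(a, b) {
  tb <- merge(as.data.frame(table(strsplit(a, ""))),
              as.data.frame(table(strsplit(b, ""))), by = "Var1")
  sum(apply(tb[-1], 1, min))
}

lettermatch("GGGG", "GGGG")   # 4: all four G shared
lettermatch("AATT", "AAAA")   # 2: A shared min(2, 4) = 2 times, T not in b
lettermatch("AATT", "AACT")   # 3: A min(2, 2) = 2, T min(2, 1) = 1
lettermatch("CTTT", "GGGG")   # 0: no letter in common

# Hypothetical cross-check: the same multiset intersection
# written with intersect() and pmin() instead of merge()
lettermatch2 <- function(a, b) {
  ta <- table(strsplit(a, "")[[1]])
  tb <- table(strsplit(b, "")[[1]])
  shared <- intersect(names(ta), names(tb))
  sum(pmin(as.vector(ta[shared]), as.vector(tb[shared])))
}
lettermatch2("CTTT", "CGTT")  # 3: C min(1, 1) = 1, T min(3, 2) = 2
```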
>>>
>>> For example, for the second column I tried:
>>>
>>> for (x in 1:(nrow(dat)-1)) {
>>> a <- as.character(dat[(2x-1),1])
>>
>> Shouldn't that be 2*x-1??
>>
>> -Peter Ehlers
>>
>>> b <- as.character(dat[(2x),1])
>>> lettermatch(a,b)
>>> }
>>>
>>> or
>>>
>>> a <- as.character(dat[seq(1, nrow(dat), by=2),2])
>>> b <- as.character(dat[seq(2, nrow(dat), by=2), 2])
>>> all.results <- lettermatch(a,b)
>>>
>>> With "dat<-read.delim("data_lgs.txt",stringsAsFactors=FALSE)" I can
>>> leave out the as.character() calls in the code above.
>>>
>>> Laetitia
>>>
>>> Individuals Seq1 Seq2 Seq3 Seq4
>>> C1 GGGG AATT CCGG CTTT
>>> M1 GGGG AAAA GGGG GGGG
>>> C2 GGGG AATT CCGG CTTT
>>> M2 AGGG AACT CCGG CGTT
>>> C3 AGGG AACT CCGG CGTT
>>> M3 AGGG AACT CCGG CGTT
>>> C4 GGGG AATT CCGG CCTT
>>> M4 GGGG AAAT CGGG CTTT
>>> C5 AGGG ACTT CCCG CTTT
>>> M5 AGGG CTTT CCCC CCTT
>>> C6 AGGG CTTT CCCC CCTT
>>> M6 AAAG CCTT CCCC CTTT
>>> C7 AAAG ACCC CCCG GTTT
>>> M7 AAGG AACC CCGG TTTT
>>> C8 GGGG AATT CCGG CCTT
>>> M8 GGGG AATT CCGG CCTT
>>> C9 GGGG AAAA GGGG TTTT
>>> M9 GGGG AAAA GGGG TTTT
>>> C11 AGGG AAAC CGGG GGTT
>>> M11 GGGG AATT CCGG CCTT
>>>
>>>
>>>
>>> On 11.01.2010 at 15:18, Steve Lianoglou wrote:
>>>
>>>> Hi,
>>>>
>>>> On Mon, Jan 11, 2010 at 8:41 AM, Laetitia Schmid <laetitia at gmt.su.se> wrote:
>>>>> Hello World,
>>>>> I have a function that makes pairwise comparisons between two
>>>>> strings. I would like to apply this function to my data (which
>>>>> consists of columns of different strings) so that it compares the
>>>>> first entry with the second, then the third with the fourth, then
>>>>> the fifth with the sixth, and so on down each column.
>>>>> So (2x-1) and (2x) would be the different entries to be compared!
>>>>>
>>>>> dat= my data:
>>>>>
>>>>> for the first column: compare dat[(2x-1),1] with dat[(2x),1] and x
>>>>> would be 1:i, i=length(dat[,1])
>>>>>
>>>>> I think the best way to do that is a loop:
>>>>>
>>>>> a <- as.character(dat[(2x-1),1])
>>>>> b <- as.character(dat[(2x),1])
>>>>>
>>>>> for (i in 1:length(dat[,1]) my_function(a, b))
>>>>>
>>>>> Can somebody help me apply a function to a column with a loop in
>>>>> the way I want?
>>>> It seems as if you've already got it, haven't you?
>>>>
>>>> for (x in 1:(nrow(dat)-1)) {
>>>> a <- dat[(2x-1),1]
>>>> b <- dat[(2x), 1]
>>>> my_function(a,b)
>>>> }
>>>>
>>>>> Is there a variant of "tapply" for that?
>>>> I don't think so, but depending on what you want to do, the size of
>>>> your data, and the amount of RAM you have, it might be faster to
>>>> compare everything "at once" (assuming `my_function` can be
>>>> vectorized), for instance:
>>>>
>>>> a <- dat[seq(1, nrow(dat), by=2),1]
>>>> b <- dat[seq(2, nrow(dat), by=2), 1]
>>>> all.results <- my_function(a,b)
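One caveat when my_function is the lettermatch() used later in this thread: it compares one pair of strings per call, so passing it whole vectors a and b does not give one result per pair (table() cross-tabulates all the strings at once instead). A minimal sketch, assuming that lettermatch() definition and the extract's layout (IDs in column 1, C/M rows alternating), wraps it with base R's Vectorize(), which calls it once per element pair through mapply(). The name vlettermatch and the three-pair data frame are illustrative.

```r
lettermatch <- function(a, b) {
  tb <- merge(as.data.frame(table(strsplit(a, ""))),
              as.data.frame(table(strsplit(b, ""))), by = "Var1")
  sum(apply(tb[-1], 1, min))
}

# Wrapper that applies lettermatch to a[i], b[i] for each i
vlettermatch <- Vectorize(lettermatch, USE.NAMES = FALSE)

# Assumption: column Seq2 of the first three C/M pairs from the extract
dat <- data.frame(Individuals = c("C1", "M1", "C2", "M2", "C3", "M3"),
                  Seq2 = c("AATT", "AAAA", "AATT", "AACT", "AACT", "AACT"),
                  stringsAsFactors = FALSE)

a <- dat[seq(1, nrow(dat), by = 2), 2]   # odd rows (C)
b <- dat[seq(2, nrow(dat), by = 2), 2]   # even rows (M)
all.results <- vlettermatch(a, b)
all.results                              # 2 3 4
```

This is still a loop internally, so it is about convenience rather than speed.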
>>>>
>>>> Also, as an aside, I see you keep calling "as.character" on your
>>>> data when you extract it from your data.frame. Is your data being
>>>> converted to factors? If so, and you are reading in data using
>>>> read.table/delim/etc, you can set stringsAsFactors=FALSE (see
>>>> ?read.table).
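As a sketch of the stringsAsFactors advice (and of Peter's "always do a str(dat)"): the first rows of the extract are read from an inline string through read.table()'s text argument, standing in for the attached file, and str() shows whether the sequence columns arrive as factors or as character. The names txt, dat_f and dat_c are illustrative. In R 4.0.0 and later stringsAsFactors already defaults to FALSE; at the time of this thread the default was TRUE, which is why as.character() seemed necessary.

```r
# Assumption: a tab-separated excerpt standing in for the data file
txt <- "Individuals\tSeq1\tSeq2
C1\tGGGG\tAATT
M1\tGGGG\tAAAA"

# Strings converted to factors (the pre-4.0.0 default)
dat_f <- read.delim(text = txt, stringsAsFactors = TRUE)
str(dat_f)    # Seq1, Seq2 shown as Factor

# Strings kept as character, so no as.character() is needed later
dat_c <- read.delim(text = txt, stringsAsFactors = FALSE)
str(dat_c)    # Seq1, Seq2 shown as chr
```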
>>>>
>>>> Hope that helps,
>>>>
>>>> -steve
>>>>
>>>> --
>>>> Steve Lianoglou
>>>> Graduate Student: Computational Systems Biology
>>>> | Memorial Sloan-Kettering Cancer Center
>>>> | Weill Medical College of Cornell University
>>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>> --
>> Peter Ehlers
>> University of Calgary
>> 403.202.3921
>>
>>
>