[R] apply a function down each column

Peter Ehlers ehlers at ucalgary.ca
Tue Jan 12 18:31:57 CET 2010


Laetitia,

I was just responding to your comment that "R complains
about a syntax error". But I realize now that "2x" would
probably cause an "unexpected symbol" error.
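
For anyone following along, here is a minimal sketch of why "2x" fails: R has no implicit multiplication, so the parser rejects it before any code runs. The names bad, good and value are only for illustration.

```r
# "2x" is not valid R: a number followed directly by a name cannot be parsed.
bad <- tryCatch(parse(text = "2x - 1"),
                error = function(e) conditionMessage(e))
print(bad)                      # the parser's error message, as a string

# With an explicit "*" the expression parses and evaluates normally.
good  <- parse(text = "2*x - 1")
x     <- 3
value <- eval(good[[1]])
print(value)                    # 2*3 - 1
```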

Here's what I get when I run your loop; what do you get?

 > for (x in 1:(nrow(dat)-1)) {
+  a <- as.character(dat[(2x-1),1])
Error: unexpected symbol in:
"for (x in 1:(nrow(dat)-1)) {
  a <- as.character(dat[(2x"
 >  b <- as.character(dat[(2x),1])
Error: unexpected symbol in " b <- as.character(dat[(2x"
 >  lettermatch(a,b)
Error in strsplit(a, "") : object 'a' not found
 > }
Error: unexpected '}' in "}"
 >

and here's what I get when I fix the obvious syntax
error:

 > for (x in 1:(nrow(dat)-1)) {
+  a <- as.character(dat[(2*x-1),1])
+  b <- as.character(dat[(2*x),1])
+  lettermatch(a,b)
+ }
Error in fix.by(by.x, x) : 'by' must specify valid column(s)
 >

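That leaves two problems:
1) you're looking at the wrong column: dat[,1] holds the
    Individuals labels; the sequences start at dat[,2].
2) that error message indicates that your index variable (x)
    runs past the end of the data: once x > nrow(dat)/2,
    row 2*x does not exist and dat[(2*x),1] is NA.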
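
Both problems can be seen on a small stand-in for the data. This is only a sketch: the data frame below is typed in by hand from the first four rows of the posted extract (two sequence columns only), and max_valid is an illustrative name.

```r
# Hand-typed stand-in for the first rows of the posted data (assumption:
# the real file is read with stringsAsFactors = FALSE).
dat <- data.frame(
  Individuals = c("C1", "M1", "C2", "M2"),
  Seq1 = c("GGGG", "GGGG", "GGGG", "AGGG"),
  Seq2 = c("AATT", "AAAA", "AATT", "AACT"),
  stringsAsFactors = FALSE
)

# Problem 1: column 1 is the label, not a sequence.
print(dat[1, 1])   # an individual's name
print(dat[1, 2])   # the first sequence

# Problem 2: 1:(nrow(dat)-1) lets x reach 3, but row 2*3 = 6 does not exist.
x <- 3
print(dat[2 * x, 2])            # NA, which lettermatch() cannot handle

# The last valid pair index is nrow(dat)/2.
max_valid <- nrow(dat) / 2
print(max_valid)
```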

Try this:

for (x in 1:(nrow(dat)/2)) {
  a <- dat[(2*x-1),2]  # odd rows
  b <- dat[(2*x),2]    # even rows
  print(lettermatch(a,b))
}

You don't need the as.character() if you have character data.
Always run str(dat) before any analysis so that you know
what type each column actually has.
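
If you want the same comparison for every sequence column at once, one possible approach (a sketch, not the only way) is to pair the odd and even rows with mapply() and loop over the columns with sapply(). The lettermatch() below is the function quoted further down in this thread; the data frame is again a hand-typed stand-in, and odd, even and res are illustrative names. It assumes an even number of rows and more than one pair.

```r
# lettermatch() as posted in this thread: counts letters common to a and b.
lettermatch <- function(a, b) {
  tb <- merge(as.data.frame(table(strsplit(a, ""))),
              as.data.frame(table(strsplit(b, ""))), by = "Var1")
  sum(apply(tb[-1], 1, min))
}

# Hand-typed stand-in for the first four rows of the posted data.
dat <- data.frame(
  Individuals = c("C1", "M1", "C2", "M2"),
  Seq1 = c("GGGG", "GGGG", "GGGG", "AGGG"),
  Seq2 = c("AATT", "AAAA", "AATT", "AACT"),
  stringsAsFactors = FALSE
)

odd  <- seq(1, nrow(dat), by = 2)   # rows C1, C2, ...
even <- seq(2, nrow(dat), by = 2)   # rows M1, M2, ...

# lettermatch() handles one pair at a time, so mapply() feeds it pair by
# pair; sapply() repeats that for every column except the labels.
res <- sapply(dat[-1], function(col) {
  mapply(lettermatch, col[odd], col[even], USE.NAMES = FALSE)
})
rownames(res) <- paste(dat$Individuals[odd], dat$Individuals[even], sep = "-")
print(res)
```

Each row of res is one C/M pair and each column is one sequence column.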

  -Peter Ehlers

Laetitia Schmid wrote:
> Dear Peter,
> thank you for the suggestion.
> Unfortunately the star did not help. Did it work for you? For me it seems incomplete somehow.
> Laetitia
> 
> ________________________________________
> From: Peter Ehlers [ehlers at ucalgary.ca]
> Sent: Tuesday, January 12, 2010 09:54 AM
> To: Laetitia Schmid
> Cc: Steve Lianoglou; r-help at r-project.org
> Subject: Re: [R] apply a function down each column
> 
> See inline below.
> 
> Laetitia Schmid wrote:
>> Dear Steve,
>> my solution looks like it would work, but it does not.
>> I attached a text file with an extract of my data. Maybe you can try it
>> yourself. I want to compare C1 with M1, C2 with M2, C3 with M3,,, for
>> each column.
>> I do not really know what the problem is. R complains about a syntax error.
>> The function I am applying counts the common strings between the two.
>> Greg Hirson helped me to write it.
>>
>> lettermatch <- function(a, b) {
>>    tb <- merge(as.data.frame(table(strsplit(a, ""))),
>> as.data.frame(table(strsplit(b, ""))), by="Var1")
>>    sum(apply(tb[-1], 1, min))
>> }
>>
>> For example for the second column I tried:
>>
>> for (x in 1:(nrow(dat)-1)) {
>> a <- as.character(dat[(2x-1),1])
> 
> Shouldn't that be 2*x-1??
> 
>   -Peter Ehlers
> 
>> b <- as.character(dat[(2x),1])
>>  lettermatch(a,b)
>> }
>>
>> or
>>
>>  a <- as.character(dat[seq(1, nrow(dat), by=2),2])
>>  b <- as.character(dat[seq(2, nrow(dat), by=2), 2])
>>  all.results <- lettermatch(a,b)
>>
>> With "dat<-read.delim("data_lgs.txt",stringsAsFactors=FALSE)" I can
>> leave the "as.character" away in the formula above.
>>
>> Laetitia
>>
>> Individuals    Seq1    Seq2    Seq3    Seq4
>> C1    GGGG    AATT    CCGG    CTTT
>> M1    GGGG    AAAA    GGGG    GGGG
>> C2    GGGG    AATT    CCGG    CTTT
>> M2    AGGG    AACT    CCGG    CGTT
>> C3    AGGG    AACT    CCGG    CGTT
>> M3    AGGG    AACT    CCGG    CGTT
>> C4    GGGG    AATT    CCGG    CCTT
>> M4    GGGG    AAAT    CGGG    CTTT
>> C5    AGGG    ACTT    CCCG    CTTT
>> M5    AGGG    CTTT    CCCC    CCTT
>> C6    AGGG    CTTT    CCCC    CCTT
>> M6    AAAG    CCTT    CCCC    CTTT
>> C7    AAAG    ACCC    CCCG    GTTT
>> M7    AAGG    AACC    CCGG    TTTT
>> C8    GGGG    AATT    CCGG    CCTT
>> M8    GGGG    AATT    CCGG    CCTT
>> C9    GGGG    AAAA    GGGG    TTTT
>> M9    GGGG    AAAA    GGGG    TTTT
>> C11    AGGG    AAAC    CGGG    GGTT
>> M11    GGGG    AATT    CCGG    CCTT
>>
>>
>>
>> On 11.01.2010 at 15:18, Steve Lianoglou wrote:
>>
>>> Hi,
>>>
>>> On Mon, Jan 11, 2010 at 8:41 AM, Laetitia Schmid <laetitia at gmt.su.se>
>>> wrote:
>>>> Hello World,
>>>> I have a function that makes pairwise comparisons between two
>>>> strings. I would like to apply this function to my data (which
>>>> consists of columns with different strings) in the way that it
>>>> compares the first with the second entry, and then the third with the
>>>> fourth, and then the fifth with the sixth, and so on down each column...
>>>> So (2x-1) and (2x) would be the different entries to be compared!
>>>>
>>>> dat= my data:
>>>>
>>>> for the first column: compare dat[(2x-1),1] with dat[(2x),1] and x
>>>> would be 1:i, i=length(dat[,1])
>>>>
>>>> I think the best way to do that is a loop:
>>>>
>>>> a <- as.character(dat[(2x-1),1])
>>>> b <- as.character(dat[(2x),1])
>>>>
>>>> for (i in 1:length(dat[,1]) my_function(a, b))
>>>>
>>>> Can somebody help me to apply a function with a loop in the way I
>>>> want to a column?
>>> It seems as if you got it already, don't you?
>>>
>>> for (x in 1:(nrow(dat)-1)) {
>>>  a <- dat[(2x-1),1]
>>>  b <- dat[(2x), 1]
>>>  my_function(a,b)
>>> }
>>>
>>>> Is there a specification of "tapply" for that?
>>> I don't think so, but depending on what you want to do, the size of
>>> your data, and the amount of RAM you have, it might be faster to
>>> compare everything "at once" (assuming `my_function` can be
>>> vectorized), for instance:
>>>
>>> a <- dat[seq(1, nrow(dat), by=2),1]
>>> b <- dat[seq(2, nrow(dat), by=2), 1]
>>> all.results <- my_function(a,b)
>>>
>>> Also, as an aside, I see you keep calling "as.character" on your data
>>> when you extract it from your data.frame. Is your data being converted
>>> to factors? You can look to set stringsAsFactors=FALSE if this is the
>>> case and you are reading in data using read.table/delim/etc (see:
>>> ?read.table)
>>>
>>> Hope that helps,
>>>
>>> -steve
>>>
>>> --
>>> Steve Lianoglou
>>> Graduate Student: Computational Systems Biology
>>> | Memorial Sloan-Kettering Cancer Center
>>> | Weill Medical College of Cornell University
>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
>>
> 
> --
> Peter Ehlers
> University of Calgary
> 403.202.3921
> 
> 

-- 
Peter Ehlers
University of Calgary
403.202.3921


