[R] Fw: Improving loop performance

Tue May 11 20:19:39 CEST 2010

I hadn't noticed that the paste is cumulative,
so my 'ifelse' suggestion doesn't make much
sense on its own, sorry.

But if what you show is representative, in that
there are very few 'FALSE' values in 'first.exon',
then you could use 'ifelse' first and then go back
and fix the wrong values with a loop over the rows
given by:

which(!first.exon)
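A minimal sketch of the 'ifelse'-then-fix idea, assuming the column names from the post quoted below and assuming the first row is a first exon. The toy data frame is hypothetical. One refinement over looping over every row in which(!first.exon): after the 'ifelse' pass, a continuation row is only wrong when the row before it is also a continuation row, so the loop can be restricted to those. Pulling the columns out of the data frame once, instead of indexing aga2[i, ...] inside the loop, is probably a large part of any speed-up as well.

```r
# Hypothetical toy data with the same column names as aga2 in the post
aga2 <- data.frame(
  AgilentProbe = c("A_23_P116898", "A_23_P162918", "A_23_P162913",
                   "A_32_P151937", "A_23_P2920"),
  first.exon   = c(TRUE, TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Extract the columns once instead of indexing the data frame per row
ap <- as.character(aga2$AgilentProbe)
fe <- aga2$first.exon
n  <- length(ap)

# Vectorized pass: correct for first exons and for continuation rows
# whose predecessor is a first exon (assumes fe[1] is TRUE)
probe1 <- ifelse(fe, ap, paste(c("", ap[-n]), ap, sep = ","))

# Still wrong: continuation rows whose predecessor is also a continuation.
# Fix them in increasing order so each sees the corrected row before it.
wrong <- which(!fe & c(FALSE, !fe[-n]))
for (i in wrong) {
  probe1[i] <- paste(probe1[i - 1], ap[i], sep = ",")
}
probe1
```

The same pattern with AgilentStart in place of AgilentProbe would give the agstart vector.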

Alternatively, if there are a lot of 'FALSE' values
and you only care about the longest sequence in each
run rather than the intermediates as well, then you
could build each set separately using the 'collapse'
argument to 'paste'.  In this scenario, 'rle' would
probably be useful as well.
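A sketch of the grouping approach, again with hypothetical toy columns and assuming the first row is a first exon. Here cumsum() of 'first.exon' is used to label the runs, which is a substitute for building the groups from 'rle' directly; 'rle' then supplies the run lengths, whose cumulative sum is the row index of the last (longest) entry of each run. The final lines show one way to recover the intermediates too, using Reduce() with accumulate = TRUE inside each group.

```r
# Hypothetical toy columns standing in for aga2$AgilentProbe and aga2$first.exon
ap <- c("A_23_P116898", "A_23_P162918", "A_23_P162913",
        "A_32_P151937", "A_23_P2920")
fe <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

# Group id: increases by one at every first exon, so continuation rows
# share the id of the first exon before them (assumes fe[1] is TRUE)
grp <- cumsum(fe)

# Longest sequence only: one paste(collapse = ",") per group
longest <- unname(vapply(split(ap, grp), paste, character(1), collapse = ","))

# rle() gives the run lengths; their cumulative sum is the row index
# of the last entry of each run
runs     <- rle(grp)$lengths
last.row <- cumsum(runs)

# If the intermediates are wanted as well, accumulate within each group
# and put the pieces back in the original row order
probe1 <- unsplit(lapply(split(ap, grp), function(x)
  Reduce(function(a, b) paste(a, b, sep = ","), x, accumulate = TRUE)), grp)
```

With this layout, probe1[last.row] should equal longest, so the "longest only" vector can also be read off the full one.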

On 11/05/2010 18:50, Mark Lamias wrote:
> I will clarify my problem, as others have asked for more detail:
>
> I have a dataframe, aga2, that looks like this:
>
>     Row.ID AgilentProbe GeneSymbol GeneID Exons AgilentStart first.geneid first.exon last.geneid last.exon
> 8    1348 A_23_P116898        A2M      2    34      9112685         TRUE       TRUE        TRUE      TRUE
> 62  19410  A_23_P95594       NAT1      9     4     18124656         TRUE       TRUE        TRUE      TRUE
> 39  10323  A_23_P31798       NAT2     10     2     18302422         TRUE       TRUE        TRUE      TRUE
> 21   5353 A_23_P162918   SERPINA3     12     5     94150936         TRUE       TRUE       FALSE     FALSE
> 22   9999 A_23_P162913   SERPINA3     12     5     94150800        FALSE      FALSE       FALSE     FALSE
> 98  29990 A_32_P151937   SERPINA3     12     5     94150720        FALSE      FALSE       FALSE      TRUE
> 33   9516   A_23_P2920   SERPINA3     12     7     94158435        FALSE       TRUE       FALSE      TRUE
> 96  29595 A_32_P124727   SERPINA3     12     8     94160018        FALSE       TRUE        TRUE      TRUE
> 57  18176  A_23_P80570      AADAC     13     5    153028473         TRUE       TRUE        TRUE      TRUE
> 46  16139  A_23_P56529       AAMP     14     9    218838396         TRUE       TRUE        TRUE      TRUE
>
> For the above example, I would like to end up with a vector, probe1, like this, based upon the AgilentProbe values:
>
> A_23_P116898
> A_23_P95594
> A_23_P31798
> A_23_P162918
> A_23_P162918,A_23_P162913
> A_23_P162918,A_23_P162913,A_32_P151937
> A_23_P2920
> A_32_P124727
> A_23_P80570
> A_23_P56529
>
> I build up each element of the vector based upon the value of first.exon.  If the value of first.exon is FALSE, I'd like to take the previous element of probe1 and concatenate the current value of AgilentProbe onto it, and then move on to the next element.
>
> As stated previously, this code works, but it is very slow with larger datasets:
>
> probe1<- character(dim(aga2)[1])
> agstart<- character(dim(aga2)[1])
>
> for (i in 1:dim(aga2)[1])
> {
>   if (aga2$first.exon[i]==TRUE)
>   {
>    probe1[i]<-as.character(aga2[i, "AgilentProbe"])
>    agstart[i]<-as.character(aga2[i, "AgilentStart"])
>
>   }
>   else
>   {
>    probe1[i]<-paste(probe1[i-1], aga2[i, "AgilentProbe"], sep=",")
>    agstart[i]<-paste(agstart[i-1], aga2[i, "AgilentStart"], sep=",")
>   }
> }
>
>
> I tried a few of the previous suggestions (and tried modifying them), but they didn't seem to quite do the trick.  Any assistance would be greatly appreciated.
>
> Thanks a million.
>
> --Mark Lamias
>
> ----- Forwarded Message ----
> From: jim holtman<jholtman at gmail.com>
> To: Mark Lamias<mlamias at yahoo.com>
> Cc: r-help at r-project.org
> Sent: Tue, May 11, 2010 12:46:26 PM
> Subject: Re: [R] Improving loop performance
>
> It was supposed to be  'head(p1, -1)'  instead of  'tail(p1, -1)'
>
>
> On Tue, May 11, 2010 at 12:17 PM, Mark Lamias<mlamias at yahoo.com>  wrote:
>
>> R-users,
>>
>> I have the following piece of code which I am trying to run on a dataframe (aga2) with about half a million records.  While the code works, it is extremely slow.  I've read some of the help archives indicating that I should allocate space for the p1 and ags vectors, which I have done, but this doesn't seem to improve speed much.  Would anyone be able to provide me with advice on how I might be able to speed this up?
>>
>>
>> p1<- character(dim(aga2)[1])
>> ags<- character(dim(aga2)[1])
>> for (i in 1:dim(aga2)[1])
>> {
>>   if (aga2$first.exon[i]==TRUE)
>>   {
>>    p1[i]<-as.character(aga2[i, "AP"])
>>    ags[i]<-as.character(aga2[i, "AS"])
>>
>>   }
>>   else
>>   {
>>    p1[i]<-paste(p1[i-1], aga2[i, "AP"], sep=",")
>>    ags[i]<-paste(ags[i-1], aga2[i, "AS"], sep=",")
>>   }
>> }
>>
>> Thanks.
>>
>> --Mark Lamias
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

-- 
Patrick Burns
pburns at pburns.seanet.com
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')