[R] How to visualise what code is processed within a for loop

Luca Meyer lucam1968 using gmail.com
Mon Apr 30 18:25:28 CEST 2018


Thank you for both replies Don & Rui,

The core issue here is that a search has to be run within a text field, and
I agree with Rui's later comment that regexpr may well be the
time-consuming piece of code.
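
One possible way to lighten the search is sketched below. It swaps regexpr for grepl with fixed = TRUE, which treats the pattern as a literal string and skips the regular-expression engine. This assumes the search terms in d1 are plain text rather than regular expressions, which the thread does not state. The toy d0 and d1 are invented for illustration, and any speed-up would need to be measured on the real data.

```r
# Toy stand-ins for the thread's objects (hypothetical data)
d0 <- data.frame(X0 = c("red apple", "green pear", "apple pie"),
                 stringsAsFactors = FALSE)
d1 <- data.frame(term = c("apple", "pear"), stringsAsFactors = FALSE)

for (i in seq_len(nrow(d1))) {
  nm <- paste0("V", i)
  # fixed = TRUE matches the term literally instead of as a regex;
  # grepl() already returns TRUE/FALSE, so it is coerced straight to 0/1
  d0[[nm]] <- as.integer(grepl(d1[i, 1], d0$X0, fixed = TRUE))
}
d0
```

Wrapping the loop in system.time() for both variants would show whether the change matters for a given data set.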

I might try to optimise this piece of code later on, but for the time being
I am working on the next step, building a neural network to try to
classify some text.

Again, thanks,

Luca

2018-04-30 17:25 GMT+02:00 MacQueen, Don <macqueen1 using llnl.gov>:

> Luca,
>
>
>
> If speed is important, you might improve performance by making d0 into a
> true matrix, rather than a data frame (assuming d0 is indeed a data frame
> at this point). Although data frames may look like matrices, they aren’t,
> and they have some overhead that matrices don’t.  I don’t think you would
> be able to use the [[nm]] syntax with a matrix, but [ , nm] should work,
> provided the matrix has column names. Or you could perhaps index by column
> number.
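
A minimal sketch of the matrix indexing described above, using a hypothetical 3x2 indicator matrix (the name m and its contents are invented for illustration). Because a matrix holds a single type, a text column such as d0$X0 would have to be kept outside it.

```r
# Hypothetical 0/1 indicator matrix with named columns
m <- matrix(0L, nrow = 3, ncol = 2,
            dimnames = list(NULL, c("V1", "V2")))

nm <- "V2"
m[, nm] <- c(1L, 0L, 1L)   # index by column name
m[, 1]  <- c(0L, 1L, 0L)   # or index by column number
m
```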
>
>
>
> I had a project some years ago in which I reduced calculation time a lot
> by extracting the numeric columns of a data frame and working with them,
> then recombining them with the character columns. R’s performance working
> with data frames has improved since then, so I really don’t know if it
> would make a difference for your task.
>
>
>
> -Don
>
> --
> Don MacQueen
> Lawrence Livermore National Laboratory
> 7000 East Ave., L-627
> Livermore, CA 94550
> 925-423-1062
> Lab cell 925-724-7509
>
> *From: *Luca Meyer <lucam1968 using gmail.com>
> *Date: *Monday, April 30, 2018 at 8:08 AM
> *To: *Rui Barradas <ruipbarradas using sapo.pt>
> *Cc: *"MacQueen, Don" <macqueen1 using llnl.gov>, array R-help <
> r-help using r-project.org>
> *Subject: *Re: [R] How to visualise what code is processed within a for
> loop
>
>
>
> Hi Rui
>
> Thank you for your suggestion,
>
>
>
> I timed the code you suggested against the version supplied by Don, and
> the results are very close: to populate a 5954x899 0/1 matrix on my
> machine, your procedure took 79 secs, while the one using ifelse took
> 80 secs, so unfortunately there is no significant time saving.
>
> Nevertheless thank you for your contribution.
>
> Kind regards,
>
>
>
> Luca
>
>
>
> 2018-04-28 23:18 GMT+02:00 Rui Barradas <ruipbarradas using sapo.pt>:
>
> I forgot to explain the reasoning behind my suggestion.
>
> The logical condition returns FALSE/TRUE, which R codes as 0/1, so all
> you have to do is coerce the result to integer.
>
> This works because ifelse would return a 1 or a 0 depending on the same
> condition, so the values are exactly the same. Coercion is also more
> efficient, since ifelse creates both vectors, the true part and the
> false part, and then indexes them to return the appropriate values.
> That is twice the work and a great deal of extra memory.
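
The equivalence described above can be checked on a small example. The vector x is hypothetical data, not taken from the thread. The two results hold the same values, although they differ in storage type.

```r
x <- c("red apple", "green pear", "apple pie")  # hypothetical text
hit <- regexpr("apple", x) > 0                   # logical vector

via_ifelse <- ifelse(hit, 1, 0)  # fills in values from the yes/no arguments
via_coerce <- as.integer(hit)    # direct FALSE/TRUE -> 0/1 coercion

all(via_ifelse == via_coerce)    # the values agree
typeof(via_ifelse)               # "double", because 1 and 0 are doubles
typeof(via_coerce)               # "integer"
```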
>
> Rui Barradas
>
> On 4/28/2018 10:12 PM, Rui Barradas wrote:
>
> Hello,
>
> instead of ifelse, the following is exactly the same and much more
> efficient.
>
> d0[[nm]] <- as.integer(regexpr(d1[i,1], d0$X0) > 0)
>
>
> Hope this helps,
>
> Rui Barradas
>
> On 4/28/2018 8:45 PM, Luca Meyer wrote:
>
> Thanks Don,
>
>      for (i in 1:10){
>        nm <- paste0("V", i)
>        d0[[nm]] <- ifelse( regexpr(d1[i,1], d0$X0) > 0, 1, 0)
>      }
>
> is exactly what I needed.
>
> Best regards,
>
> Luca
>
>
> 2018-04-25 23:03 GMT+02:00 MacQueen, Don <macqueen1 using llnl.gov>:
>
> Your code doesn't make sense to me in a couple of ways.
>
> Inside the loop, the first line assigns a value to an object named "t".
> Then, the second line does the same thing, assigns a value to an object
> named "t".
>
> The value of the object named "t" after the second line will be the output
> of the ifelse() expression, whatever that is. This has the effect of making
> the first line irrelevant. Whatever value t has after the first line is
> replaced by whatever it gets from the second line.
>
> It looks like the first line inside the loop is constructing the name of a
> data frame column, and storing that name as a character string. However,
> the second line doesn't use that name at all. If your goal is to update the
> contents of a column, you need to assign something to that column in the
> next line. Instead you assign it to the object named "t".
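
The point above can be reproduced in a few lines. This is only an illustrative sketch: the two-row d0 is invented, and the object is called t1 rather than t to avoid masking the built-in function.

```r
d0 <- data.frame(V1 = c(0, 0))     # hypothetical data frame

t1 <- paste("d0$V", 1, sep = "")   # just the character string "d0$V1"
t1 <- c(1, 1)                      # rebinds t1; d0 is never touched

d0$V1                              # still all zeros
```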
>
> What you're looking for will be more along the lines of this:
>
>      for (i in 1:10){
>        nm <- paste0("V", i)
>        d0[[nm]] <- ifelse( regexpr(d1[i,1], d0$X0) > 0, 1, 0)
>      }
>
> This may not be a complete solution, since I have no idea what the contents
> or structure of d1 are, or what the regexpr() is expected to return.
>
> And notice the use of double brackets, [[ and ]]. This is one way to
> reference a column of a data frame when you have the column's name stored
> in a variable. Another way is d0[ , nm]
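
A short sketch of both ways of referencing a column through a stored name, using a made-up three-row data frame:

```r
d0 <- data.frame(X0 = c("a", "b", "c"), stringsAsFactors = FALSE)
nm <- paste0("V", 1)          # the column name, held in a variable

d0[[nm]] <- c(1, 0, 1)        # creates (or replaces) column "V1"
d0[, nm] <- c(0, 1, 0)        # same column, matrix-style indexing

identical(d0[[nm]], d0[, nm]) # both forms read the same column
d0$nm                         # NULL: $ looks for a column literally named "nm"
```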
>
>
> A couple of additional comments:
>
>   "t" is a poor choice of object name, because it is the name of one of R's
> built-in functions (immediately after starting a fresh session of R, with
> nothing left over from any previous session, type help("t") and see what
> you get).
>
>   ifelse() is intended for use on vectors, not scalars, and it looks like
> maybe you're using it on a scalar (can't be sure about this, though)
>
> For example, ifelse() is designed for this kind of usage:
>
> ifelse( c(TRUE, FALSE, TRUE) , 1:3, 11:13)
>
> [1]  1 12  3
>
> Although it works ok for these scalar calls:
>
> ifelse(TRUE, 3, 4)
>
> [1] 3
>
> ifelse(FALSE, 3, 4)
>
> [1] 4
>
> they are not really what it is intended for.
>
> --
> Don MacQueen
> Lawrence Livermore National Laboratory
> 7000 East Ave., L-627
> Livermore, CA 94550
> 925-423-1062
> Lab cell 925-724-7509
>
>
> On 4/24/18, 12:30 AM, "R-help on behalf of Luca Meyer" <
> r-help-bounces using r-project.org on behalf of lucam1968 using gmail.com> wrote:
>
>      Hi,
>
>      I am trying to debug the following code:
>
>      for (i in 1:10){
>        t <- paste("d0$V",i,sep="")
>        t <- ifelse(regexpr(d1[i,1],d0$X0)>0,1,0)
>      }
>
>      and I would like to see what code R is actually processing. How can
>      I do that?
>
>      More to the point, I am trying to update my variables d0$V1 to d0$V10
>      according to the presence or absence of some text (contained in the
>      file d1) within the d0$X0 variable.
>
>      The code seems to run ok: if I add print(table(t)) within the loop I
>      can see that the ifelse procedure is working and that in some cases
>      a 1 is assigned within the d0$V1 to d0$V10 variable range. But when
>      checking my d0$V1 to d0$V10 after the for loop, they are all still
>      equal to zero...
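
One simple way to watch what a loop is doing, offered only as a sketch, is to report the iteration, the target column and the number of matches with message() before each assignment. The toy d0 and d1 below are invented. Base R's browser() and debug() allow stepping through the code interactively instead.

```r
# Hypothetical stand-ins for the thread's d0 and d1
d0 <- data.frame(X0 = c("red apple", "green pear"), stringsAsFactors = FALSE)
d1 <- data.frame(term = c("apple", "pear"), stringsAsFactors = FALSE)

for (i in seq_len(nrow(d1))) {
  nm   <- paste0("V", i)
  hits <- as.integer(regexpr(d1[i, 1], d0$X0) > 0)
  # Report what this iteration is about to write
  message("i = ", i, ": writing column ", nm,
          " with ", sum(hits), " match(es)")
  d0[[nm]] <- hits
}
d0
```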
>
>      Thanks,
>
>      Luca
>
>          [[alternative HTML version deleted]]
>
>      ______________________________________________
>      R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>      https://stat.ethz.ch/mailman/listinfo/r-help
>      PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>      and provide commented, minimal, self-contained, reproducible code.
>