[R] progress of LDA algorithm...
Bert Gunter
bgunter.4567 using gmail.com
Sun Jan 30 17:36:59 CET 2022
I am not an expert, but I believe your extrapolation idea is unsound.
Again, post on the HPC list to get expert feedback instead of trying
to reinvent your own wheel. I will not respond further.
Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
On Sun, Jan 30, 2022 at 3:02 AM akshay kulkarni <akshay_e4 using hotmail.com> wrote:
>
> Dear Avi and Bert,
> I think I have my answer. I will just run it on a small sample, check the execution time, and extrapolate from that. By the way, LDA (I am using the topicmodels package) cannot be parallelized, right? Thanks in advance.
>
> Thanking you,
> Yours sincerely,
> AKSHAY M KULKARNI
> ________________________________
> From: R-help <r-help-bounces using r-project.org> on behalf of Avi Gross via R-help <r-help using r-project.org>
> Sent: Sunday, January 30, 2022 4:15 AM
> Cc: r-help using r-project.org <r-help using r-project.org>
> Subject: Re: [R] progress of LDA algorithm...
>
> I agree with Bert that this is way off topic, and one that few here know (or care) about.
>
> Generally, if a package's functions are documented in manual pages, those pages may describe options such as verbose=TRUE, or various levels of output, that satisfy the request. Otherwise, one can make a copy of the code and add print or logging statements.
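For the topicmodels package mentioned in this thread, a minimal sketch, assuming topicmodels is installed: its LDA() function takes a control list whose verbose element, when positive, reports progress every verbose iterations (0, the default, is silent). The AssociatedPress document-term matrix bundled with topicmodels stands in for the news articles; the subset size, k = 5, the seed, and the iteration counts are arbitrary illustrative choices.

```r
# Minimal sketch, assuming the topicmodels package is installed. Its bundled
# AssociatedPress document-term matrix stands in for the news articles.
library(topicmodels)
data("AssociatedPress", package = "topicmodels")

# A small subset keeps the example quick; substitute your own DocumentTermMatrix.
dtm <- AssociatedPress[1:50, ]

# VEM (the default method): a positive `verbose` in the control list reports
# progress every `verbose` iterations; 0 (the default) prints nothing.
fit_vem <- LDA(dtm, k = 5, method = "VEM",
               control = list(verbose = 1, seed = 123))

# Gibbs sampling: the same control element prints progress, here every
# 100 of the 500 sampling iterations.
fit_gibbs <- LDA(dtm, k = 5, method = "Gibbs",
                 control = list(verbose = 100, iter = 500, seed = 123))
```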
>
> If the request is more general, such as how to run a program under a debugger and set checkpoints at which progress is reported, that too is somewhat outside the normal scope of this forum.
>
> The usual suggestion here is to contact the package maintainer, with no guarantee of a useful response, or to find a forum far more specific than R-help, rather than posting here just because part of the package is written in R.
>
> As it happens, the lda() function being discussed may (or may not) be in the MASS package (note that MASS::lda() performs linear discriminant analysis, not latent Dirichlet allocation). Looking at the documentation, I saw no obvious hook for reporting progress while it runs. Of course, Akshay can do some external testing using standard R timing mechanisms to see how long it takes to process just some of the news categories, without going into the details of the function called, and that might partially answer his question. Asking how to do that might fit within the scope of this list.
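A minimal sketch of that kind of external timing, assuming topicmodels is installed: fit the same model to samples of increasing size with system.time() and compare seconds per document. The sample sizes and k = 5 are arbitrary, and AssociatedPress again stands in for a news category. Runtime depends on vocabulary size, the number of topics, and how many EM iterations a fit needs to converge, so it need not grow linearly with the number of documents; treat any extrapolation from this as a rough guide at best.

```r
# Minimal sketch, assuming topicmodels is installed; AssociatedPress stands in
# for one news category, and the sample sizes are arbitrary.
library(topicmodels)
data("AssociatedPress", package = "topicmodels")

sizes <- c(25, 50, 100)  # number of documents in each trial fit

timing <- data.frame(docs = sizes, seconds = NA_real_)
for (i in seq_along(sizes)) {
  dtm <- AssociatedPress[seq_len(sizes[i]), ]
  # system.time() returns user/system/elapsed times; keep the wall-clock time
  timing$seconds[i] <- system.time(
    LDA(dtm, k = 5, control = list(seed = 123))
  )[["elapsed"]]
}

# Seconds per document at each size: if this ratio drifts upward as the
# sample grows, a straight-line extrapolation will underestimate the full run.
timing$per_doc <- timing$seconds / timing$docs
print(timing)
```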
>
>
> -----Original Message-----
> From: Bert Gunter <bgunter.4567 using gmail.com>
> To: akshay kulkarni <akshay_e4 using hotmail.com>
> Cc: R help Mailing list <r-help using r-project.org>
> Sent: Sat, Jan 29, 2022 3:34 pm
> Subject: Re: [R] progress of LDA algorithm...
>
>
> I presume this is in some specialized package that you have not told
> us about -- topicmodels maybe? It is therefore off topic here. In any
> case, this is the sort of question for which you should contact the
> package maintainer (?maintainer).
>
> As your question may also intersect with high performance computing
> considerations, you might want to post it on the R-Sig-HPC list,
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>
> Bert Gunter
>
> On Sat, Jan 29, 2022 at 8:27 AM akshay kulkarni <akshay_e4 using hotmail.com> wrote:
> >
> > Dear members,
> > I want to run LDA (latent Dirichlet allocation) on a set of news articles. I have the following questions:
> >
> >
> > 1. Is there any way to know the progress of the execution of the LDA algorithm?
> > 2. I read on Stack Overflow that LDA runs faster with more memory. I am using an AWS z1d instance with 48 cores and about 325 GB of RAM. I have multiple categories of news, one of which is much larger than the others, containing about 25,000 articles. Is it preferable to send the categories individually to different processors? If so, does R free the memory after finishing the smaller categories, so that the largest category can run with more memory? Or is it preferable to first run the smaller sets, finish that job, and then run the largest category?
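A minimal sketch of the per-category approach asked about in question 2, assuming topicmodels is installed and a Unix-alike OS (parallel::mclapply() forks worker processes and can only use a single core on Windows). This runs separate LDA fits side by side; it does not parallelize the algorithm within a single fit. The category names and the slices of AssociatedPress are hypothetical placeholders for the real per-category document-term matrices.

```r
# Minimal sketch, assuming topicmodels is installed and a Unix-alike OS.
# The three "categories" are hypothetical slices of AssociatedPress.
library(parallel)
library(topicmodels)
data("AssociatedPress", package = "topicmodels")

categories <- list(
  sports   = AssociatedPress[1:40, ],
  business = AssociatedPress[41:80, ],
  politics = AssociatedPress[81:200, ]
)

# Start the largest category first so it is not left running alone at the end.
categories <- categories[order(sapply(categories, nrow), decreasing = TRUE)]

# mclapply() cannot fork on Windows, so fall back to one core there;
# detectCores() may return NA, hence na.rm = TRUE.
n_cores <- if (.Platform$OS.type == "windows") 1L else
  min(length(categories), detectCores(), na.rm = TRUE)

# mc.preschedule = FALSE forks one child process per category; each child's
# memory is returned to the operating system when that child exits.
fits <- mclapply(categories, function(dtm) {
  LDA(dtm, k = 5, control = list(seed = 123))
}, mc.cores = n_cores, mc.preschedule = FALSE)

sapply(fits, function(f) f@k)
```

Because each category gets its own short-lived child process, memory used by a small category is not held while the largest one is still running. Whether this beats running the categories one after another depends on how much memory the largest fit needs on its own.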
> >
> > Thanking You,
> > Yours sincerely,
> > AKSHAY M KULKARNI
> >
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.