[R] XML: Slower parsing over time with htmlTreeParse()

Yihui Xie xie at yihui.name
Mon Mar 15 22:25:09 CET 2010


So you are parsing a *URL* instead of a local HTML file? I guess the
slowdown might have something to do with your internet connection as
well as the web server behind that URL (some servers restrict access if
you visit them too frequently). Can you provide a reproducible example?
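
For instance, a minimal sketch along these lines would show whether the
extra time is spent on the download or in the parser. The helper name
time_fetch_and_parse() and the choice to time the two steps separately
are only my assumptions about how one might set this up, not something
taken from your post; the explicit free() call assumes you parse with
useInternalNodes = TRUE.

```r
library(XML)

# Hypothetical helper: time the download and the parse separately.
# `src` can be a URL or a local file path (readLines() accepts both).
time_fetch_and_parse <- function(src) {
  # Step 1: fetch the raw HTML as one string and time only this step.
  t_fetch <- system.time(
    html <- paste(readLines(src, warn = FALSE), collapse = "\n")
  )[["elapsed"]]

  # Step 2: parse the already-downloaded text, so no network is involved.
  t_parse <- system.time({
    doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)
    free(doc)  # release the C-level document explicitly
  })[["elapsed"]]

  c(fetch = t_fetch, parse = t_parse)
}

# Possible usage (the URL is a placeholder, not the one from the post):
# timings <- t(replicate(10, time_fetch_and_parse("http://www.example.com/")))
# print(timings)
```

If the "fetch" column grows over time while "parse" stays flat, the
connection or the server is the more likely culprit.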

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: 515-294-6609 Web: http://yihui.name
Department of Statistics, Iowa State University
3211 Snedecor Hall, Ames, IA



On Mon, Mar 15, 2010 at 7:09 AM, Janko Thyson
<janko.thyson at ku-eichstaett.de> wrote:
> Sorry, I listed the wrong package in the header of my previous post!
>
> Dear List,
>
>
>
> Has any of you experienced a significant increase in the time it takes to
> parse a URL via "htmlTreeParse()" when this function is called repeatedly,
> once every minute, over a couple of hours?
>
>
>
> Initially, a single parse takes about 0.5 seconds on my machine (Quad Core,
> 2.67 GHz, 8 GB RAM, Windows 7 64 Bit). After some time, this can go up to
> 15 seconds or more.
>
>
>
> I've tried garbage collection and "catalogClearTable()" (though I don't
> think the latter has anything to do with the issue), and lately I have
> wondered whether it has to do with the accumulation of errors over time
> ("xmlErrorCumulator()"). Are parsing errors accumulated globally in the
> workspace across distinct calls to this function? If so, is there a way
> to "clean the buffer"?
>
>
>
> I would greatly appreciate it if anyone had an idea of how to keep the
> request/parsing time fairly constant at the initial low level of 0.5
> seconds.
>
>
>
> Thanks a lot,
>
> Janko


