[R] How to suppress errors from htmlTreeParse() function in XML package?

Duncan Temple Lang duncan at wald.ucdavis.edu
Tue Nov 4 20:18:15 CET 2008



Martin Morgan wrote:
> Hi Tony --
> 
> Tony Breyal <tony.breyal at googlemail.com> writes:
> 
>> Dear R-help,
>>
>> The following code downloads an HTML document into the variable 'doc'
>> and then stores an internal representation of it in the variable
>> 'html.tree'. This works even when the HTML is malformed, which is
>> fantastic. However, as in the example below, I get some output from R
>> in the console which I would like to suppress, so I can keep my
>> window a bit cleaner.
>>
>> I understand that the output is just letting me know that the HTML
>> is malformed, but for my purposes I can ignore it. Is there a way to
>> achieve this?


Yep.  The error parameter of the htmlTreeParse() (or htmlParse())
function lets you control how the errors and "warnings" raised by the
XML parser are handled.

So passing a no-op function discards them:

  htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...){})

That should do it.
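If you would rather keep the parser's complaints for later inspection than throw them away, the same error argument can be given a function that collects them. The sketch below uses a hypothetical helper, parse_quietly(), which is not part of the XML package. It assumes the handler receives the message text as its first argument (hence function(msg, ...), which ignores anything else passed), and it passes asText = TRUE because the input is a string of HTML rather than a file name.

```r
# Hypothetical helper (not part of the XML package): parse an HTML string
# and collect the parser's diagnostics instead of printing them.
# Assumption: XML calls the 'error' handler with the message text as its
# first argument; any further arguments are ignored via '...'.
parse_quietly <- function(html) {
  msgs <- character(0)
  collect <- function(msg, ...) {
    # '<<-' updates 'msgs' in the enclosing parse_quietly() call
    msgs <<- c(msgs, as.character(msg))
    invisible(NULL)
  }
  tree <- XML::htmlTreeParse(html, asText = TRUE,
                             useInternalNodes = TRUE, error = collect)
  list(tree = tree, messages = msgs)
}
```

For example, res <- parse_quietly(doc) would keep the parsed document in res$tree and the suppressed messages in res$messages, so they can still be checked if a page unexpectedly fails to parse.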

   D


>>
>> ### Example:
>> library(RCurl); library(XML)
>> doc <- getURL('http://www.google.co.uk/search?q=%22R%20Project%22&as_qdr=d1&num=100')
>> html.tree <- htmlTreeParse(doc, useInternalNodes = TRUE)
> 
> How about capture.output()?
> 
> res <- capture.output(html.tree <- htmlTreeParse(doc, useInternalNodes = TRUE))
> 
> Martin
> 
>> ### Output - this is what i would like to suppress
>> Tag nobr invalid
>> htmlParseEntityRef: expecting ';'
>> htmlParseEntityRef: expecting ';'
>> ### etc.
>>
>> I attempted to use try(expr, silent=TRUE) but that didn't work for me:
>>>  try(htmlTreeParse(doc, useInternalNodes = TRUE), silent=TRUE)
>>
>> Many thanks in advance for any help,
>> Tony Breyal
>>
>>
>> ### O/S = Windows Vista Ultimate ###
>>> sessionInfo()
>> R version 2.8.0 (2008-10-20)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] XML_1.98-1   RCurl_0.91-0
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
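To see why Martin's capture.output() suggestion can help where try(expr, silent = TRUE) did not, here is a base-R sketch that needs neither network access nor the XML package. noisy_parse() is a hypothetical stand-in that, like the parser in this thread, prints diagnostics while still returning a result. Because it never signals an R error, try() has nothing to silence (silent = TRUE only suppresses the report of an error), whereas capture.output() diverts the printed lines into a character vector. This illustrates the two base functions only; whether a given XML version prints its messages to standard output is an assumption here.

```r
# Hypothetical stand-in for a parser that prints diagnostics to standard
# output but still returns a usable result (no R error is signalled).
noisy_parse <- function(x) {
  cat("Tag nobr invalid\n")
  cat("htmlParseEntityRef: expecting ';'\n")
  toupper(x)
}

# try(silent = TRUE) only hides the message of an R error; nothing here is
# an error, so both cat() lines still appear in the console.
r1 <- try(noisy_parse("abc"), silent = TRUE)

# capture.output() collects everything printed to standard output, and the
# assignment inside it still creates 'result' in the calling environment.
printed <- capture.output(result <- noisy_parse("abc"))
```

Here result holds the return value and printed holds the two diagnostic lines instead of them reaching the console. Output written with message() goes to standard error and is not captured by capture.output() with its default settings.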


