[R-SIG-Mac] xmlTreeParse

Simon Urbanek simon.urbanek at r-project.org
Fri Apr 20 18:58:18 CEST 2012


On Apr 18, 2012, at 12:31 AM, John Maindonald wrote:

> Thanks, Simon.
> 
> XML installs just fine.  The problem comes when I do:
> [R version 2.15.0 (2012-03-30)]
> 
>> library(XML)
>> url <- "http://stats.grok.se/en/201204/Financial_crisis"
>> wikitree <- xmlTreeParse(url, useInternalNodes=T)
>> AttValue: " or ' expected
>> attributes construct error
>> Couldn't find end of Start Tag link line 11
>> Opening and ending tag mismatch: hr line 57 and body
>> Opening and ending tag mismatch: body line 48 and html
>> Premature end of data in tag html line 3
>> Error: 1: AttValue: " or ' expected
>> 2: attributes construct error
>> 3: Couldn't find end of Start Tag link line 11
>> 4: Opening and ending tag mismatch: hr line 57 and body
>> 5: Opening and ending tag mismatch: body line 48 and html
>> 6: Premature end of data in tag html line 3
> 
> This pinpoints the error that I get when I try to run the function wikiStat() from:
> http://expansed.com/2011/08/visualising-wikipedia-search-statistics-with-r/
> 

The problem is that the data source has changed: the page is no longer in XML format, so you can't parse it as XML. It is an HTML page, so there is not much useful you can do with it from R -- there are separate URLs for data formats.
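
For illustration only, a minimal sketch of the difference, assuming a made-up snippet rather than the actual stats.grok.se page. The strict XML parser rejects the same kinds of defects reported in the errors above (an unquoted attribute value, an unclosed hr tag), while htmlParse() from the same XML package tolerates them. The `page` string and its "Hits: 42" content are hypothetical, and whether the real page holds anything worth extracting this way is a separate question.

```r
library(XML)

# Hypothetical, deliberately non-well-formed HTML (NOT the real page):
# the rel attribute value is unquoted and <hr> is never closed.
page <- paste0(
  '<html><head><link rel=stylesheet href="s.css"></head>',
  '<body><p>Hits: 42</p><hr></body></html>'
)

# Strict XML parsing raises an error on this input; catch it and return NULL.
xml_result <- tryCatch(
  xmlParse(page, asText = TRUE),
  error = function(e) NULL
)

# The HTML parser is lenient and builds a tree anyway.
doc <- htmlParse(page, asText = TRUE)

# Extract the text of every <p> element with an XPath query.
hits <- xpathSApply(doc, "//p", xmlValue)
```

Here `xml_result` ends up NULL because the XML parse failed, while `hits` holds the paragraph text recovered by the HTML parser.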

Cheers,
Simon


>> critraffic <- wikiStat("Financial_crisis", monback = 40)
>> Loading required package: mondate
>> 
>> Attaching package: 'mondate'
>> 
>> The following object(s) are masked from 'package:base':
>> 
>>    cbind, rbind
>> 
>> AttValue: " or ' expected
>> attributes construct error
>> Couldn't find end of Start Tag link line 11
>> Opening and ending tag mismatch: hr line 57 and body
>> Opening and ending tag mismatch: body line 48 and html
>> Premature end of data in tag html line 3
>> Error: 1: AttValue: " or ' expected
>> 2: attributes construct error
>> 3: Couldn't find end of Start Tag link line 11
>> 4: Opening and ending tag mismatch: hr line 57 and body
>> 5: Opening and ending tag mismatch: body line 48 and html
>> 6: Premature end of data in tag html line 3
> 
> 
> I can use Safari to access the url
> http://stats.grok.se/en/201204/Financial_crisis
> without problem.
> 
> Incidentally, I get the same result under Windows 7 (R-2.14.2 or R-2.15.0).
> If I am right in thinking that I have to sort out issues with 3rd party libraries to get the
> function to run, I would prefer to do this for OS X 10.7.x, if at all possible.
> 
> It is of course possible that there has been some change at the url since wikiStat()
> was posted, of which I need to take account.
> 
> NB also a later article that uses this function:
> http://expansed.com/2011/08/r-popularity-steady-growth-and-new-york-times/
> ( ar <- wikiStat("R_(programming_language)", monback = 45, lang= 'en') )
> 
> John Maindonald             email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
> http://www.maths.anu.edu.au/~johnm
> 
> On 18/04/2012, at 12:10 PM, Simon Urbanek wrote:
> 
>> John,
>> 
>> what's wrong with
>> install.packages("XML")
>> library(XML)
>> It works just fine on Lion ...
>> 
>> Cheers,
>> Simon
>> 
>> 
>> On Apr 17, 2012, at 9:51 PM, John Maindonald wrote:
>> 
>>> I am looking for clues on getting xmlTreeParse(), from the XML package,
>>> working under Lion.  The help page has the note:
>>> 'Make sure that the necessary 3rd party libraries are available.'
>>> 
>>> I take it that these are the libraries that are noted at:
>>> http://www.explain.com.au/oss/libxml2xslt.html
>>> 
>>> The page does not have any details for anything past Leopard.  Any
>>> comments on what is needed for Lion will be helpful.  I'd been hoping
>>> to run the code for the function wikiStat() that is given at:
>>> http://expansed.com/2011/08/visualising-wikipedia-search-statistics-with-r/
>>> 
>>> I will be grateful for any clues.
>>> 
>> 
> 
> 


