[R-SIG-Mac] xmlTreeParse
Simon Urbanek
simon.urbanek at r-project.org
Fri Apr 20 18:58:18 CEST 2012
On Apr 18, 2012, at 12:31 AM, John Maindonald wrote:
> Thanks, Simon.
>
> XML installs just fine. The problem comes when I do:
> [R version 2.15.0 (2012-03-30)]
>
>> library(XML)
>> url <- "http://stats.grok.se/en/201204/Financial_crisis"
>> wikitree <- xmlTreeParse(url, useInternalNodes=T)
>> AttValue: " or ' expected
>> attributes construct error
>> Couldn't find end of Start Tag link line 11
>> Opening and ending tag mismatch: hr line 57 and body
>> Opening and ending tag mismatch: body line 48 and html
>> Premature end of data in tag html line 3
>> Error: 1: AttValue: " or ' expected
>> 2: attributes construct error
>> 3: Couldn't find end of Start Tag link line 11
>> 4: Opening and ending tag mismatch: hr line 57 and body
>> 5: Opening and ending tag mismatch: body line 48 and html
>> 6: Premature end of data in tag html line 3
>
> This pinpoints the error that I get when I try to run the function wikiStat() from:
> http://expansed.com/2011/08/visualising-wikipedia-search-statistics-with-r/
>
The problem is that the data source has changed: the page is no longer served as XML, so you can't parse it as XML. It is now an HTML page, so there is not much useful you can do with it from R -- there are separate URLs for the data formats.
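
As a minimal sketch of why the strict parser rejects such a page, the snippet below feeds a small, made-up HTML string (an assumption standing in for the real stats.grok.se page, which is not reproduced here) to both parsers in the XML package. The unquoted attribute values and the unclosed <hr> are the kind of markup that produces the "AttValue" and tag-mismatch errors quoted above. It also suggests that the package's lenient htmlTreeParse() will at least read this kind of markup, although pulling numbers out of HTML is fragile compared with using a proper data URL.

```r
library(XML)

# Hypothetical, deliberately non-XML markup: unquoted attribute values
# and an <hr> that is never closed. Not the actual page content.
page <- paste0(
  "<html><head><link rel=stylesheet href=style.css></head>",
  "<body><p>Views</p><hr></body></html>"
)

# Strict XML parsing: the parser errors are collected and raised as an
# R error, which is caught here and turned into NULL.
strict <- tryCatch(
  xmlTreeParse(page, asText = TRUE, useInternalNodes = TRUE),
  error = function(e) NULL
)

# Lenient HTML parsing: tolerates the same input and returns a tree
# that can be queried with XPath.
doc  <- htmlTreeParse(page, asText = TRUE, useInternalNodes = TRUE)
para <- xpathSApply(doc, "//p", xmlValue)
```

Here `strict` ends up NULL because the XML parse fails, while `para` holds the text of the paragraph recovered by the HTML parser.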
Cheers,
Simon
>> critraffic <- wikiStat("Financial_crisis", monback = 40)
>> Loading required package: mondate
>>
>> Attaching package: 'mondate'
>>
>> The following object(s) are masked from 'package:base':
>>
>> cbind, rbind
>>
>> AttValue: " or ' expected
>> attributes construct error
>> Couldn't find end of Start Tag link line 11
>> Opening and ending tag mismatch: hr line 57 and body
>> Opening and ending tag mismatch: body line 48 and html
>> Premature end of data in tag html line 3
>> Error: 1: AttValue: " or ' expected
>> 2: attributes construct error
>> 3: Couldn't find end of Start Tag link line 11
>> 4: Opening and ending tag mismatch: hr line 57 and body
>> 5: Opening and ending tag mismatch: body line 48 and html
>> 6: Premature end of data in tag html line 3
>
>
> I can use Safari to access the url
> http://stats.grok.se/en/201204/Financial_crisis
> without problem.
>
> Incidentally, I get the same result under Windows 7 (R-2.14.2 or R-2.15.0).
> If I am right in thinking that I have to sort out issues of 3rd party libraries to get the
> function to run, I prefer to do this for OS 10.7.x, if at all possible.
>
> It is of course possible that there has been some change at the url since wikiStat()
> was posted, of which I need to take account.
>
> NB also a later article that uses this function:
> http://expansed.com/2011/08/r-popularity-steady-growth-and-new-york-times/
> ( ar <- wikiStat("R_(programming_language)", monback = 45, lang= 'en') )
>
> John Maindonald email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473 fax : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
> http://www.maths.anu.edu.au/~johnm
>
> On 18/04/2012, at 12:10 PM, Simon Urbanek wrote:
>
>> John,
>>
>> what's wrong with
>> install.packages("XML")
>> library(XML)
>> It works just fine on Lion ...
>>
>> Cheers,
>> Simon
>>
>>
>> On Apr 17, 2012, at 9:51 PM, John Maindonald wrote:
>>
>>> I am looking for clues on getting xmlTreeParse(), from the XML package,
>>> working under Lion. The help page has the note:
>>> 'Make sure that the necessary 3rd party libraries are available.'
>>>
>>> I take it that these are the libraries that are noted at:
>>> http://www.explain.com.au/oss/libxml2xslt.html
>>>
>>> The page does not have any details for anything past Leopard. Any
>>> comments on what is needed for Lion will be helpful. I'd been hoping
>>> to run the code for the function wikiStat() that is given at:
>>> http://expansed.com/2011/08/visualising-wikipedia-search-statistics-with-r/
>>>
>>> I will be grateful for any clues.
>>>
>>> John Maindonald email: john.maindonald at anu.edu.au
>>> phone : +61 2 (6125)3473 fax : +61 2(6125)5549
>>> Centre for Mathematics & Its Applications, Room 1194,
>>> John Dedman Mathematical Sciences Building (Building 27)
>>> Australian National University, Canberra ACT 0200.
>>> http://www.maths.anu.edu.au/~johnm
>>>
>>> _______________________________________________
>>> R-SIG-Mac mailing list
>>> R-SIG-Mac at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
>>>
>>>
>>
>
>