[R] retrieve certain part from html

Henrique Dallazuanna wwwhsd at gmail.com
Wed Sep 23 14:39:43 CEST 2009


Try using the XML package:

Lines <- "<td><a href='2005-01.html'>2005-01</a></td><td><a
href='2006-01.html'>2006-01</a></td><td><a
href='2007-01.html'>2007-01</a></td><td><a
href='2008-01.html'>2008-01</a></td><td><a
href='2009-01.html'>2009-01</a></td>"

library(XML)
# Parse the HTML string, then return the attributes of every <a> element
xpathApply(htmlParse(Lines, asText = TRUE), "//a", xmlAttrs)
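
If only the href values are wanted as a plain character vector, something along these lines may help. It is a sketch rather than a definitive solution: the names html, doc, hrefs, m and hrefs_regex are made up for illustration, and the regex variant assumes every href is wrapped in single quotes, as in the example string.

```r
library(XML)

# Same HTML fragment as above, built with paste0() so the string
# contains no embedded newlines
html <- paste0(
  "<td><a href='2005-01.html'>2005-01</a></td>",
  "<td><a href='2006-01.html'>2006-01</a></td>",
  "<td><a href='2007-01.html'>2007-01</a></td>",
  "<td><a href='2008-01.html'>2008-01</a></td>",
  "<td><a href='2009-01.html'>2009-01</a></td>"
)

# asText = TRUE states explicitly that html holds content, not a file name
doc <- htmlParse(html, asText = TRUE)

# xmlGetAttr() pulls a single named attribute from each matched <a> node
hrefs <- xpathSApply(doc, "//a", xmlGetAttr, "href")

# Base R alternative without the XML package (assumes single-quoted hrefs):
# first find every href='...' match, then strip the prefix and closing quote
m <- regmatches(html, gregexpr("href='[^']*'", html))[[1]]
hrefs_regex <- gsub("^href='|'$", "", m)
```

The XPath route is usually more robust on real pages, since a regular expression breaks as soon as the quoting or attribute order changes.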

On Wed, Sep 23, 2009 at 9:29 AM, Rene <kaixinmalea at gmail.com> wrote:
> Dear All,
>
>
>
> Can someone please show me how to extract a certain part of a long HTML
> string?
>
>
>
> e.g.
>
>
>
> "<td><a href='2005-01.html'>2005-01</a></td><td><a
> href='2006-01.html'>2006-01</a></td><td><a
> href='2007-01.html'>2007-01</a></td><td><a
> href='2008-01.html'>2008-01</a></td><td><a
> href='2009-01.html'>2009-01</a></td>"
>
>
>
> How can I get only the strings "2005-01.html", "2006-01.html",
> "2007-01.html", "2008-01.html", "2009-01.html" from the above HTML code? I
> have tried the gsub function, but it did not work.
>
>
>
> Please guide me on this.
>
>
>
> Thanks a lot.
>
> Rene.



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



