[R] Extract just some fields from XML
Gorjanc Gregor
Gregor.Gorjanc at bfro.uni-lj.si
Sun May 8 18:29:25 CEST 2005
Hello!
I am trying to get specific fields from an XML document and I am totally
puzzled. I hope someone can help me.
library(XML)

# URL of the PubMed efetch query
URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11877539,11822933,11871444&retmode=xml&rettype=citation"
# download and parse the XML file
tmp <- xmlTreeParse(URL, isURL = TRUE)
# keep only the root node of the parsed document
tmp <- xmlRoot(tmp)
Now I want to extract only the 'PubDate' nodes and their children, but I
don't know how to do that without digging into the structure of the XML
file. The problem is that the structure can differ between records, so a
hardcoded set of list indices, i.e. tmp[[i]][[j]]..., doesn't help me.
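One possible way to avoid hardcoded indices, assuming a version of the XML package that provides xmlParse() and getNodeSet(), is an XPath query that matches 'PubDate' nodes at any depth. The sketch below runs on a small made-up sample rather than the real efetch output; the sample structure and the names xml_text, doc, pub_dates and dates are only illustrative.

```r
library(XML)

# Made-up, simplified sample; the real efetch output is larger and may differ.
xml_text <- '<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><Article><Journal><JournalIssue>
    <PubDate><Year>2002</Year><Month>Mar</Month></PubDate>
  </JournalIssue></Journal></Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><Article><Journal>
    <PubDate><Year>2001</Year><Month>Dec</Month><Day>15</Day></PubDate>
  </Journal></Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>'

# Parse the string into an internal (C-level) document, which XPath requires.
doc <- xmlParse(xml_text, asText = TRUE)

# "//PubDate" matches every PubDate element, wherever it sits in the tree.
pub_dates <- getNodeSet(doc, "//PubDate")

# For each PubDate node, collect the text of its children (Year, Month, ...).
dates <- lapply(pub_dates, function(node) xmlSApply(node, xmlValue))
print(dates)
```

With the real data, the same query would presumably be run on xmlParse(URL, isURL = TRUE) in place of the sample string.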
I've read the help page for xmlEventParse, but I don't understand the
handlers part well enough to get anything usable from it. Here is
something not very usable ;)
PubDate <- function(x, ...)
{
  print(x)
}
xmlEventParse(URL, isURL = TRUE,
              handlers = list(PubDate = PubDate),
              addContext = FALSE)
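For comparison, here is a rough sketch of how event handlers might collect the children of 'PubDate', assuming the generic startElement, text and endElement handlers of xmlEventParse. The handlers share state through a closure. The helper make_pubdate_collector and the sample XML are hypothetical and only for illustration.

```r
library(XML)

# Hypothetical helper: returns SAX handlers plus a getter for the results.
make_pubdate_collector <- function() {
  inside  <- FALSE        # TRUE while the parser is inside a PubDate element
  current <- NULL         # name of the child element currently being read
  fields  <- character()  # children of the PubDate being read
  results <- list()       # one named character vector per PubDate

  startElement <- function(name, attrs, ...) {
    if (name == "PubDate") {
      inside <<- TRUE
      fields <<- character()
    } else if (inside) {
      current <<- name
    }
  }
  text <- function(x, ...) {
    # Ignore text outside PubDate children and whitespace-only chunks.
    if (inside && !is.null(current) && nzchar(trimws(x))) {
      # Append, in case the parser delivers the text in several chunks.
      previous <- if (current %in% names(fields)) fields[[current]] else ""
      fields[current] <<- paste0(previous, x)
    }
  }
  endElement <- function(name, ...) {
    if (name == "PubDate") {
      results[[length(results) + 1]] <<- fields
      inside <<- FALSE
    }
    current <<- NULL
  }
  list(handlers = list(startElement = startElement, text = text,
                       endElement = endElement),
       get = function() results)
}

# Made-up sample; the real efetch output is larger and may differ.
sample_xml <- '<Set>
  <Item><PubDate><Year>2002</Year><Month>Mar</Month></PubDate></Item>
  <Item><Other>x</Other><PubDate><Year>2001</Year></PubDate></Item>
</Set>'

collector <- make_pubdate_collector()
invisible(xmlEventParse(sample_xml, handlers = collector$handlers,
                        asText = TRUE))
event_dates <- collector$get()
print(event_dates)
```

The event-based route never builds the whole tree in memory, which may matter for large downloads; for a handful of records a tree-based query is probably simpler.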
Thanks in advance!
Lep pozdrav / With regards,
Gregor Gorjanc
----------------------------------------------------------------------
University of Ljubljana
Biotechnical Faculty URI: http://www.bfro.uni-lj.si/MR/ggorjan
Zootechnical Department mail: gregor.gorjanc <at> bfro.uni-lj.si
Groblje 3 tel: +386 (0)1 72 17 861
SI-1230 Domzale fax: +386 (0)1 72 17 888
Slovenia, Europe
----------------------------------------------------------------------
"One must learn by doing the thing; for though you think you know it,
you have no certainty until you try." Sophocles ~ 450 B.C.