[R] Extract Data from a Webpage
Chuck Cleland
ccleland at optonline.net
Wed Dec 17 01:11:20 CET 2008
Hi All:
I would like to extract the provider name, address, and phone number
from multiple webpages like this:
http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489&P2=11490
Based on searching R-help archives, it seems like the XML package
might have something useful for this task. I can load the XML package
and supply the url as an argument to htmlTreeParse(), but I don't know
how to go from there.
thanks,
Chuck Cleland
> sessionInfo()
R version 2.8.0 Patched (2008-12-04 r47066)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] XML_1.98-1
--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
More information about the R-help
mailing list