[R] Extract Data from a Webpage

Chuck Cleland ccleland at optonline.net
Wed Dec 17 01:11:20 CET 2008


Hi All:
  I would like to extract the provider name, address, and phone number
from multiple webpages like this one:

http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489&P2=11490

  From searching the R-help archives, it seems the XML package might
be useful for this task.  I can load the XML package and pass the URL
to htmlTreeParse(), but I don't know how to proceed from there.
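
For reference, here is a minimal sketch of one possible next step after htmlTreeParse(). It assumes the provider details sit in <td> cells laid out as alternating label/value pairs, which has not been checked against the real page's markup. The helper extract_provider() and the sample HTML are hypothetical. htmlTreeParse(), xpathSApply(), xmlValue() and free() are functions from the XML package.

```r
library(XML)

# Hypothetical helper: parse one page and return its label/value cell pairs.
# ASSUMPTION: cells alternate "Label:" / value, as in the sample below.
extract_provider <- function(src, asText = FALSE) {
  # useInternalNodes = TRUE gives a document that XPath queries can search
  doc <- htmlTreeParse(src, asText = asText, useInternalNodes = TRUE)
  on.exit(free(doc))                               # release the C-level document
  cells <- xpathSApply(doc, "//td", xmlValue)      # text of every table cell
  labels <- sub(":$", "", cells[c(TRUE, FALSE)])   # odd cells, trailing ":" dropped
  values <- cells[c(FALSE, TRUE)]                  # even cells
  setNames(values, labels)
}

# Hypothetical stand-in for one provider page (not the real page's content)
page <- '<html><body><table>
<tr><td>Provider:</td><td>Example Treatment Center</td></tr>
<tr><td>Address:</td><td>123 Main St, Anytown, NY</td></tr>
<tr><td>Phone:</td><td>555-0100</td></tr>
</table></body></html>'

info <- extract_provider(page, asText = TRUE)
print(info)
```

With a live page, the URL would be passed directly as src (leaving asText = FALSE). For several pages, one could lapply() the helper over a vector of URLs and rbind() the results. The XPath expression would need adjusting to whatever markup the real pages use.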

thanks,

Chuck Cleland

> sessionInfo()
R version 2.8.0 Patched (2008-12-04 r47066)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] XML_1.98-1

-- 
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894


