[R] general question on approaches to getting data from data providers

Mike Marchywka marchywka at hotmail.com
Tue Feb 1 16:10:15 CET 2011



My question, buried in this rant, is: "Is there a mailing list
or other means of identifying sites whose information is likely
to be important to many R users, but whose data is difficult to obtain
because of the site's choice of technology?"

Quite often, people here ask questions about scraping HTML
to get various types of "public" information (public being a bit
debatable when the information is buried in formatting junk).
In at least one case (I think it was something financial),
I noted that R has packages with large components dedicated to
scraping data from both gov and com sources, but there is no indication
that their authors are working with cooperative groups on the other side
of the information fence. This morning, I tried to contact the
census.gov webmaster after noting that all their data is in xls
when in fact csv would probably be more appropriate for the
data they have; I can open csv easily in notepad, LOL.
Then, of course, they point you to a certain company
that makes a product to read this stuff.

Is there a different list or general community that has a charter for
discussing ways to get computer-readable data from "data" providers?
Many websites produce other things, like fancy PDF graphics,
that obliterate the data or try to lock you into a single commercial,
proprietary, or otherwise limited tool chain for data analysis.

Thanks.






More information about the R-help mailing list