[R] Web scraping - Having trouble figuring out how to approach this problem

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Thu Feb 23 19:03:42 CET 2017


The answer is yes, and it does not seem like a big step from where you are now. Seeing what you already know how to do, in the form of a reproducible example (RE), would help focus the assistance: there are quite a few ways to do this kind of thing, and an RE would show which of them fits what you have already tried.
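
As one possible starting point (a sketch, not the only way to do it), the two-step idea from the question could look roughly like this with the rvest package. The function names get_countries, get_details and scrape_all are made up for illustration, and so are the page layout (an <h2> per continent followed by a <ul> of links) and the ".language" / ".population" selectors. The real Wikipedia pages will need different selectors, which you would find by inspecting the HTML. The demo runs on small inline HTML strings with placeholder data so that it works without a network connection.

```r
library(rvest)  # re-exports read_html() from xml2

# Step 1: from the index page, build one row per country link,
# tagged with the heading (continent) that precedes its list.
# ASSUMPTION: each continent is an <h2> followed by a <ul> of links.
get_countries <- function(index, base_url) {
  headings <- html_elements(index, "h2")
  rows <- lapply(headings, function(h) {
    a <- html_elements(h, xpath = "following-sibling::ul[1]//a")
    data.frame(
      ID        = html_text(a, trim = TRUE),
      Continent = rep(html_text(h, trim = TRUE), length(a)),
      url       = xml2::url_absolute(html_attr(a, "href"), base_url),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Step 2: extract the fields of interest from one country page.
# ASSUMPTION: these CSS selectors are hypothetical placeholders.
get_details <- function(page) {
  data.frame(
    Language   = html_text(html_element(page, ".language"), trim = TRUE),
    Population = html_text(html_element(page, ".population"), trim = TRUE),
    stringsAsFactors = FALSE
  )
}

# Step 3: apply step 2 to every link found in step 1 and combine.
# 'fetch' turns a URL into a parsed page; read_html() does that for
# real URLs, and the demo below swaps in an offline stand-in.
scrape_all <- function(index, base_url, fetch = read_html) {
  countries <- get_countries(index, base_url)
  details <- lapply(countries$url, function(u) get_details(fetch(u)))
  cbind(countries[c("ID", "Continent")], do.call(rbind, details))
}

# Offline demo with made-up pages and placeholder values.
index <- read_html('<html><body>
  <h2>Continent One</h2><ul><li><a href="/wiki/Alpha">Alpha</a></li></ul>
  <h2>Continent Two</h2><ul><li><a href="/wiki/Beta">Beta</a></li></ul>
</body></html>')

pages <- list(
  "https://example.org/wiki/Alpha" =
    '<html><body><p class="language">Lang A</p><p class="population">100</p></body></html>',
  "https://example.org/wiki/Beta" =
    '<html><body><p class="language">Lang B</p><p class="population">200</p></body></html>'
)
fetch_demo <- function(u) read_html(pages[[u]])

result <- scrape_all(index, "https://example.org/", fetch_demo)
print(result)
```

For the real site you would pass read_html(index_url) and index_url to scrape_all(), keep the default fetch, and probably add a Sys.sleep() between requests so as not to hammer the server. Population comes back as text here, so converting it to a number is a separate step.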
-- 
Sent from my phone. Please excuse my brevity.

On February 22, 2017 2:52:55 PM PST, henrique monte <henrique.monte66 at gmail.com> wrote:
>Sometimes I need to get some data from the web and organize it into a
>data frame, and I waste a lot of time doing it manually. I've been
>trying to figure out how to optimize this process and have tried some R
>scraping approaches, but couldn't get them to work. I thought there
>could be an easier way to do this. Can anyone help me out?
>
>Fictional example:
>
>Here's a webpage with countries listed by continents:
>https://simple.wikipedia.org/wiki/List_of_countries_by_continents
>
>Each country name is also a link that leads to another webpage
>(specific to each country, e.g. https://simple.wikipedia.org/wiki/Angola).
>
>As a final result I would like a data frame with number of
>observations (rows) = number of countries listed, and 4 variables
>(columns): ID = country name, Continent = continent it belongs to,
>Language = official language (from the country's own webpage), and
>Population = most recent population count (from the country's own
>webpage).
>
>...
>
>The main issue I'm trying to figure out is handling several webpages.
>Would it be possible to scrape the countries from the first link above
>as a list, together with the links to the countries' webpages, and then
>write a function that runs a scraping command on each of those links
>to get the specific data I'm looking for?