[R] how to loop or lapply over a "XMLNodeSet" object with condition (if else)
Omar André Gonzáles Díaz
oma.gonzales at gmail.com
Sun May 24 01:08:52 CEST 2015
Hi, R-Help members,
I'm doing some web scraping. This time I need the image URLs of the
products on an e-commerce site.
I can get the nodes that contain the URLs, but extracting the URL itself
takes one additional step:
"src" vs "data-original": in the page source, some URLs are in the "src"
attribute, while others are in the "data-original" attribute.
How can I write a loop or an apply function that does the following: if the
node has a "data-original" attribute, do:
... %>%
html_attr("data-original")
else do:
... %>%
html_attr("src")
The result should be a vector with the urls.
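To make the intended result concrete, here is a minimal sketch on a small inline HTML fragment rather than the live page. The fragment, the example.com URLs and the variable names are made up for illustration, and it assumes a recent rvest, where read_html() replaces the older html(). It reads both attributes for every node and keeps "data-original" wherever it is present:

```r
library(rvest)

# Hypothetical stand-in for the product page: the first image is
# lazy-loaded (real URL in "data-original"), the second is not.
page <- read_html('
  <div class="product-item-img">
    <img src="placeholder.gif" data-original="http://example.com/a.jpg">
  </div>
  <div class="product-item-img">
    <img src="http://example.com/b.jpg">
  </div>
')

imgs <- page %>%
  html_nodes(".product-item-img") %>%
  html_nodes("img")

original <- html_attr(imgs, "data-original")  # NA where the attribute is absent
src      <- html_attr(imgs, "src")

# Prefer "data-original"; fall back to "src" where it is NA.
urls <- ifelse(is.na(original), src, original)
urls
```

Because html_attr() is already vectorised over the node set, this version needs no explicit loop; the condition is applied element-wise by ifelse().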
My code:
1.- I can get the nodes for the images:
##########################################################
# This results in an "XMLNodeSet" object
library(rvest)
PCs <- html("http://www.linio.cl/computacion/pc-escritorio/") %>%
html_nodes(".product-item-img") %>%
html_nodes("img")
###########################################################
#for the attr "data-original"
PCs2 <- html("http://www.linio.cl/computacion/pc-escritorio/") %>%
html_nodes(".product-item-img") %>%
html_nodes("img") %>%
html_attr("data-original")
This gives the URLs from the "data-original" attribute, and NA for nodes
that don't have this attribute.
#for the attr "src"
PCs3 <- html("http://www.linio.cl/computacion/pc-escritorio/") %>%
html_nodes(".product-item-img") %>%
html_nodes("img") %>%
html_attr("src")
This gives the content of the "src" attribute. However, for some products the
URL I need is in the "data-original" attribute, not here.
#### combination throwing NAs as result #####
PCs4 <- html("http://www.linio.cl/computacion/pc-escritorio/") %>%
html_nodes(".product-item-img") %>%
html_nodes("img") %>%
html_attr("data-original|src")
################################################
I've also tried something like this:
lapply(PCs, function(e) {
if ("data-original" %in% e) {
print("ok")
}
})
but get this:
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
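The error suggests that %in% received a node object rather than a vector, so the attribute has to be read from the node before it can be tested. A minimal per-node sketch of the if/else, assuming a recent rvest in which html_attr() accepts a single node and returns NA for a missing attribute; the inline HTML and the example.com URLs are hypothetical:

```r
library(rvest)

# Hypothetical stand-in for the product page.
page <- read_html('
  <div class="product-item-img">
    <img src="placeholder.gif" data-original="http://example.com/a.jpg">
  </div>
  <div class="product-item-img">
    <img src="http://example.com/b.jpg">
  </div>
')

imgs <- page %>%
  html_nodes(".product-item-img") %>%
  html_nodes("img")

# Visit each node; test the attribute value, not the node itself.
urls <- vapply(imgs, function(node) {
  original <- html_attr(node, "data-original")
  if (!is.na(original)) {
    original
  } else {
    html_attr(node, "src")
  }
}, character(1))
urls
```

vapply() with character(1) is used instead of lapply() so that the result is a character vector rather than a list.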
Thanks.