[R-sig-eco] Webscraping the Plants Database

Sarah Goslee sarah.goslee at gmail.com
Wed Jan 2 23:42:11 CET 2013


Hi Tim,

There's no need to scrape and parse: check out the Download PLANTS
database link on the left side of plants.usda.gov.
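
Once the file is downloaded, pulling out the species of interest is a
simple subset. Below is a minimal sketch, not a definitive recipe: the
small data frame is a hypothetical stand-in for the downloaded file, and
the column names (scientific_name, family) and the file name in the
comment are assumptions, so check them against the header of the real
download.

```r
# Hypothetical stand-in for the downloaded PLANTS file. In practice this
# would be something like
#   plants_db <- read.csv("plants_download.txt", stringsAsFactors = FALSE)
# where the file name and column names are assumptions, not the real layout.
plants_db <- data.frame(
  scientific_name = c("Poa pratensis", "Festuca idahoensis",
                      "Astragalus miser", "Festuca baffinensis", "Zea mays"),
  family = c("Poaceae", "Poaceae", "Fabaceae", "Poaceae", "Poaceae"),
  stringsAsFactors = FALSE
)

# Species list from the original question
species <- c("Poa pratensis", "Festuca idahoensis", "Astragalus miser")

# Keep only the rows whose scientific name is in the species list
traits <- plants_db[plants_db$scientific_name %in% species, ]

# Species from the list that had no match in the table
unmatched <- setdiff(species, traits$scientific_name)
```

Checking `unmatched` afterwards is worthwhile, because synonyms or
spelling variants in the species list would otherwise be dropped silently.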

Sarah

On Wed, Jan 2, 2013 at 5:48 PM, Tim Seipel <t.seipel at env.ethz.ch> wrote:
> Dear Listserv,
>
> My aim is to compile plant traits for a list of species from the USDA
> Plants database.
> Examples from the list include:
>
> Poa pratensis
> Festuca idahoensis
> Astragalus miser
>
> In R, I started with this:
>
> library(RCurl)
> # base URL of the PLANTS name search
> plants<-'http://plants.usda.gov/java/nameSearch?'
> # query string; URLencode() turns the space in the species name into %20
> url<-paste('mode=','sciname','&keywordquery=',URLencode('Festuca baffinensis'),sep='')
> sp.url<-paste(plants,url,sep='')
>
> ### the link goes to the correct webpage:
> ### http://plants.usda.gov/java/nameSearch?mode=sciname&keywordquery=Festuca%20baffinensis
>
> p1<-getURL(sp.url)
>
>
> I would like to extract the following text from the page:
> Symbol:         FEBA
> Group:          Monocot
> Family:         Poaceae
> Duration:       Perennial
> Growth Habit:   Graminoid
> Native Status:
> L48             N
> AK              N
> CAN             N
> GL              N
>
>
>
> However, I can't seem to find it after parsing the string. Is this
> related to JavaScript?
>
> Can someone help me extract this information?
> Thanks for the help!
>
> Sincerely,
> Tim Seipel
>

--
Sarah Goslee
http://www.functionaldiversity.org


