library(BIEN)
library(ape) # Package for working with phylogenies in R
library(maps) # Useful for making quick maps of occurrences
library(sp) # A package for spatial data
We try to make this package as easy and intuitive to use as possible, but it is still often easiest to start with our vignette. Particularly useful are the “Function Names” and “Function Directory” sections.
The function names follow a consistent naming strategy, and mostly consist of 3 parts:
As a complete example, the function BIEN_occurrence_species returns occurrence records for a given species (or set of species).
Currently we have 9 function families in RBIEN. These are sets of functions that access a given type of data.
We'll walk through each of the function families and take a look at some of the options available within each.
These functions begin with the prefix BIEN_occurrence_... and allow you to query occurrences by either taxonomy or geography. Functions include:
BIEN_occurrence_country Returns all occurrence records within a given country
BIEN_occurrence_state Returns all occurrence records within a given state/province
BIEN_occurrence_county Returns all occurrence records within a given county/parish
BIEN_occurrence_family Returns all occurrence records for a specified family
BIEN_occurrence_genus Returns all occurrence records for a specified genus
BIEN_occurrence_species Returns all occurrence records for a specified species
Each of these functions has a number of different arguments that modify your query, either refining your search criteria or returning more data for each record. These arguments include:
cultivated If TRUE, records known to be cultivated will be returned.
only.new.world If TRUE, records returned are limited to those in North and South America, where greater data cleaning and validation has been done.
all.taxonomy If TRUE, the query will return additional taxonomic data, including the uncorrected taxonomic information for those records.
native.status If TRUE, additional information will be returned regarding whether a species is native in a given region.
natives.only If TRUE, the default, information for occurrences flagged as introduced will not be returned.
observation.type If TRUE, the query will return whether each record is from either a plot or a specimen. This may be useful if a user believes one type of information may be more accurate.
political.boundaries If TRUE, the query will return information on which country, state, etc. an occurrence is found within.
collection.info If TRUE, the query will return additional information about the collection and identification of that specimen.
Example 1: Occurrence records for a species
Okay, enough reading. Let's get some data.
Let's say we're interested in the species Xanthium strumarium and we'd like some occurrence data. We'll use the function BIEN_occurrence_species to grab the occurrence data.
Xanthium_strumarium <- BIEN_occurrence_species(species = "Xanthium strumarium")
Take a moment to view the dataframe and examine its structure, for example with str() and head():
## 'data.frame':    3454 obs. of  11 variables:
##  $ scrubbed_species_binomial  : chr  "Xanthium strumarium" "Xanthium strumarium" "Xanthium strumarium" "Xanthium strumarium" ...
##  $ latitude                   : num  32.7 32.7 37 -41.4 31.4 ...
##  $ longitude                  : num  -87.2 -87.2 21.8 174.9 -109.9 ...
##  $ date_collected             : Date, format: "2000-08-31" "2000-08-31" ...
##  $ datasource                 : chr  "CVS" "CVS" "GBIF" "GBIF" ...
##  $ dataset                    : chr  "Carolina Vegetation Survey" "Carolina Vegetation Survey" "B" "AK" ...
##  $ dataowner                  : chr  "Robert Peet" "Robert Peet" "B" "AK" ...
##  $ custodial_institution_codes: chr  NA NA "B" "AK" ...
##  $ collection_code            : chr  NA NA "Herbarium Berolinense" "AK" ...
##  $ datasource_id              : num  48 48 3617 3722 3271 ...
##  $ is_new_world               : int  1 1 0 0 1 0 0 0 1 0 ...
##   scrubbed_species_binomial  latitude  longitude date_collected datasource
## 1       Xanthium strumarium  32.69194  -87.23868     2000-08-31        CVS
## 2       Xanthium strumarium  32.69194  -87.23868     2000-08-31        CVS
## 3       Xanthium strumarium  37.02611   21.81917     2003-10-15       GBIF
## 4       Xanthium strumarium -41.41496  174.91683           <NA>       GBIF
## 5       Xanthium strumarium  31.43330 -109.91700     1892-09-19       GBIF
## 6       Xanthium strumarium  37.54833   22.86278     2003-10-02       GBIF
##                      dataset   dataowner custodial_institution_codes
## 1 Carolina Vegetation Survey Robert Peet                        <NA>
## 2 Carolina Vegetation Survey Robert Peet                        <NA>
## 3                          B           B                           B
## 4                         AK          AK                          AK
## 5                         US          US                          US
## 6                          B           B                           B
##         collection_code datasource_id is_new_world
## 1                  <NA>            48            1
## 2                  <NA>            48            1
## 3 Herbarium Berolinense          3617            0
## 4                    AK          3722            0
## 5                Botany          3271            1
## 6 Herbarium Berolinense          3617            0
The default data returned consist of the latitude, longitude and date collected, along with a set of attribution data. The meaning of some of these columns is obvious (e.g. latitude, longitude), but others may be less clear. The meanings of these columns and the information within them are explained in more detail in our data dictionary, available at http://bien.nceas.ucsb.edu/bien/tools/rbien/data-dictionary/
If we want more information on these occurrences, we just need to change the arguments:
Xanthium_strumarium_full <- BIEN_occurrence_species(species = "Xanthium strumarium",
                                                    cultivated = TRUE,
                                                    only.new.world = FALSE,
                                                    all.taxonomy = TRUE,
                                                    native.status = TRUE,
                                                    observation.type = TRUE,
                                                    political.boundaries = TRUE)
str(Xanthium_strumarium_full)
## 'data.frame':    3736 obs. of  35 variables:
##  $ scrubbed_species_binomial   : chr  "Xanthium strumarium" "Xanthium strumarium" "Xanthium strumarium" "Xanthium strumarium" ...
##  $ verbatim_family             : chr  "Asteraceae" "Asteraceae" "Asteraceae" "Asteraceae" ...
##  $ verbatim_scientific_name    : chr  "Xanthium strumarium L." "Xanthium strumarium L." "Xanthium strumarium L." "Xanthium strumarium L." ...
##  $ family_matched              : chr  "Asteraceae" "Asteraceae" "Asteraceae" "Asteraceae" ...
##  $ name_matched                : chr  "Xanthium strumarium" "Xanthium strumarium" "Xanthium strumarium" "Xanthium strumarium" ...
##  $ name_matched_author         : chr  "L." "L." "L." "L." ...
##  $ higher_plant_group          : chr  "flowering plants" "flowering plants" "flowering plants" "flowering plants" ...
##  $ scrubbed_taxonomic_status   : chr  "Accepted" "Accepted" "Accepted" "Accepted" ...
##  $ scrubbed_family             : chr  "Asteraceae" "Asteraceae" "Asteraceae" "Asteraceae" ...
##  $ scrubbed_author             : chr  "L." "L." "L." "L." ...
##  $ native_status               : chr  "UNK" "UNK" "P" "N" ...
##  $ native_status_reason        : chr  "Status unknown, no checklists for region of observation" "Status unknown, no checklists for region of observation" "Present in one or more checklists for region, status not indicated" "Native to region, as per checklist" ...
##  $ native_status_sources       : chr  NA NA "usda" "mexico" ...
##  $ is_introduced               : int  NA NA 0 0 NA NA NA 0 NA NA ...
##  $ native_status_country       : chr  NA NA "P" "N" ...
##  $ native_status_state_province: chr  NA NA "P" "N" ...
##  $ native_status_county_parish : chr  NA NA NA NA ...
##  $ country                     : chr  "Greece" "New Zealand" "United States" "Mexico" ...
##  $ state_province              : chr  NA NA "Arizona" "Nuevo Leon" ...
##  $ county                      : chr  NA NA "Cochise" NA ...
##  $ locality                    : chr  "Messinía, NW Hatzis" "Wellington Ecological District/Sounds-Wellington Ecological Region/NZ Eco Region" "Bisbee, Mex. Bound. Line." "Rancho Aguililla" ...
##  $ latitude                    : num  37 -41.4 31.4 25 37.5 ...
##  $ longitude                   : num  21.8 174.9 -109.9 -100.6 22.9 ...
##  $ date_collected              : Date, format: "2003-10-15" NA ...
##  $ datasource                  : chr  "GBIF" "GBIF" "GBIF" "GBIF" ...
##  $ dataset                     : chr  "B" "AK" "US" "CNS-UT" ...
##  $ dataowner                   : chr  "B" "AK" "US" "CNS-UT" ...
##  $ custodial_institution_codes : chr  "B" "AK" "US" "CNS-UT" ...
##  $ collection_code             : chr  "Herbarium Berolinense" "AK" "Botany" "TEX" ...
##  $ datasource_id               : num  3617 3722 3271 3043 3617 ...
##  $ is_cultivated_observation   : int  0 0 0 1 0 0 0 0 0 0 ...
##  $ is_cultivated_in_region     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ is_location_cultivated      : logi  NA NA NA NA NA NA ...
##  $ is_new_world                : int  0 0 1 1 0 0 0 1 0 0 ...
##  $ observation_type            : chr  "specimen" "specimen" "specimen" "specimen" ...
We now have considerably more information.
Let's take a quick look at where those occurrences are.
# Make a quick map to plot our points on
map('world', fill = TRUE, col = "grey", bg = "light blue")
# Plot the points from the full query in red
points(cbind(Xanthium_strumarium_full$longitude, Xanthium_strumarium_full$latitude), col = "red", pch = 20, cex = 1)
# Plot the points from the default query in blue
points(cbind(Xanthium_strumarium$longitude, Xanthium_strumarium$latitude), col = "blue", pch = 20, cex = 1)
From the map, we can see that the points from the default query (in blue) all fall within the New World. The points from the full query (red + blue) additionally include occurrences from the Old World.
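The same comparison can be made numerically using the is_new_world column that appears in the str() output. The following is a minimal sketch on a small invented data frame (the values are made up for illustration and are not BIEN data); it drops records without coordinates and then counts records inside and outside the New World.

```r
# Mock occurrence records using column names from the str() output.
# The values are invented for illustration; they are not BIEN data.
occ <- data.frame(
  scrubbed_species_binomial = rep("Xanthium strumarium", 5),
  latitude     = c(32.7, 37.0, -41.4, 31.4, NA),
  longitude    = c(-87.2, 21.8, 174.9, -109.9, NA),
  is_new_world = c(1L, 0L, 0L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Keep only records that have both coordinates
occ_xy <- occ[!is.na(occ$latitude) & !is.na(occ$longitude), ]

# Count records outside (0) and inside (1) the New World
new_world_counts <- table(occ_xy$is_new_world)
print(new_world_counts)

# Subset to New World records only
occ_new_world <- occ_xy[occ_xy$is_new_world == 1, ]
print(nrow(occ_new_world))
```

Applied to Xanthium_strumarium_full, the same subsetting should separate the records plotted only in red from those that also appear in blue.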
Example 2: Occurrence records for a country
Since we may be interested in a particular geographic area, rather than a particular set of species, there are also options to easily extract data by political region.
We'll choose a relatively small region, the Bahamas, for our demonstration.
Bahamas <- BIEN_occurrence_country(country = "Bahamas")
# Let's see how many species we have
length(unique(Bahamas$scrubbed_species_binomial))
# About 300 species with valid occurrence records.
# Now, let's take a look at where those occurrences are:
map(regions = "Bahamas", fill = TRUE, col = "grey", bg = "light blue")
points(cbind(Bahamas$longitude, Bahamas$latitude), col = "blue", pch = 20, cex = 1)
# Looks like some islands are considerably better sampled than others.
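One caveat when counting species this way: length(unique(x)) treats NA as a distinct value, so any records without a species name inflate the count by one. The sketch below uses a small invented data frame (not BIEN data) to show a count that ignores missing names, along with a per-species tally of records.

```r
# Mock country-level occurrence records; values are invented for illustration.
bahamas_mock <- data.frame(
  scrubbed_species_binomial = c("Pinus caribaea", "Pinus caribaea",
                                "Bursera simaruba", NA),
  stringsAsFactors = FALSE
)

# unique() keeps NA as a value, so this count includes the missing name
n_with_na <- length(unique(bahamas_mock$scrubbed_species_binomial))

# Dropping NA first gives the number of named species
n_species <- length(unique(na.omit(bahamas_mock$scrubbed_species_binomial)))

# Number of records per species, most frequently recorded first
records_per_species <- sort(table(bahamas_mock$scrubbed_species_binomial),
                            decreasing = TRUE)
print(records_per_species)
```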
These functions begin with the prefix BIEN_ranges_... and return (unsurprisingly) species ranges. Most of these functions work by saving the downloaded ranges to a specified directory in shapefile format, rather than by loading them into the R environment.
BIEN_ranges_species Downloads range maps for given species and saves them to a specified directory.
BIEN_ranges_genus Saves range maps for all species within a genus to a specified directory.
BIEN_ranges_load_species This function returns the ranges for a set of species as a SpatialPolygonsDataFrame object.
The range functions have different arguments than we have seen so far, including:
directory This is where the function will be saving the shapefiles you download
matched If TRUE, the function will return a dataframe listing which species ranges were downloaded and which weren't.
match_names_only If TRUE, the function will check whether a map is available for each species without actually downloading it.
include.gid If TRUE, the function will append a unique gid number to each range map's filename. This argument is designed to allow forward compatibility when BIEN contains multiple range maps for each species.
Example 3: Range maps and occurrence points
If we have a species we're interested in, and would like to load the range map into the environment, we can use the function BIEN_ranges_load_species. Let's try this for Xanthium strumarium.
Xanthium_strumarium_range <- BIEN_ranges_load_species(species = "Xanthium strumarium")
The range map is now in our global environment as a SpatialPolygonsDataFrame. Let's plot the map and see what it looks like.
# First, let's add a base map so that our range has some context:
map('world', fill = TRUE, col = "grey", bg = "light blue", xlim = c(-180, -20), ylim = c(-60, 80))
# Now, we can add the range map:
plot(Xanthium_strumarium_range, col = "green", add = TRUE)
Now, let's add those occurrence points from earlier to this map:
map('world', fill = TRUE, col = "grey", bg = "light blue", xlim = c(-180, -20), ylim = c(-60, 80))
plot(Xanthium_strumarium_range, col = "green", add = TRUE)
points(cbind(Xanthium_strumarium$longitude, Xanthium_strumarium$latitude), col = "blue", pch = 20, cex = 1)
These functions begin with the prefix BIEN_plot_... and return ecological plot data. Functions include:
BIEN_plot_list_sampling_protocol Returns the different plot sampling protocols found in the BIEN database.
BIEN_plot_list_datasource Returns the different datasources that are available in the BIEN database.
BIEN_plot_sampling_protocol Downloads data for a specified sampling protocol
BIEN_plot_datasource Downloads data for a specific datasource
BIEN_plot_dataset Downloads data for a given dataset (which is nested within a datasource)
BIEN_plot_name Downloads data for a specific plot name (these are nested within a given dataset)
Again we have some of the same arguments available for these queries that we saw for the occurrence functions. We also have the new argument all.metadata, which causes the functions to return more metadata for each plot.
Example 4: Plot data by plot name
Let's take a look at the data for an individual plot.
LUQUILLO <- BIEN_plot_name(plot.name = "LUQUILLO")
head(LUQUILLO)
We can see that this is a 0.1 hectare transect where stems >= 2.5 cm diameter at breast height were included. If we'd like more detail, we can use additional arguments:
LUQUILLO_full <- BIEN_plot_name(plot.name = "LUQUILLO",cultivated = T,all.taxonomy = T,native.status = T,political.boundaries = T,all.metadata = T)
LUQUILLO_full contains more useful information, including metadata on which taxa were included, which growth forms were included and information on whether species are known to be native or introduced.
These functions begin with the prefix BIEN_trait_... and access the BIEN trait database. Note that the spelling of the trait names must be precise, so we recommend using the function BIEN_trait_list first. Trait names are standardized to follow http://www.top-thesaurus.org/ where available. Trait units have been standardized for each trait.
BIEN_trait_list Start with this. It returns a dataframe of the traits available.
BIEN_trait_family Returns a dataframe of all trait data for a given family (or families).
BIEN_trait_trait Downloads all records of a specified trait (or traits).
BIEN_trait_mean Estimates species mean trait values using genus or family level means where species-level data is absent.
BIEN_trait_traitbyfamily Downloads data for a given family (or families) and trait(s).
Example 5: Accessing trait data
If you're interested in accessing all traits for a taxon, say the genus Salix, just go ahead and use the corresponding function:
Salix_traits <- BIEN_trait_genus(genus = "Salix")
If instead we're interested in a particular trait, the first step is to check if that trait is present and verify the spelling using the function BIEN_trait_list:
##                                        trait_name
## 1               diameter at breast height (1.3 m)
## 2                                    flower color
## 3                     flower pollination syndrome
## 4                                      fruit type
## 5                            inflorescence length
## 6                                       leaf area
## 7                     leaf area per leaf dry mass
## 8           leaf carbon content per leaf dry mass
## 9   leaf carbon content per leaf nitrogen content
## 10                              leaf compoundness
## 11                                  leaf dry mass
## 12              leaf dry mass per leaf fresh mass
## 13                                leaf fresh mass
## 14                 Leaf lamina fracture toughness
## 15                                 leaf life span
## 16            leaf nitrogen content per leaf area
## 17        leaf nitrogen content per leaf dry mass
## 18          leaf phosphorus content per leaf area
## 19      leaf phosphorus content per leaf dry mass
## 20         leaf photosynthetic rate per leaf area
## 21     leaf photosynthetic rate per leaf dry mass
## 22                      leaf relative growth rate
## 23 leaf stomatal conductance for H2O per leaf area
## 24                                 leaf thickness
## 25                  longest whole plant longevity
## 26                           maximum fruit length
## 27                            maximum leaf length
## 28                             maximum leaf width
## 29                     maximum whole plant height
## 30                  maximum whole plant longevity
## 31                           minimum fruit length
## 32                            minimum leaf length
## 33                             minimum leaf width
## 34                     minimum whole plant height
## 35                          plant flowering begin
## 36                       plant flowering duration
## 37                        plant fruiting duration
## 38                                  root dry mass
## 39                                    seed length
## 40                                      seed mass
## 41                                  stem dry mass
## 42                      stem relative growth rate
## 43                              stem wood density
## 44                              vessel lumen area
## 45                                  vessel number
## 46                 whole plant dispersal syndrome
## 47                        whole plant growth form
## 48              whole plant growth form diversity
## 49                             whole plant height
## 50     whole plant primary juvenile period length
## 51                      whole plant sexual system
## 52               whole plant vegetative phenology
## 53                          whole plant woodiness
## 54                                           <NA>
If we're interested in leaf area, we see that this is indeed called “leaf area” in the database. Now that we know the proper spelling, we can use the function BIEN_trait_trait to download all observations of that trait.
leaf_area <- BIEN_trait_trait(trait = "leaf area")
Note that the units have been standardized and that there is a full set of attribution data for each trait.
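Once trait records are downloaded, a common next step is to summarise them per species. The sketch below is an illustration on an invented data frame rather than real BIEN output: the column names trait_name, trait_value and unit, and the choice to store values as text, are assumptions made for this example, so check names() and str() on your own download before adapting it.

```r
# Mock trait records. The column names (trait_name, trait_value, unit) and the
# values are assumptions made for this sketch; they are not BIEN data.
traits <- data.frame(
  scrubbed_species_binomial = c("Salix alba", "Salix alba", "Salix nigra"),
  trait_name  = "leaf area",
  trait_value = c("1200", "1400", "900"),  # stored as text in this mock
  unit        = "mm2",
  stringsAsFactors = FALSE
)

# Convert the values to numeric before summarising
traits$trait_value_num <- as.numeric(traits$trait_value)

# Mean leaf area per species
species_means <- aggregate(trait_value_num ~ scrubbed_species_binomial,
                           data = traits, FUN = mean)
print(species_means)
```

Because the units are standardized within a trait, averaging within a single trait is reasonable; averaging across different traits would not be.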
While there are existing packages that query taxonomic data (e.g. those included in the excellent taxize package), the RBIEN taxonomy functions access the taxonomic information that underlies the BIEN database, ensuring consistency.
BIEN_taxonomy_family Downloads all taxonomic information for a given family.
Example 6: Taxonomic data
Let's say we're interested in the genus Asclepias, and we'd like to get an idea of how many species there are in this genus and what higher taxa it falls within.
Asclepias_taxonomy <- BIEN_taxonomy_genus(genus = "Asclepias")
# We see that the genus Asclepias falls within the family Apocynaceae and the order Gentianales.
# You'll also notice that a given species may appear more than once (due to multiple circumscriptions, some of which may be illegitimate).
# If we'd just like to know all the species that aren't illegitimate
# (status labels are matched case-insensitively, since the database capitalizes them, e.g. "Accepted"):
Asclepias_species <- unique(Asclepias_taxonomy$scrubbed_species_binomial[
  tolower(Asclepias_taxonomy$scrubbed_taxonomic_status) %in% c("accepted", "no opinion")])
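Before filtering on taxonomic status, it can help to inspect which status labels are actually present, since the str() output shown earlier has "Accepted" with a capital letter. The sketch below works on an invented data frame (the "Synonym" and "No opinion" labels and all rows are made up for illustration) and matches the labels case-insensitively.

```r
# Mock taxonomy records; rows and status labels are invented for illustration.
tax <- data.frame(
  scrubbed_species_binomial = c("Asclepias syriaca", "Asclepias syriaca",
                                "Asclepias tuberosa", "Asclepias incarnata"),
  scrubbed_taxonomic_status = c("Accepted", "Synonym", "Accepted", "No opinion"),
  stringsAsFactors = FALSE
)

# Inspect the status labels that are present
print(table(tax$scrubbed_taxonomic_status))

# Match statuses without depending on capitalization
keep <- tolower(tax$scrubbed_taxonomic_status) %in% c("accepted", "no opinion")

# Unique species names with an acceptable status
species <- sort(unique(tax$scrubbed_species_binomial[keep]))
print(species)
```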
The BIEN database currently contains 101 phylogenies for New World plants. This includes 100 replicated phylogenies that include a large fraction of New World plant species (“complete phylogenies”) and 1 phylogeny containing only those New World plant species for which molecular data were available (“conservative phylogeny”). Currently, there are only 2 functions available:
BIEN_phylogeny_complete This function will return a specified number of the replicated “complete” phylogenies. Note that each phylogeny is several Mb in size, so downloading many may take a while on slow connections.
BIEN_phylogeny_conservative This function returns the conservative phylogeny.
Arguments: The function BIEN_phylogeny_complete has a few arguments that are worth explaining:
n_phylogenies This is the number of replicated phylogenies that you want to download (between 1 and 100)
seed This argument sets the seed for the random number generator before randomly drawing the phylogenies to be downloaded. This is useful for replicating analyses.
replicates This argument allows you to specify WHICH of the 100 phylogenies to download, rather than having them selected randomly.
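To see why setting a seed makes the selection of phylogenies repeatable, here is a minimal base R sketch of the general idea. It only illustrates seeded random sampling of replicate numbers; it is not the package's internal code, and the variable names are hypothetical.

```r
# Illustration of seeded sampling; this is NOT BIEN's internal implementation.
n_phylogenies <- 3

# Draw 3 replicate numbers out of 100 after fixing the seed
set.seed(42)
draw_1 <- sample(1:100, n_phylogenies)

# Resetting the same seed reproduces exactly the same draw
set.seed(42)
draw_2 <- sample(1:100, n_phylogenies)

print(identical(draw_1, draw_2))
```

If an analysis needs a specific, documented set of trees regardless of the random number generator, supplying the replicate numbers directly through the replicates argument avoids relying on a seed at all.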
Example 7: Phylogenies
Let's say we want to download the conservative phylogeny.
phylo <- BIEN_phylogeny_conservative()
# Let's make sure it looks alright
plot.phylo(x = phylo, show.tip.label = FALSE)
# If we just want to see which species are included
phylo_species <- phylo$tip.label
The BIEN database contains stem data associated with many of the plots. This is typically either diameter at breast height or diameter at ground height. At present, there are only two stem functions (although expect more in the future):
BIEN_stem_species This function downloads all of the stem data for a given species (or set of species).
BIEN_stem_datasource This function downloads all of the stem data for a given datasource.
The arguments for these functions are the same as those we have seen in the occurrence and plot functions.
Example 8: Stem data
If we'd like stem data for the species Cupressus arizonica:
Cupressus_arizonica_stems <- BIEN_stem_species("Cupressus arizonica")
These functions begin with the prefix BIEN_list_ and allow you to quickly get a list of all the species in a geographic unit. Functions include:
BIEN_list_country Returns all species found within a country.
BIEN_list_state Returns all species found within a given state/province or other 2nd level political division.
BIEN_list_county Returns all species found within a given county/parish/or other 3rd level political division.
Some of the same arguments we saw in the occurrence functions appear here as well.
Example 9: Species list for a country
Let's return to our previous example. What if we just need a list of the species in the Bahamas, rather than the specific details of each occurrence record? We can instead use the function BIEN_list_country to download a list of species, which should be much faster than using BIEN_occurrence_country to get a species list.
Bahamas_species_list <- BIEN_list_country(country = "Bahamas")
# Notice that we find many more species listed than we found occurrence records for. What happened?
# There are many records coming from the Bahamas that lack coordinates.
# These records are used in the "_list_" functions, but not in the occurrence functions.
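To find out which species appear in the list but have no georeferenced occurrence records, the two sets of names can be compared with setdiff(). This is a sketch on invented data: it assumes that both data frames carry a scrubbed_species_binomial column, which is confirmed above for occurrence data but is an assumption for the list output.

```r
# Mock species list and occurrence records; values are invented for illustration.
# The scrubbed_species_binomial column in the list output is an assumption.
species_list_mock <- data.frame(
  scrubbed_species_binomial = c("Pinus caribaea", "Coccoloba uvifera",
                                "Bursera simaruba", "Swietenia mahagoni"),
  stringsAsFactors = FALSE
)
occurrences_mock <- data.frame(
  scrubbed_species_binomial = c("Pinus caribaea", "Pinus caribaea",
                                "Bursera simaruba"),
  stringsAsFactors = FALSE
)

# Species with at least one occurrence record
occurrence_species <- unique(occurrences_mock$scrubbed_species_binomial)

# Species present in the list but absent from the occurrence records
only_in_list <- setdiff(species_list_mock$scrubbed_species_binomial,
                        occurrence_species)
print(only_in_list)
```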
If we wanted to retrieve the results for multiple countries at once, that is simple as well. We just need to supply a vector of countries.
country_vector <- c("Haiti", "Dominican Republic")
Haiti_DR <- BIEN_list_country(country = country_vector)
We can also use political division codes (from geonames.org) instead of writing out the full country names.
# To see all of the political division names, and associated codes, we can use this function:
political_names <- BIEN_metadata_list_political_names()
# Let's take a look at what the dataframe contains:
head(political_names)
##   country country_iso     state_province state_province_ascii state_code
## 1 Romania          RO                Olt                  Olt         29
## 2 Romania          RO  Judeţul Maramureş    Judetul Maramures         25
## 3 Nigeria          NG       Sokoto State         Sokoto State         51
## 4  Norway          NO     Buskerud fylke       Buskerud fylke         04
## 5  Norway          NO      Østfold fylke        Ostfold fylke         13
## 6 Romania          RO          Satu Mare            Satu Mare         32
##    county_parish county_parish_ascii county_code
## 1 Comuna Brebeni      Comuna Brebeni      125999
## 2    Comuna Coaş         Comuna Coas      179837
## 3         Tureta              Tureta         620
## 4            Flå                 Fla        0615
## 5           Råde                Rade        0135
## 6  Comuna Bârsău       Comuna Barsau      137103
# In addition to the standardized country, state (state_province_ascii) and county (county_parish_ascii) names, we have the associated codes that can be used in BIEN functions.
# Note that 'state' refers to any primary political division (e.g. province), and 'county' refers to any secondary political division (e.g. parish).
# Looking at the political_names dataframe, we see that the Dominican Republic has country code "DO", and Haiti has country code "HT"
Haiti_DR_from_codes <- BIEN_list_country(country.code = c("HT", "DO"))
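Rather than reading the codes off the table by eye, they can be looked up programmatically. The sketch below uses a small invented stand-in for political_names, keeping only the country and country_iso columns shown in the output above; the rows are made up for illustration.

```r
# Mock stand-in for political_names with the country and country_iso columns.
# The rows are invented for illustration.
political_names_mock <- data.frame(
  country     = c("Haiti", "Haiti", "Dominican Republic", "Norway"),
  country_iso = c("HT", "HT", "DO", "NO"),
  stringsAsFactors = FALSE
)

# Countries we want codes for
wanted <- c("Haiti", "Dominican Republic")

# One code per country, dropping the repeats caused by multiple divisions
codes <- unique(political_names_mock$country_iso[
  political_names_mock$country %in% wanted])
print(codes)
```

The resulting character vector could then be supplied to the country.code argument in place of hand-typed codes.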
The BIEN metadata functions start with the prefix BIEN_metadata_... and provide useful metadata for the BIEN database.
BIEN_metadata_database_version Returns the current version number of the BIEN database and the release date.
BIEN_metadata_match_data Rudimentary function to check for changed records between old and current queries.
BIEN_metadata_citation Function to generate bibtex citations for use in reference managers.
BIEN_metadata_list_political_names Returns a dataframe containing political division names and associated codes.
Example 10: Metadata
To check what the current version of the BIEN database is (which we recommend reporting when using BIEN data):
BIEN_metadata_database_version()
##   db_version db_release_date
## 1      4.1.1      2018-12-06
Example 11: Citations
One of the more innovative features of the BIEN package is that it will generate custom attribution data for you based on what data you downloaded through the package.
Let's say we're interested in Selaginella selaginoides, and we'd like to download some occurrence data:
Selaginella_selaginoides_occurrences <- BIEN_occurrence_species("Selaginella selaginoides",only.new.world = F)
If we plan on using those data in a publication, we'll need proper attribution. We can use BIEN_metadata_citation to do this for us:
citation_info <- BIEN_metadata_citation(dataframe = Selaginella_selaginoides_occurrences)
citation_info is a list that contains 3 elements:
1. A bit of general information on how to use the list.
2. A set of bibtex formatted references.
3. Acknowledgement text.
To make things even easier on ourselves, we can use some of the additional functionality of the BIEN_metadata_citation function:
temp_dir <- file.path(tempdir(), "BIEN_temp") # Set a temporary working directory
dir.create(temp_dir, showWarnings = FALSE) # Make sure the directory exists before writing to it
citation_info <- BIEN_metadata_citation(dataframe = Selaginella_selaginoides_occurrences,
                                        bibtex_file = file.path(temp_dir, "selaginella_selaginoides.bib"),
                                        acknowledgement_file = file.path(temp_dir, "selaginella_selaginoides.txt"))
Now we have a bibtex file, selaginella_selaginoides.bib, that can be loaded into a reference manager (e.g. Endnote, Paperpile, etc.), and a text file, selaginella_selaginoides.txt, containing text that can be pasted into the acknowledgements section of a publication.
What if we also have some trait data? No problem there, the code handles that as well:
# First, let's get some trait data:
selaginella_selaginoides_traits <- BIEN_trait_species(species = "Selaginella selaginoides")
# Now, we just need to modify our previous bit of code to include the trait data as well:
temp_dir <- file.path(tempdir(), "BIEN_temp")
citation_info <- BIEN_metadata_citation(dataframe = Selaginella_selaginoides_occurrences,
                                        trait.dataframe = selaginella_selaginoides_traits,
                                        bibtex_file = file.path(temp_dir, "selaginella_selaginoides.bib"),
                                        acknowledgement_file = file.path(temp_dir, "selaginella_selaginoides.txt"))
The updated citation information will now contain references for both trait and occurrence records.
Example 12: Putting it all together
Coming soon!