[R] splitting scientific names into genus, species, and subspecies

Chris Stubben stubben at lanl.gov
Wed Nov 4 23:19:32 CET 2009



Mark W. Miller wrote:
> 
> I have a list of scientific names in a data set.  I would like to split
> the names into genus, species and subspecies.  Not all names include a
> subspecies.  Could someone show me how to do this?
> 

strsplit should work for your example...

parts <- strsplit(aa, " ")   ## split each name into words once
data.frame(
  genus      = sapply(parts, "[", 1),
  species    = sapply(parts, "[", 2),
  subspecies = sapply(parts, "[", 3)   ## will be NA for missing subsp
)
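
As a quick check of the NA behaviour, here is a minimal sketch on a made-up two-element vector. The contents of aa below are an assumption for illustration only (the original poster's data are not shown in this message), and res is just a name chosen here.

```r
## hypothetical input, not the original poster's data
aa <- c("Canis lupus", "Canis lupus familiaris")

## split each name on single spaces, giving a list of word vectors
parts <- strsplit(aa, " ")

## pull out the 1st, 2nd and 3rd word of every name;
## indexing past the end of a vector yields NA
res <- data.frame(
  genus      = sapply(parts, "[", 1),
  species    = sapply(parts, "[", 2),
  subspecies = sapply(parts, "[", 3),
  stringsAsFactors = FALSE
)
res
```

The first row should have NA in the subspecies column, because that name has only two words.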

However, scientific names are often pretty messy - I often have datasets
like this...
x
 [1] "Aquilegia caerulea James var. caerulea"                                         
 [2] "Aquilegia caerulea James var. ochroleuca Hook."                                 
 [3] "Aquilegia caerulea James var. pinetorum (Tidestrom) Payson ex Kearney & Peebles"
 [4] "Aquilegia caerulea James"                                                       
 [5] "Aquilegia chaplinei Standl."                                                    
 [6] "Aquilegia chaplinei Standley ex Payson"                                         
 [7] "Aquilegia chrysantha Gray var. chrysantha"                                      
 [8] "Aquilegia chrysantha Gray"       

So I first strip out the author names: strsplit breaks each name into
words, and grep finds the subspecies/variety abbreviations.

noauthor <- function(x) {
  ## split each name into a vector of separate words
  y <- strsplit(x, " ")
  sapply(y, function(w) {
    ## positions of the rank abbreviations var., ssp., var or f.
    ## (add others to the pattern as needed)
    n <- grep("^var\\.$|^ssp\\.$|^var$|^f\\.$", w)
    ## paste together the first and second words plus each abbreviation
    ## and the word after it; sort keeps the original word order when a
    ## name includes both var. and ssp., which sometimes happens
    paste(w[sort(c(1:2, n, n + 1))], collapse = " ")
  })
}


noauthor(x[1:8])
[1] "Aquilegia caerulea var. caerulea"    
[2] "Aquilegia caerulea var. ochroleuca"  
[3] "Aquilegia caerulea var. pinetorum"   
[4] "Aquilegia caerulea"                  
[5] "Aquilegia chaplinei"                 
[6] "Aquilegia chaplinei"                 
[7] "Aquilegia chrysantha var. chrysantha"
[8] "Aquilegia chrysantha"    
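
Once the authors are gone, the cleaned names can be split into columns with the same strsplit approach. This is only a sketch: the vector clean copies three of the results printed above, the column names rank and infra are made up here, and it assumes each name has at most one rank abbreviation.

```r
## three of the cleaned names printed above, typed in by hand
clean <- c("Aquilegia caerulea var. caerulea",
           "Aquilegia caerulea",
           "Aquilegia chaplinei")

## split each cleaned name into words
w <- strsplit(clean, " ")

## word 3 is the rank abbreviation and word 4 the epithet after it;
## both are NA for names that stop at the species
taxa <- data.frame(
  genus   = sapply(w, "[", 1),
  species = sapply(w, "[", 2),
  rank    = sapply(w, "[", 3),
  infra   = sapply(w, "[", 4),
  stringsAsFactors = FALSE
)
taxa
```

Names carrying both var. and ssp. would need extra columns, since this sketch only looks at the third and fourth words.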


Chris





