[R] splitting scientific names into genus, species, and subspecies
Chris Stubben
stubben at lanl.gov
Wed Nov 4 23:19:32 CET 2009
Mark W. Miller wrote:
>
> I have a list of scientific names in a data set. I would like to split
> the names into genus, species and subspecies. Not all names include a
> subspecies. Could someone show me how to do this?
>
strsplit should work for your example...
data.frame(
  genus      = sapply(strsplit(aa, " "), "[", 1),
  species    = sapply(strsplit(aa, " "), "[", 2),
  subspecies = sapply(strsplit(aa, " "), "[", 3)  ## NA where there is no subspecies
)
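If you want to try that on its own, something like the following should run as is. This is only a sketch: the vector aa is made up for illustration (the original question does not show the actual data), and word() is a hypothetical helper name. It splits each name once and reuses the result for all three columns.

```r
# Made-up sample input; 'aa' stands in for the real vector of names
aa <- c("Aquilegia caerulea",
        "Aquilegia caerulea ochroleuca",
        "Aquilegia chrysantha")

# Split each name on single spaces, once
parts <- strsplit(aa, " ", fixed = TRUE)

# Hypothetical helper: pull the n-th word from each split name.
# Indexing past the end of a character vector gives NA.
word <- function(p, n) sapply(p, `[`, n)

taxa <- data.frame(
  genus      = word(parts, 1),
  species    = word(parts, 2),
  subspecies = word(parts, 3),
  stringsAsFactors = FALSE
)
print(taxa)
```

Names without a third word simply get NA in the subspecies column.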
However, scientific names are often pretty messy. I often have datasets
like this:
x
[1] "Aquilegia caerulea James var. caerulea"
[2] "Aquilegia caerulea James var. ochroleuca Hook."
[3] "Aquilegia caerulea James var. pinetorum (Tidestrom) Payson ex Kearney & Peebles"
[4] "Aquilegia caerulea James"
[5] "Aquilegia chaplinei Standl."
[6] "Aquilegia chaplinei Standley ex Payson"
[7] "Aquilegia chrysantha Gray var. chrysantha"
[8] "Aquilegia chrysantha Gray"
So I first strip out the author names, using strsplit to break each name
into words and grep to find the subspecies/variety abbreviations:
noauthor <- function(x) {
  ## split each name into a vector of separate words
  y <- strsplit(x, " ")
  sapply(y, function(words) {
    ## positions of rank abbreviations: var., ssp., var, f.
    ## (extend the pattern for other abbreviations as needed)
    n <- grep("^var\\.$|^ssp\\.$|^var$|^f\\.$", words)
    ## paste together the first and second words plus each rank
    ## abbreviation and the word that follows it; sort keeps the original
    ## order when a name includes both var. and ssp., which sometimes happens
    paste(words[sort(c(1:2, n, n + 1))], collapse = " ")
  })
}
noauthor(x[1:8])
[1] "Aquilegia caerulea var. caerulea"
[2] "Aquilegia caerulea var. ochroleuca"
[3] "Aquilegia caerulea var. pinetorum"
[4] "Aquilegia caerulea"
[5] "Aquilegia chaplinei"
[6] "Aquilegia chaplinei"
[7] "Aquilegia chrysantha var. chrysantha"
[8] "Aquilegia chrysantha"
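Once the authors are gone, the cleaned names can be split into columns the same way as in the first example. The sketch below starts from a few of the cleaned names printed above and assumes each name has at most one rank abbreviation, so the rank is always the third word and the infraspecific name the fourth. The names clean, word() and the column names rank and epithet are my own choices for illustration.

```r
# A few of the cleaned names, copied from the noauthor() output above
clean <- c("Aquilegia caerulea var. caerulea",
           "Aquilegia caerulea var. ochroleuca",
           "Aquilegia caerulea",
           "Aquilegia chaplinei",
           "Aquilegia chrysantha var. chrysantha")

parts <- strsplit(clean, " ", fixed = TRUE)

# Hypothetical helper: n-th word of each split name, NA if missing
word <- function(p, n) sapply(p, `[`, n)

taxa <- data.frame(
  genus   = word(parts, 1),
  species = word(parts, 2),
  rank    = word(parts, 3),  # "var.", "ssp.", ... or NA
  epithet = word(parts, 4),  # infraspecific name or NA
  stringsAsFactors = FALSE
)
print(taxa)
```

A name carrying both var. and ssp. would need extra columns, so check for names with more than four words before relying on this.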
Chris