[R] splitting scientific names into genus, species, and sub
(Ted Harding)
Ted.Harding at manchester.ac.uk
Wed Nov 4 22:37:08 CET 2009
On 04-Nov-09 21:09:42, Mark W. Miller wrote:
> I have a list of scientific names in a data set. I would like
> to split the names into genus, species and subspecies.
> Not all names include a subspecies. Could someone show me how
> to do this?
>
> My example code is:
> a <- matrix(c('genusA speciesA', 10,
> 'genusB speciesAA', 20,
> 'genusC speciesAAA subspeciesA', 15,
> 'genusC speciesAAA subspeciesB', 25), nrow=4, byrow=TRUE)
> aa <- data.frame(a)
> colnames(aa) <- c('species', 'counts')
> aa
>
># The code returns
>                         species counts
> 1               genusA speciesA     10
> 2              genusB speciesAA     20
> 3 genusC speciesAAA subspeciesA     15
> 4 genusC speciesAAA subspeciesB     25
>
># I would like there to be 4 columns as below
> genus   species     subspecies     counts
> genusA  speciesA    no.subspecies  10
> genusB  speciesAA   no.subspecies  20
> genusC  speciesAAA  subspeciesA    15
> genusC  speciesAAA  subspeciesB    25
>
> I have tried using 'strsplit', but cannot get the desired result.
> Thank you for any help with this.
>
> Mark Miller
> Gainesville, Florida
The following seems to work for your example. However, others
can probably propose a less clumsy version (but at least this
one breaks it down into its elements):
a <- matrix(c('genusA speciesA', 10,
              'genusB speciesAA', 20,
              'genusC speciesAAA subspeciesA', 15,
              'genusC speciesAAA subspeciesB', 25), nrow=4, byrow=TRUE)
a
#      [,1]                            [,2]
# [1,] "genusA speciesA"               "10"
# [2,] "genusB speciesAA"              "20"
# [3,] "genusC speciesAAA subspeciesA" "15"
# [4,] "genusC speciesAAA subspeciesB" "25"
A <- NULL
for( i in (1:nrow(a))){
  # split the name on runs of blanks
  Names <- unlist(strsplit(a[i,1],"[ ]+"))
  # pad names that have no subspecies
  if(length(Names)==2) Names <- c(Names,"no.subspecies")
  A <- rbind(A,c(Names,a[i,2]))
}
colnames(A) <- c("Genus","Species","Subspecies","Count")
A <- as.data.frame(A)
# go via as.character(): if Count is a factor, as.numeric() alone
# returns the level codes (1,3,2,4) rather than the counts
A$Count <- as.numeric(as.character(A$Count))
A
#    Genus    Species    Subspecies Count
# 1 genusA   speciesA no.subspecies    10
# 2 genusB  speciesAA no.subspecies    20
# 3 genusC speciesAAA   subspeciesA    15
# 4 genusC speciesAAA   subspeciesB    25
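For what it is worth, here is a sketch of a loop-free variant of the
same idea, using the same toy data. It is only an illustration, not a
tested general solution: the helper name 'pad' and the result name 'B'
are my own inventions, and it assumes every name has exactly two or
three blank-separated parts (vapply() stops with an error otherwise,
which at least flags unexpected names).

```r
# Same toy data as in the example above
a <- matrix(c('genusA speciesA', 10,
              'genusB speciesAA', 20,
              'genusC speciesAAA subspeciesA', 15,
              'genusC speciesAAA subspeciesB', 25), nrow=4, byrow=TRUE)

# Split all names at once on runs of blanks; gives a list of
# character vectors, one per row of 'a'
parts <- strsplit(a[,1], "[ ]+")

# 'pad' (hypothetical helper): append the filler when there is
# no subspecies, otherwise return the parts unchanged
pad <- function(x) if (length(x) == 2) c(x, "no.subspecies") else x

# vapply() insists on exactly 3 strings per name and returns a
# 3 x n matrix, so transpose it to one row per name
M <- t(vapply(parts, pad, character(3)))

# Build the data frame directly; the counts are converted from the
# character matrix, so no factor is involved at any point
B <- data.frame(Genus = M[,1], Species = M[,2], Subspecies = M[,3],
                Count = as.numeric(a[,2]), stringsAsFactors = FALSE)
B
```

Because the counts are converted straight from the character column
of 'a', this version avoids building the result row by row with
rbind() and never turns the counts into a factor.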
Hoping this helps!
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 04-Nov-09 Time: 21:37:03
------------------------------ XFMail ------------------------------