[R] working with taxonomic trees: sampling
Olga Lyashevska
olga at herenstraat.nl
Mon Feb 1 13:32:32 CET 2010
Dear all,
I am working with taxonomic data, represented as a list of classes,
orders, families, genera and finally species.
> class(mydata)
[1] "data.frame"
> mode(mydata)
[1] "list"
> names(mydata)
[1] "tclass" "torder" "tfamily" "tgenus" "tspecies"
> length(mydata$tclass)
[1] 161590
The first 10 rows look like the following:
> mydata[1:10,]
tclass torder tfamily tgenus
1 Chlorophyta Chlorophyceae Dunaliellaceae Collodictyon
2 Chlorophyta Chlorophyceae Dunaliellaceae Collodictyon
3 Chlorophyta Chlorophyceae Dunaliellaceae Collodictyon
4 Chlorophyta Chlorophyceae Dunaliellaceae Dunaliella
5 Chlorophyta Chlorophyceae Dunaliellaceae Dunaliella
6 Chlorophyta Chlorophyceae Dunaliellaceae Dunaliella
7 Chlorophyta Chlorophyceae Chlamydomonadaceae Brachiomonas
8 Chlorophyta Chlorophyceae Chlamydomonadaceae Brachiomonas
9 Chlorophyta Chlorophyceae Chlamydomonadaceae Brachiomonas
10 Chlorophyta Chlorophyceae Chlamydomonadaceae Brachiomonas
tspecies
1 Collodictyontriciliatum
2 Collodictyonciliatum
3 Collodictyonsemiciliatum
4 Dunaliellasalina
5 Dunaliellabardawil
6 Dunaliellatertiolecta
7 Brachiomonassubmarina
8 Brachiomonassimplex
9 Brachiomonasellipsoidalis
10 Brachiomonaswestiana
In total I have 115 unique classes, containing 733 orders, which in
turn contain 16 185 families, etc.
What I am trying to do is to obtain a subtree represented by let's say
n1 random classes, containing n2 random orders (but restricted to
those that belong to the classes chosen earlier), containing n3 random
families etc and all the way down to species, where the number of
species will be n5.
So the elements I choose at each level are constrained by the elements
already chosen at the level above. If I randomly choose, let's say, 3
classes A, B and C, I want to restrict the randomly chosen orders
(let's say a1, a2, a3, b1, b2) to those that belong to the classes
already chosen. Similarly, I need to restrict the list of families to
those that belong to the chosen orders, and hence to classes A, B
and C.
So I want to obtain a subtree spanning all taxonomic levels, with a
random number of elements at each taxonomic level, but in such a way
that I do not end up with orphaned nodes, i.e. species without
classes.
I have been trying to use 'sample' as follows:
# pick 10 random classes (I would like this number to be random too)
tcla <- sample(tclass, 10, replace=TRUE)
# keep only the orders that belong to the classes chosen above
torder1 <- torder[tclass %in% tcla]
# pick 10 orders from the classes that are already chosen
tord <- sample(torder1, 10, replace=TRUE)
etc. all the way down to the species level.
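When the rows are matched against several chosen classes at once, the comparison operator matters. The small sketch below uses made-up toy vectors (the names A, B, C and a1 ... c2 are hypothetical, not taken from the real data) to show how '==' and '%in%' differ in that situation.

```r
# Toy vectors (hypothetical), one entry per species row
tclass <- c("A", "A", "B", "B", "C", "C")
torder <- c("a1", "a2", "b1", "b2", "c1", "c2")

# classes picked at the level above
chosen <- c("B", "A")

# '==' recycles 'chosen' along 'tclass' and compares position by position,
# so a row is kept only if it happens to line up with its own class
by_equal <- torder[tclass == chosen]

# '%in%' asks, for every row, whether its class is among the chosen ones
by_in <- torder[tclass %in% chosen]

print(by_equal)
print(by_in)
```

With these toy vectors, only the '%in%' version returns all four orders of classes A and B.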
The problem with this approach is that I may obtain branches without
any leaves. How can I get rid of those branches?
Finally, I want to repeat this procedure, let's say, 1000 times,
each time obtaining a different number of elements at each taxonomic
level.
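One possible way to approach this, offered only as a sketch: filter the full table rank by rank, so that every row that survives carries its complete lineage and branches without sampled species disappear on their own. The helper names sample_subtree and random_sizes, the vector ranks and the toy table are all hypothetical. The sketch also assumes that taxa are drawn without replacement from the unique names at each rank, which differs from sampling rows with replace=TRUE.

```r
# Toy taxonomy (hypothetical): 3 classes, 6 orders, 6 families, 12 species
toy <- data.frame(
  tclass   = rep(c("A", "B", "C"), each = 4),
  torder   = rep(c("a1", "a2", "b1", "b2", "c1", "c2"), each = 2),
  tfamily  = paste0("f", rep(1:6, each = 2)),
  tgenus   = paste0("g", 1:12),
  tspecies = paste0("s", 1:12),
  stringsAsFactors = FALSE
)
ranks <- c("tclass", "torder", "tfamily", "tgenus", "tspecies")

# Draw a nested random subtree: n[i] is the maximum number of distinct
# taxa to pick at rank i, among the rows still kept from the ranks above.
sample_subtree <- function(dat, n, ranks) {
  keep <- dat
  for (i in seq_along(ranks)) {
    taxa <- unique(keep[[ranks[i]]])
    # sample indices rather than values, so a single remaining taxon is safe
    picked <- taxa[sample.int(length(taxa), min(n[i], length(taxa)))]
    # dropping rows also drops every parent that has no picked child left
    keep <- keep[keep[[ranks[i]]] %in% picked, , drop = FALSE]
  }
  keep
}

# Random size per rank, between 1 and the number of taxa at that rank
random_sizes <- function(dat, ranks) {
  sapply(ranks, function(r) sample.int(length(unique(dat[[r]])), 1))
}

set.seed(1)
subtrees <- replicate(
  1000,
  sample_subtree(toy, random_sizes(toy, ranks), ranks),
  simplify = FALSE
)

# number of distinct taxa per rank in the first few subtrees
print(t(sapply(subtrees[1:5], function(s) sapply(s, function(x) length(unique(x))))))
```

Because pruning happens through the row filter, the number of classes or orders in a result can be smaller than the number requested: a class chosen at the top is dropped again if none of its orders is picked at the next rank. On the real data the same calls would take mydata in place of toy.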
Sorry for this long-winded post, I hope it is clear what I am trying
to do.
I would appreciate any tips!
Thanks,
Olga