[BioC] OTU delimitation from simple FASTA (Sanger) sequences
Martin Unterseher
martin.unterseher at uni-greifswald.de
Thu Aug 16 14:30:12 CEST 2012
Dear all,
I have struggled with readOTUset{OTUbase} for some time and have searched the web and the R mailing-list archives, including this one, without success.
Whereas OTUbase is obviously designed for NGS datasets that have passed through specific 454 pipelines, I am looking for a convenient method to delimit OTUs from a simple FASTA file of Sanger sequences such as the one below (fungal ITS sequences), with the option to specify, e.g., a sequence similarity of 97% over at least 90% of the sequence length.
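To make the "97% similarity over at least 90% length" criterion concrete, here is a minimal Python sketch of greedy centroid clustering. It is not an OTUbase or Bioconductor function, and every choice in it is an assumption: identity is counted over the overlapping part of an overlap alignment with free end gaps (match +1, mismatch -1, gap -2), coverage is measured relative to the shorter sequence, sequences are visited longest first, and reverse complements and IUPAC ambiguity codes are not handled (an N simply counts as a mismatch). The names overlap_align, identity_and_coverage and greedy_otus are hypothetical.

```python
from typing import Dict, List, Tuple


def overlap_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                  gap: int = -2) -> Tuple[str, str]:
    """Overlap (semi-global) alignment: gaps at either end are free, so ragged
    Sanger read ends are not penalised. Returns the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score of a[:i] vs b[:j]; row/column 0 stay 0 (free leading gaps)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Free trailing gaps: the alignment may end anywhere in the last row or column.
    ends = [(score[n][j], n, j) for j in range(m + 1)]
    ends += [(score[i][m], i, m) for i in range(n + 1)]
    _, i, j = max(ends)
    # Build both rows back to front, starting with the unaligned trailing overhang.
    top: List[str] = list(reversed(a[i:])) + ["-"] * (m - j)
    bottom: List[str] = ["-"] * (n - i) + list(reversed(b[j:]))
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    # Free leading overhang of whichever sequence still has bases left.
    top += list(reversed(a[:i])) + ["-"] * j
    bottom += ["-"] * i + list(reversed(b[:j]))
    return "".join(reversed(top)), "".join(reversed(bottom))


def identity_and_coverage(a: str, b: str) -> Tuple[float, float]:
    """Identity = identical columns / all columns of the overlap (internal gaps
    count as differences). Coverage = fraction of the shorter sequence in the overlap."""
    top, bottom = overlap_align(a, b)
    paired = [k for k, (x, y) in enumerate(zip(top, bottom)) if x != "-" and y != "-"]
    if not paired:
        return 0.0, 0.0
    lo, hi = paired[0], paired[-1] + 1
    cols = list(zip(top[lo:hi], bottom[lo:hi]))
    identity = sum(x == y for x, y in cols) / len(cols)
    a_in = sum(x != "-" for x, _ in cols)
    b_in = sum(y != "-" for _, y in cols)
    coverage = a_in / len(a) if len(a) <= len(b) else b_in / len(b)
    return identity, coverage


def greedy_otus(seqs: Dict[str, str], min_identity: float = 0.97,
                min_coverage: float = 0.90) -> Dict[str, str]:
    """Visit sequences longest first; each joins the first OTU whose centroid passes
    both thresholds, otherwise it becomes the centroid of a new OTU."""
    centroids: List[str] = []
    assignment: Dict[str, str] = {}
    for name in sorted(seqs, key=lambda k: -len(seqs[k])):
        for idx, centroid in enumerate(centroids, start=1):
            ident, cov = identity_and_coverage(seqs[centroid], seqs[name])
            if ident >= min_identity and cov >= min_coverage:
                assignment[name] = f"OTU.{idx:02d}"
                break
        else:
            centroids.append(name)
            assignment[name] = f"OTU.{len(centroids):02d}"
    return assignment
```

greedy_otus takes a dict of sequence ID to sequence and returns a dict of sequence ID to a label such as "OTU.01". Free end gaps are used so that reads covering the same region but trimmed differently are not penalised for their overhangs. Greedy results depend on the visiting order, and the pure-Python alignment is quadratic per pair, so this is only meant for small sets like the ten sequences below.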
In the FASTA headers, >VASmic02 means "sequence 02 from host plant VASmic". This example file consists of 10 sequences from 3 host plants: VASmic, TILusn and HEVbra.
>VASmic02
ACCGGGATGTTCATAACCCTTTGTTGTCCGACTCTGTTGCCTCCGGGGCGACCCTGCCTTCGGGCGGGGGCTCCGGGTGGACACTTCAAACTCTTGCGTAACTTTGCAGTCTGAGTAAACTTAATTAATAAATTACACCACTCAAGCCTCGCTTGGTATTGGGCAACGCGGTCCGCCGCGTGCCTCAAATCGACCGGCTGGGTCTTCTGTCCCCTAAGCGTTGTGGAAACTATTCGCTAAAGGGTGTTCGGGAGGCTACGCCGTAAAACAACCCCATTTCTAAGG
>VASmic05
CCTCTTACCCATGTCTTTTGAGTACCTTCGTTTCCTCGGTGGGTTCGCCCGCCGATCGGACAACATTCAAACCCTTTGCAGTTGCAATCAGCGTCTGAAAAAACATAATAGTTAGTACCTTCAAGCTCTGCTTGGTGTTGGGTGTTTGTCTCGCCTCTGCGTGTAGACTCGCCTTAAAACAATTGGCAGCCGGCGTATTGATTTCGGAGCGCAGTACATCTCGCGCTTTGCACTCATAACGACGACGTCCAAAAGTACATTTTTAACTC
>VASmic06_1
TACCATCTCTTACCCATGTCTTTTGAGTACCTTCGTTTCCTCGGCGGGTCCGCCCGCCGATTGGACAAACTTAAACCCTTTGTAATTGAAATCAGCGTCTGAAAAAACATAATAGTTAGTACCTTCAAGCTCTGCTTGGTGTTGGGTGTTTGTCTCGCCTTTGCGTGTAGACTCGCCTTAAAACAATTGGCAGCCGGCGTATTGATTTCGGAGCGCAGTACATCTCGCGCTTTGCACTCATAACGACGACGTCCAAAAAGTACTTTTTACACTC
>TILusn11
TGTCTTTTGAGTACCTTCGTTTCCTCGGCGGGTCCGCCCGCCGATTGGACAAACTTAAACCCTTTGTAATTGAAATCAGCGTCTGAAAAAACATAATAGTTAGTACCTTCAAGCTCTGCTTGGTGTTGGGTGTTTGTCTCGCCTTTGCGTGTAGACTCGCCTTAAAACAATTGGCAGCCGGCGTATTGATTTCGGAGCGCAGTACATCTCGCGCTTTGCACTCATAACGACGACGTCCAAAAAGTACTTTTTACACTC
>TILusn12
GTATTATTACTTTGTTGCTTTGGCGAGCTGCCTTCGGGCCTTGTATGCTCGCCAGAGAATACCAAAACTCTTTTTATTAATGTCGTCTGAGTACTATATAATAGTTACAACCCTCAAGCTTAGCTTGGTATTGAGTCTATGTCAGTAATGGCAGGCTCTAAAATCAGTGGCGGCGCCGCTGGGTCCTGAACGTAGTAATATCTCTCGTTACAGGTTCTCGGTGTGCTTCTGCCAAAACCCAAATTTTTCTATGG
>VASmic14
ATCTCTTACCCATGTCTTTTGAGTACCTTCGTTTCCTCGGCGGGTCCGCCCGCCGATTGGACAAACTTAAACCCTTTGTAATTGAAATCAGCGTCTGAAAAAACATAATAGTTAGTACCTTCAAGCTCTGCTTGGTGTTGGGTGTTTGTCTCGCCTTTGCGTGTAGACTCGCCTTAAAACAATTGGCAGCCGGCGTATTGATTTCGGAGCGCAGTACATCTCGCGCTTTGCACTCATAACGACGACGTCCAAAAAGTACTTTTTACACTC
>VASmic16
CTCTTACCCATGTCTTTTGAGTACCTTCGTTTCCTCGGCGGGTCCGCCCGCCGGTTGGACAACATTCAAACCCTTTGCAGTTGCAATCAGCGTCTGAAAAAACTTAATAGTTAGTACCTTCAAGCTCTGCTTGGTGTTGGGTGTTTTGTCTCGCCTCCGCGCGCAGACTCGCCTTAAAACAATTGGCAGCCGGCGTATTGATTTCGGAGCGCAGTACATCTCGCGCTTTGCACTCATAACGACGACGTCCAAAAGTACATTTTTACACTC
>HEVbra17
ACCTCTTACCCATGTCTTTTGAGTACCTTCGTTTCCTCGGCGGGTCCGCCCGCCGATTGGACAACATTCAAACCCTTTGCAGTTGCAATCAGCGTCTGAAAAAACATAATAGTTAGTACCTTCAAGCTCTGCTTGGTGTTGGGTGTTTGTCTCTCCTCTGCGTGTAGACTCGCCTTAAAACAATTGGCAGCCGGCGTATTGATTTCGGAGCGCAGTACATCTCGCGCTTTGCACTCATAACGACGACGTCCAAAAGTACATTTTTACACTC
>HEVbra18
CTACCATCTCTTACCCATGTCTTTTGAGTACCTTCGTTTCCTCGGCGGGTCCGCCCGCCGATTGGACAAACTTAAACCCTTTGTAATTGAAATCAGCGTCTGAAAAAACATAATAGTTAGTACCTTCAAGCTCTGCTTGGTGTTGGGTGTTTGTCTCGCCTTTGCGTGTAGACTCGCCTTAAAACAATTGGCAGCCGGCGTATTGATTTCGGAGCGCAGTACATCTCGCGCTTTGCACTCATAACGACGACGTCCAAAAAGTACTTTTTACACTC
>VASmic21
ACCTTACCAAACTGTTGCCTCGGCGGGGTCACGCCCCGGGTGCGTCGCAGCCCCGGAACCAGGCGCCCGCCGGAGGGACCAACCAAACTCTTTCTGTAATCCCCTCGCGGACGTTATTTTTACAGCTCTGAGCAAAAATTCAAAATGAATCACAACCCTCGAACCCCTCCGGGGGTCCGGCGTTGGGGATCGGGAACCCCTAAGACGGGATCCCGGCCCCGAAATACAGTGGCGGTCTCGCCGCAGCCTCTCATGCGCAGTAGTTTGCACAACTCGCACCGGGAGCGCGGCGCGTCCACGTCCGTAAAACACCCAACTTCTGAAATG
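Since the host plant is encoded in each header, a small parser can recover both the sequences and the host of each sequence. The following is a minimal Python sketch; it assumes (hypothetically) that the host code is the run of letters at the start of the ID, so that "VASmic06_1" maps to "VASmic". The names read_fasta, host_of and host_counts are illustrative, not part of any package.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {ID: sequence}; sequences may span several lines."""
    seqs: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # ID = header text up to the first whitespace
            name = (line[1:].split() or [""])[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line.upper()
    return seqs


def host_of(seq_id: str) -> str:
    """Assumed naming rule: host code = leading letters of the ID."""
    m = re.match(r"[A-Za-z]+", seq_id)
    return m.group(0) if m else seq_id


def host_counts(seq_ids: Iterable[str]) -> Counter:
    """Number of sequences per host code."""
    return Counter(host_of(s) for s in seq_ids)
```

Applied to the ten headers above, host_counts gives 6 VASmic, 2 TILusn and 2 HEVbra sequences, matching the description of the example file.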
There are surely several reasonable output formats; one option would be a table such as the following (e.g. as a data frame), which would allow subsequent diversity analyses with vegan, e.g. specaccum, metaMDS, etc.
        VASmic  TILusn  HEVbra
OTU.01       4       1       1
OTU.02       2       1       0
OTU.03       0       0       1
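Once each sequence has an OTU label (however it was obtained), a table like the one above is a cross-tabulation of OTU against host. Below is a minimal Python sketch under these assumptions: the input is a dict of sequence ID to OTU label, the host code is the run of letters at the start of the ID, and hosts keep their order of first appearance. The function names are hypothetical. One point worth noting: vegan's community functions such as specaccum and metaMDS expect sites in rows and species in columns, so the CSV helper writes hosts as rows, i.e. the transpose of the layout shown above.

```python
import csv
import io
import re
from collections import Counter
from typing import Dict, List, Tuple


def otu_by_host(assignment: Dict[str, str]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Count sequences per (OTU, host); returns OTU labels, host codes and an OTU x host matrix."""
    counts: Counter = Counter()
    hosts: List[str] = []
    for seq_id, otu in assignment.items():
        m = re.match(r"[A-Za-z]+", seq_id)  # assumed rule: host = leading letters of the ID
        host = m.group(0) if m else seq_id
        if host not in hosts:
            hosts.append(host)  # keep hosts in order of first appearance
        counts[(otu, host)] += 1
    otus = sorted(set(assignment.values()))
    rows = [[counts[(otu, host)] for host in hosts] for otu in otus]
    return otus, hosts, rows


def to_site_by_species_csv(otus: List[str], hosts: List[str], rows: List[List[int]]) -> str:
    """Write the transposed table (hosts as rows, OTUs as columns) as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([""] + otus)
    for j, host in enumerate(hosts):
        writer.writerow([host] + [rows[i][j] for i in range(len(otus))])
    return buf.getvalue()
```

A CSV file written this way should be readable in R with read.csv(..., row.names = 1), giving a data frame with one row per host that can then be passed to the vegan functions mentioned above.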
Hoping that someone can help me with this.
Best
Martin
__________
PD Dr. Martin Unterseher
Universität Greifswald
Institut für Botanik und Landschaftsökologie
Grimmer Str. 88
17487 Greifswald
Tel. 03834 / 864184
Fax. 03834 / 864114
http://www.botanik.uni-greifswald.de/100.html