[BioC] writing a fasta file in blocks
Martin Morgan
mtmorgan at fhcrc.org
Wed Jun 9 03:50:58 CEST 2010
On 06/09/2010 02:31 AM, Kasper Daniel Hansen wrote:
> Doing what Fahim suggests internally in writeFASTA has been on my todo
> list for a while, and it will significantly speed up the writing of
> fasta files with many small records. Guess I should do it now, and
> cross it off my list.
>
> But Fahim: I am not sure it is possible to do what you want to do with
> the current function (at least if you are using Biostrings), but I
> could be wrong. If you want to investigate further, note that the
> file can be a connection (?connection).
>
> Kasper
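
To illustrate the connection idea in general terms, here is a minimal sketch that uses only base R (file(), writeLines(), close()) and does not call writeFASTA. The record values, the variable names, and the temporary output file are made up for illustration. Opening the connection once means each loop iteration writes to the already-open file instead of reopening it.

```r
# Illustrative records (assumed data, not read from a real file)
probe_names <- c("1367452_at.1", "1367452_at.2")
probe_seqs  <- c("GCTACTTTACTCCAGAATTTTGTTA", "TTAGAAAGCCGCAATTTGGTCCCGC")

out_con <- tempfile(fileext = ".fasta")
con <- file(out_con, open = "w")      # open the file once
for (i in seq_along(probe_names)) {
    # each call writes a header line and a sequence line
    # to the already-open connection
    writeLines(c(paste0(">", probe_names[i]), probe_seqs[i]), con)
}
close(con)                            # flush and release the file
```

The loop is kept here only to mirror the original workflow; the point is that the file is opened and closed exactly once.
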
>
> On Tue, Jun 8, 2010 at 4:26 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> On Mon, Jun 7, 2010 at 11:12 PM, Fahim Md <fahim.md at gmail.com> wrote:
>>
>>> I have a data file, the format of which is given below. It has two fields,
>>> namely sequence and probeset name.
>>> sequence Probe Set Name
>>> GCTACTTTACTCCAGAATTTTGTTA 1367452_at.1
>>> TTAGAAAGCCGCAATTTGGTCCCGC 1367452_at.2
>>> GCCACATCCTGACTACTGCAGTATA 1367452_at.3
>>> ............
>>> AAAAAAAAGGGGGGGTCCCCCCCC 1234567_at.1
>>>
>>>
>>> Now, I want to convert that into FASTA format as follows
>>>
>>>> 1367452_at.1
>>> GCTACTTTACTCCAGAATTTTGTTA
>>>> 1367452_at.2
>>> TTAGAAAGCCGCAATTTGGTCCCGC
>>>> 1367452_at.3
>>> GCCACATCCTGACTACTGCAGTATA
>>> .......
>>> ....
>>>> 1234567_at.1
>>> AAAAAAAAAAAAACCCCCCCCCCCC
>>>
>>>
>>> I am getting the required output by using the writeFASTA(..) function, but
>>> it is too slow because I call it in a loop, and every iteration accesses
>>> the file to write to it.
>>>
>>> Is there any way I can write this FASTA information into some
>>> variable and, once I am done, write that variable to the required
>>> file?

For short sequences where line wrapping is not important, you might
input the data with

  df = read.table(...)

or the like, create a template for the output with two lines per
record (a header line and a sequence line)

  fasta = character(2 * nrow(df))

then fill it in (no loop required)

  fasta[c(TRUE, FALSE)] = paste(">", df[["Probe.Set.Name"]], sep="")
  fasta[c(FALSE, TRUE)] = as.character(df[["sequence"]])

and save it

  write(fasta, "/some/file.fasta")
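
As a self-contained check of the steps above, here is a minimal sketch that builds a small data frame by hand (standing in for the read.table(...) result), interleaves header and sequence lines, and writes everything in one call. The column names "name" and "sequence", the helper to_fasta, and the temporary output file are assumptions for illustration, and the sketch assumes each sequence fits on a single line.

```r
# Hypothetical stand-in for the table read with read.table(...)
df <- data.frame(
    sequence = c("GCTACTTTACTCCAGAATTTTGTTA",
                 "TTAGAAAGCCGCAATTTGGTCCCGC",
                 "GCCACATCCTGACTACTGCAGTATA"),
    name = c("1367452_at.1", "1367452_at.2", "1367452_at.3"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: interleave ">name" header lines with sequence lines
to_fasta <- function(names, seqs) {
    stopifnot(length(names) == length(seqs))
    fasta <- character(2 * length(names))         # two lines per record
    fasta[c(TRUE, FALSE)] <- paste0(">", names)   # odd positions: headers
    fasta[c(FALSE, TRUE)] <- as.character(seqs)   # even positions: sequences
    fasta
}

fasta <- to_fasta(df[["name"]], df[["sequence"]])

out <- tempfile(fileext = ".fasta")
writeLines(fasta, out)   # a single write, no per-record file access
```

The as.character() call guards against the sequence column having been read as a factor, in which case the assignment would otherwise store integer codes rather than the sequences.
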
Martin
>>>
>>>
>> Hi, Fahim.
>>
>> Probably most appropriate for here and not bioc-devel. Perhaps a
>> reproducible code example and some sessionInfo() would be helpful.
>>
>> Sean
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793