[BioC] How to prepare the input data to writeFASTA? Examples of CharacterToFASTArecords ...
Steve Lianoglou
mailinglist.honeypot at gmail.com
Fri Jul 17 04:32:04 CEST 2009
Hi,
On Jul 16, 2009, at 9:16 PM, <mauede at alice.it> <mauede at alice.it> wrote:
> I realize function writeFASTA expects a list with two items,
> respectively, description and sequence.
> However, just passing a list won't work (please, see code at the
> bottom of this message)
Sorry ... perhaps we're not understanding your problem. Doesn't the
reply I sent earlier today work? If I'm not mistaken, didn't you say
that you have two variables holding the description and sequence info
for your data, like so?
library(Biostrings)
desc <- paste("gene", 1:10, " some other stuff", sep="")
seqs <- replicate(10,paste(sample(c('A','C','G', 'T'), 50,
replace=TRUE), collapse=""))
Because this works for me:
fasta.list <- lapply(1:length(desc), function(i) list(desc=desc[i],
seq=seqs[i]))
writeFASTA(fasta.list, 'test.fa')
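In case it helps to see the record layout with Biostrings out of the picture, here is a minimal base-R sketch. It assumes each record is a list with desc and seq elements, as in the lapply call above. The function write_fasta_sketch and its width argument are hypothetical names of mine, not part of Biostrings, and the three short sequences are made-up toy data.

```r
# Hypothetical stand-in for writeFASTA, base R only (an illustration, not
# the Biostrings implementation). Each record is list(desc = ..., seq = ...);
# the ">" header marker is added here, not by the caller.
write_fasta_sketch <- function(records, path, width = 80) {
  lines <- unlist(lapply(records, function(rec) {
    n <- nchar(rec$seq)
    # Start positions of each output line; max(n, 1) keeps seq() valid
    # for an empty sequence.
    starts <- seq(1, max(n, 1), by = width)
    c(paste0(">", rec$desc),
      substring(rec$seq, starts, pmin(starts + width - 1, n)))
  }), use.names = FALSE)
  writeLines(lines, path)
}

# Toy data: two parallel vectors, one description per sequence.
desc <- paste0("gene", 1:3)
seqs <- c("ACGT", "GGCC", "TTAA")

# Pair them up element by element into a list of records.
records <- Map(function(d, s) list(desc = d, seq = s), desc, seqs)

out <- tempfile(fileext = ".fa")
write_fasta_sketch(records, out)
```

The point is only that the description and sequence vectors are paired element by element, and that the writer, not the caller, supplies the ">".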
> I saw there is the helper function CharacterToFASTArecords(x) that
> presumably generates the right input data format.
> It would be very useful to get some example of
> CharacterToFASTArecords(x) usage.
> The on-line documentation reads:
> "For CharacterToFASTArecords, the (possibly named) character vector
> to be converted to a list of FASTA records as one returned by
> readFASTA"
> Since I have description and sequence in separate variables ... I do
> not know how to use it.
That function expects the description to be in the "names" attribute
of your character vector. For example, taking the same variables from
above:
names(seqs) <- paste("gene", 1:10, sep='')
fasta.list <- CharacterToFASTArecords(seqs)
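To make the "names attribute" idea concrete, here is a rough base-R sketch of that kind of conversion. I'm assuming the resulting records have the same list(desc=, seq=) layout used above. The helper named_to_records is a hypothetical illustration, not the Biostrings function, and the sequences are toy data.

```r
# Hypothetical helper: turn a (possibly named) character vector into a
# list of records, taking each description from the names attribute.
named_to_records <- function(x) {
  stopifnot(is.character(x))
  descs <- names(x)
  # Unnamed input: fall back to empty descriptions (an assumption here).
  if (is.null(descs)) descs <- rep("", length(x))
  unname(Map(function(d, s) list(desc = d, seq = s), descs, unname(x)))
}

seqs <- c(gene1 = "ACGT", gene2 = "GGCC")
recs <- named_to_records(seqs)
str(recs[[1]])
```

So if your descriptions live in a separate variable, assigning them with names(seqs) <- ... first is all the preparation that is needed.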
> zz <- file (filname,"w")
> write(miRNA.rec, zz, append = FALSE)
> write(miRNA.seq,zz, append = TRUE)
I don't get why you're writing to the file manually here, if this is
supposed to be your FASTA file, and then calling writeFASTA on it ...
> #
> geneDesc <- paste (">",gene.id, "|",
> gene.map[i,"ensembl_transcript_id"], sep="")
> geneSeq <- gene.seq[i,"3utr"]
> gene.string <- list(desc=geneDesc, seq=geneSeq)
> writeFASTA (gene.string, zz)
For starters, you shouldn't be pasting the ">" in the description
attribute, as writeFASTA will take care of it.
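If your descriptions already carry the ">", one way to drop it before building the records is a small sub() call. This is just a sketch: the IDs below are made-up placeholders, not real Ensembl identifiers.

```r
# Placeholder descriptions; the first one has a pasted-in ">" prefix.
geneDesc <- c(">ENSG0001|ENST0001", "ENSG0002|ENST0002")

# Strip a single leading ">" if present; other strings are left untouched.
cleanDesc <- sub("^>", "", geneDesc)
```

Simpler still is to leave the ">" out of the paste() call in the first place.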
Assuming your seqs and desc variables are as I wrote above, just use the
example as I gave it ... it'll work.
-steve
btw - I'm not sure cross posting to r-help is necessary, as this is
BioC specific, so I removed it from the reply.
--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos