[BioC] finding end of file in FASTA file
Martin Morgan
mtmorgan at fhcrc.org
Thu Sep 13 14:58:35 CEST 2012
On 09/13/2012 01:42 AM, Jack [guest] wrote:
>
> library(ShortRead)
> fastadata <- readFasta("fastafolder", "fa$")
> file <- tempfile()
> writeFasta(fastadata, file)
> var1 <- readLines(file)
> while(countlength(tmp <- readLines(file, n = -1)) > 0) {
> #do something
> }
>
> I want the while loop to run until the end of the file is reached, but the while statement doesn't work. Thanks for the help.
Hi Jack -- if the goal is to read the fasta file in chunks, use a
'connection' that can remember the current location. After running the
following to get a reproducible example fasta file
    library(ShortRead)
    example(readFasta)
    fl = dir(analysisPath(sp), "s_1_sequence.txt", full=TRUE)
we can create a connection and open it, and then do our loop reading 500
lines at a time
    con <- file(fl); open(con)
    while(length(res <- readLines(con, n=500)))
        cat(length(res), "\n")
    close(con)
which prints out
500
500
24
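For a version that does not depend on the ShortRead example data, here is a minimal base-R sketch of the same idea. The toy file and the names fa_path, n_chunks and n_records are made up for illustration. It also shows why a loop over a file name never reaches the end: readLines() on a path reopens the file each time and starts again from the first line, whereas an open connection keeps its position between calls.

```r
# Write a tiny FASTA file to a temporary location (toy data, not from the post).
fa_path <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "ACGT", ">seq2", "GGCC", ">seq3", "TTAA"), fa_path)

# Passing a file name: every call reopens the file and starts at line 1,
# so both calls return the same two lines.
first_a <- readLines(fa_path, n = 2)
first_b <- readLines(fa_path, n = 2)

# Passing an open connection: each call continues where the last one stopped,
# and readLines() returns a zero-length vector at end of file.
con <- file(fa_path, open = "r")
n_chunks <- 0L
n_records <- 0L
while (length(chunk <- readLines(con, n = 4L)) > 0L) {
    n_chunks <- n_chunks + 1L
    # FASTA header lines start with ">", so counting them counts records.
    n_records <- n_records + sum(grepl("^>", chunk))
}
close(con)

cat("chunks:", n_chunks, "records:", n_records, "\n")
```

Note that a chunk boundary can fall in the middle of a record, so anything beyond counting headers would need to carry the incomplete record over to the next iteration.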
Unfortunately, readFasta doesn't work on connections (that would be a
worthwhile feature request). There is also FaFile in Rsamtools; try
example(FaFile)
FaFile is most useful when the fasta file would benefit from being
indexed, e.g., hundreds of contigs, but might also be useful for your
purposes.
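As a rough sketch of what the FaFile route looks like, closely following example(FaFile) and using the small fasta file that ships with Rsamtools. It assumes an index (.fai) is available next to the fasta file; for your own file, indexFa() should be able to create one. The variable names are just for illustration.

```r
library(Rsamtools)

# Example fasta file shipped with Rsamtools (the one used by example(FaFile)).
fl <- system.file("extdata", "ce2dict1.fa", package = "Rsamtools",
                  mustWork = TRUE)

# Assumption: an index for 'fl' already exists; otherwise run indexFa(fl) first.
fa <- open(FaFile(fl))
n_seq <- countFa(fa)                  # number of records in the file
idx <- scanFaIndex(fa)                # GRanges, one range per record
dna <- scanFa(fa, param = idx[1:2])   # read only the first two records
close(fa)

dna
```

Because scanFa() takes the ranges to read, you can walk through 'idx' a few records at a time instead of reading a fixed number of lines.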
Martin
> Regards
> Jack
>
>
> -- output of sessionInfo():
>
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] ShortRead_1.14.4 latticeExtra_0.6-24 RColorBrewer_1.0-5 Rsamtools_1.8.6 lattice_0.20-10 Biostrings_2.24.1 GenomicRanges_1.8.13
> [8] IRanges_1.14.4 BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.16.0 bitops_1.0-4.1 grid_2.15.1 hwriter_1.3 stats4_2.15.1 tools_2.15.1 zlibbioc_1.2.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793