[Bioc-sig-seq] skip the comment lines with read.BStringSet

Hervé Pagès hpages at fhcrc.org
Tue Feb 9 21:43:50 CET 2010


Hi Burak,

Your file is not valid FASTA. Comments in a FASTA file are lines that start
with a semicolon, not with #:
   http://en.wikipedia.org/wiki/FASTA_format
You need to fix your file first by replacing the leading # on each comment
line with ; and then you should be able to read it with read.BStringSet().
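
As a rough sketch of that fix in base R (the helper name fix_fasta_comments
and the temporary file names are made up for illustration and are not part of
Biostrings), something along these lines should do the replacement. It only
rewrites a # found in the first column, so header and sequence lines are left
alone:

```r
# Hypothetical helper (not part of Biostrings): rewrite a leading '#'
# comment marker as ';' so the file follows the FASTA comment convention.
fix_fasta_comments <- function(infile, outfile) {
  lines <- readLines(infile)
  # Only a '#' in the first column is replaced; other lines pass through as-is.
  fixed <- sub("^#", ";", lines)
  writeLines(fixed, outfile)
  invisible(outfile)
}

# Small demonstration on a temporary file mimicking the posted example.
infile <- tempfile(fileext = ".fasta")
writeLines(c("# Cwd: /home/pipeline",
             "# Title: solid0131_20090316_1_H683_",
             ">2_21_1117_F3",
             "T00212313200111233020001110130100100"), infile)
outfile <- tempfile(fileext = ".fasta")
fix_fasta_comments(infile, outfile)
fixed_lines <- readLines(outfile)

# With Biostrings loaded, the fixed file would then be read as in the
# original call (not run here):
# library(Biostrings)
# seqs <- read.BStringSet(outfile, format = "fasta")
```

Writing to a separate output file keeps the original pipeline output
untouched in case the # lines are needed elsewhere.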

Cheers,
H.


burak kutlu wrote:
> Hi
> I am trying to read FASTA files into my session.
> Is there a way to ignore the comment lines (those that start with #) when using read.BStringSet?
> Here's the error I get when there are such lines at the beginning of the FASTA file.
> 
>> read.BStringSet(file= 'burak.test', format = "fasta")
> Error in .read.fasta.in.XStringSet(filepath, set.names, elementType, lkup) : 
>   reading FASTA file     : ">" expected at beginning of line 1
> 
> And this is what the file looks like:
> # Cwd: /home/pipeline
> # Title: solid0131_20090316_1_H683_
> >2_21_1117_F3
> T00212313200111233020001110130100100
> >2_21_1543_F3
> T01022123123230121000121013333113111
> .....
> 
> Many thanks
> -burak
> 
> 
> 
>> sessionInfo()
> R version 2.10.1 (2009-12-14) 
> x86_64-unknown-linux-gnu 
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8   
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
> 
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base     
> 
> other attached packages:
> [1] ShortRead_1.4.0   lattice_0.17-26   BSgenome_1.14.2   Biostrings_2.14.8
> [5] IRanges_1.4.9    
> 
> loaded via a namespace (and not attached):
> [1] Biobase_2.6.1 grid_2.10.1   hwriter_1.1   tools_2.10.1
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


