[Bioc-sig-seq] ShortRead internal: too many 'snap' entries

Mon Apr 5 00:46:45 CEST 2010

On 04/04/2010 02:07 PM, Yanwei Tan wrote:
> Dear Martin,
> 
> I use nFilter to filter out the sequences that contain any "N".
> Here is my code:
> 
>> # read the fastq file
>> fq<-readFastq("/Users/wei/Desktop/Originaldata",pattern="Bic.txt")
>> # filter for N containing reads
>> filt<-nFilter()
>> fq<-fq[filt(fq)]
>> # write the output
>> writeFastq(fq,file="/Users/wei/Desktop/Originaldata/bicfiltered.txt")
> 
> 
> After I wrote the filtered fastq file, reading it back in fails:
> 
>> readFastq("/Users/wei/Desktop/Originaldata", "bicfiltered.txt")
> Error in  .local(dirPath, pattern,...) :
>     ShortRead internal: too many 'snap' entries

Execute these commands

  library(ShortRead)
  example(readFastq)

Then please copy and paste the results of the following commands

  f = tempfile()
  writeFastq(rfq, f)
  readFastq(f)

If your results look like mine:

> f = tempfile()
> writeFastq(rfq, f)
> readFastq(f)
class: ShortReadQ
length: 256 reads; width: 36 cycles

then please report the output of

  list.files("/Users/wei/Desktop/Originaldata", "bicfiltered.txt")
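
The reason for checking list.files() can be sketched as follows, assuming
that readFastq() selects its input files from dirPath and pattern in the same
way list.files() does (an assumption, not something confirmed in this
thread). The directory and file names below are hypothetical and exist only
for illustration. The pattern is a regular expression, so it can match more
than one file.

```r
# Hypothetical directory and file names, created only for this illustration
d <- tempfile("fastqdir")
dir.create(d)
invisible(file.create(file.path(d, c("bicfiltered.txt",
                                     "bicfiltered.txt.bak",
                                     "Bic.txt"))))

# 'pattern' is a regular expression: it matches anywhere in the file name,
# and "." matches any single character
matches <- list.files(d, "bicfiltered.txt")
print(matches)

# Anchoring with ^ and $ and escaping the dot restricts the match
# to exactly one file name
exact <- list.files(d, "^bicfiltered\\.txt$")
print(exact)
```

If more than one name comes back, a reader that accepts a directory and a
pattern may be trying to read a file you did not intend.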

In your commands, after fq[filt(fq)], please report the output of

  fq

Please confirm that you do not manipulate the file produced by
writeFastq() before trying to readFastq().
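
A quick way to see whether a file was altered is to inspect its layout
directly. The sketch below uses only base R. The helper name
check_fastq_layout and the two example records are hypothetical, and it
assumes the common layout of four lines per record. It reports the number of
lines, whether that number is a multiple of 4, and whether the file ends
with a newline (the condition mentioned earlier in this thread). It does not
establish that any of these is the cause of the error.

```r
# Hypothetical helper: report basic layout facts about a FASTQ file.
# Assumes 4 lines per record (header, sequence, "+", quality).
check_fastq_layout <- function(path) {
  lines <- readLines(path, warn = FALSE)
  size <- file.info(path)$size
  last_byte <- raw(0)
  if (size > 0) {
    con <- file(path, "rb")
    on.exit(close(con))
    # Jump to the final byte of the file and read it
    seek(con, where = size - 1, origin = "start")
    last_byte <- readBin(con, "raw", n = 1L)
  }
  list(
    n_lines = length(lines),
    complete_records = length(lines) %% 4L == 0L,
    ends_with_newline = length(last_byte) == 1L && last_byte == as.raw(10L)
  )
}

# Two made-up records; writeLines() terminates every line, including the last
f <- tempfile()
writeLines(c("@read1", "ACGT", "+", "IIII",
             "@read2", "ACGA", "+", "IIII"), f)
res <- check_fastq_layout(f)
print(res)

# The same kind of content without a final newline, for comparison
g <- tempfile()
cat("@read1", "ACGT", "+", "IIII", sep = "\n", file = g)
res_no_newline <- check_fastq_layout(g)
print(res_no_newline)
```

Running check_fastq_layout() on the file written by writeFastq(), before and
after any other program touches it, shows whether the layout changed.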

Martin

> 
> My sessionInfo():
> R version 2.10.1 (2009-12-14)
> x86_64-apple-darwin9.8.0
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> other attached packages:
> [1] ShortRead_1.4.0    lattice_0.17-26    BSgenome_1.14.2   
> Biostrings_2.14.12 IRanges_1.4.11
> loaded via a namespace (and not attached):
> [1] Biobase_2.6.1 grid_2.10.1   hwriter_1.1   tools_2.10.1
> 
> Many thanks!
> Wei
> 
> 
> On 4/4/10 10:31 PM, Martin Morgan wrote:
>> On 04/04/2010 11:55 AM, Yanwei Tan wrote:
>>   
>>> Hi Ramzi Temanni,
>>>
>>> I ran into the same problem as you when running ShortRead. As Martin
>>> mentioned, a newline is missing after the last record in the file. How
>>> did you fix this problem? I do not know how to add a newline after the
>>> last line. My data is a fastq file; I just filtered out the reads that
>>> contain N using the nFilter function in the ShortRead package.
>>>      
>> In off-list email you said
>>
>>   
>>> I used the ShortRead package to filter the data and then saved it as a
>>> fastq file. But when I run the qa function again there is an error in
>>> .local(dirPath, pattern, ...): ShortRead internal: too many 'snap'
>>> entries.
>>>      
>> It is hard to follow what you are trying to accomplish. Please paste
>> short code to illustrate. Use data files from ShortRead, so that your
>> code is reproducible by others. Include the output of sessionInfo() so
>> that it is clear which version of software you are using. Perhaps after
>>
>>    example(readFastq)
>>
>> you do
>>
>>   
>>> rfq
>>>      
>> class: ShortReadQ
>> length: 256 reads; width: 36 cycles
>>   
>>> file = tempfile() # a file to save output
>>> noNrfq = rfq[nFilter()(rfq)]
>>> writeFastq(noNrfq, file)
>>> qaresult = qa(dirname(file), basename(file), type="fastq")
>>>      
>> ? But what is the problem? Note also that it is not necessary to write
>> the fastq file to disk,
>>
>>   
>>> qa(list(noNrfq=noNrfq))
>>>      
>> class: ShortReadQQA(9)
>> QA elements (access with qa[["elt"]]):
>>    readCounts: data.frame(1 3)
>>    baseCalls: data.frame(1 5)
>>    readQualityScore: data.frame(512 4)
>>    baseQuality: data.frame(94 3)
>>    alignQuality: data.frame(1 3)
>>    frequentSequences: data.frame(50 4)
>>    sequenceDistribution: data.frame(3 4)
>>    perCycle: list(2)
>>      baseCall: data.frame(141 4)
>>      quality: data.frame(341 5)
>>    perTile: list(2)
>>      readCounts: data.frame(0 4)
>>      medianReadQualityScore: data.frame(0 4)
>>
>> This is my sessionInfo()
>>
>>   
>>> sessionInfo()
>>>      
>> R version 2.10.1 Patched (2010-03-27 r51570)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] ShortRead_1.4.0    lattice_0.18-3     BSgenome_1.14.2
>> Biostrings_2.14.12
>> [5] IRanges_1.4.16
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.6.1 grid_2.10.1   hwriter_1.2   tools_2.10.1
>>
>>   
>>> Many thanks in advance!
>>> Wei
>>>
>>>      
>>    
> 
> 

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793