[BioC] readPileup() Error

Martin Morgan mtmorgan at fhcrc.org
Fri Nov 11 16:27:55 CET 2011


On 11/10/2011 02:41 AM, Vincent Carey wrote:
> I think I can confirm this using the ex1.pileup in the examples folder of
> the samtools svn distro, revision 998,
> which says it is 0.1.18 -- BUT READ ON
>
>> Z = readPileup("ex1.pileup", variant="SNP")
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
> :
>    scan() expected 'an integer', got '^~.'
>
> Enter a frame number, or 0 to exit
>
>   1: readPileup("ex1.pileup", variant = "SNP")
>   2: readPileup("ex1.pileup", variant = "SNP")
>   3: callGeneric(conn, ...)
>   4: eval(call, sys.frame(sys.parent()))
>   5: eval(expr, envir, enclos)
>   6: readPileup(conn, variant = "SNP")
>   7: readPileup(conn, variant = "SNP")
>   8: .local(file, ...)
>   9: .readPileup_SNP(file = file, ..., variant = variant)
> 10: .readPileup_table(file, colClasses, ...)
> 11: read.table(conn, colClasses = colClasses, col.names =
> names(colClasses), se
> 12: scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
> nmax =
>
>
> I believe the readPileup function was a very early contribution as
> Rsamtools emerged.  Note that its man page references pileup -cv as the
> generator for input, but that samtools command is removed in favor of
> mpileup.  So at best readPileup is retained for legacy applications.
>
> Now, for pileup analysis with Rsamtools, you should use the PileupFile()
> interface direct to BAM and indices, and applyPileups() to extract the base
> frequency or other functionals of pileups.  Do not create the pileups
> separately with Rsamtools, applyPileups() will compute them internally.
>
> On Thu, Nov 10, 2011 at 3:10 AM, Ron Ophir<ron at volcani.agri.gov.il>  wrote:
>
>> Dear BMMs,
>>
>> The session information is listed below.
>> When I read a pileup file (readPileup) I got as an output of samtools
>> 0.1.18, I get the following error
>>
>> Snp = readPileup(pileupFile,variant="SNP")
>>
>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
>> na.strings,  :
>>   scan() expected 'an integer', got '^<,'
>>
>> ------------------------------------------------------------------------
>> ------
>>
>> The pileup format in the example of readPileup() function is
>>
>> 6       33701   A       G       21      21      12      35
>> ,,,,$,,,,.,,,,.,..,..g.,,gGggggGgggg
>> ??@AA3>=3((>A?<-9<-8?9A>87;%6148+=2
>>
>> The pileup format at samtools web pages is
>>
>> seq1 272 T 24  ,.$.....,,.,.,...,,,.,..^+.<<<+;<<<<<<<<<<<=<;<;7<&
>>
>> and in my pileup is
>>
>> Cs1,6RhaT       332     A       9       ,,,..^;,^E,^Z.^Z.
>> DCCFFCDCC
>>
>> It seems to me that the error is obvious, however I don't know whether
>> it is a bug or a feature. If it is a feature how to run readPileup
>> correctly to be able reading latest pileup format?

Hi Ron, Vince --

Not exactly sure where this stands. From Ron's post and as Vince 
indicates I think the problem is that Ron's pileup file does not conform 
to what readPileups requires (produced with samtools pileup -cv ...), 
and furthermore that this file format is not produced by current samtools.

Vince mentions applyPileup; another route from mpileup is the -g option 
to generate BCF files, and then indexBcf / scanBcf. For instance in the 
samtools/examples folder

   > library(Rsamtools)
   > indexBcf("ex1.bcf")
   > res = scanBcf("ex1.bcf")

The output format is 'scan'-like so not 100% user friendly; see 
example(scanBcf).

Vince mentions ex1.pileup from samtools as a reproducible example. This 
is generated by running make in samtools/examples, and with that svn 
revision produces a zero-length file (since pileup is no longer 
supported). Instructions to create the file are still in 
samtools/examples/Makefile, but use the options -cf so would not be 
expected to produce a readPileup-compatible file.

There is example(readPileup) which points to 'pileup.txt' and which 
parses without error.

   library(Rsamtools)
   example(readPileup)
   readPileup(fl, variant="SNP")

Martin

>>
>> Thanks,
>> Ron
>>
>> Abbreviation
>>
>> BMM bioconductor mailing-list member
>>
>>
>> sessionInfo()
>> R version 2.14.0 (2011-10-31)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>>   [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
>>   [7] LC_PAPER=C                LC_NAME=C
>>   [9] LC_ADDRESS=C              LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>>
>> other attached packages:
>> [1] Rsamtools_1.6.1     Biostrings_2.22.0   GenomicRanges_1.6.2
>> [4] IRanges_1.12.1
>>
>> loaded via a namespace (and not attached):
>> [1] bitops_1.0-4.1     BSgenome_1.22.0    RCurl_1.7-0
>> rtracklayer_1.14.1
>> [5] tools_2.14.0       XML_3.4-3          zlibbioc_1.0.0
>>
>>
>> This mail was sent via Mail-SeCure System.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the Bioconductor mailing list