[BioC] readPileup() Error

Ron Ophir ron at volcani.agri.gov.il
Sat Nov 12 21:55:43 CET 2011


Thanks Martin and Vincent for your responses,

To summarize readPileup() function related to an old format of samtools
which is obsolete.
There are two alternative pathways:
1. is self parsing Bam file using
applyPileups(bamFile,FUN=parsingFun(),param) function. This way is good
for more than one purposes of bam alignment. For example having the
coverage of each site on the reference.
2. using the scanBcf() function that let me fetching variation (snp and
indels), which are imputed by bcftools.

This gave an ideas what would be my next steps.

Ron
-----Original Message-----
From: Martin Morgan [mailto:mtmorgan at fhcrc.org] 
Sent: Friday, November 11, 2011 5:28 PM
To: Vincent Carey
Cc: Ron Ophir; bioconductor at r-project.org
Subject: Re: [BioC] readPileup() Error

On 11/10/2011 02:41 AM, Vincent Carey wrote:
> I think I can confirm this using the ex1.pileup in the examples folder
of
> the samtools svn distro, revision 998,
> which says it is 0.1.18 -- BUT READ ON
>
>> Z = readPileup("ex1.pileup", variant="SNP")
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
na.strings,
> :
>    scan() expected 'an integer', got '^~.'
>
> Enter a frame number, or 0 to exit
>
>   1: readPileup("ex1.pileup", variant = "SNP")
>   2: readPileup("ex1.pileup", variant = "SNP")
>   3: callGeneric(conn, ...)
>   4: eval(call, sys.frame(sys.parent()))
>   5: eval(expr, envir, enclos)
>   6: readPileup(conn, variant = "SNP")
>   7: readPileup(conn, variant = "SNP")
>   8: .local(file, ...)
>   9: .readPileup_SNP(file = file, ..., variant = variant)
> 10: .readPileup_table(file, colClasses, ...)
> 11: read.table(conn, colClasses = colClasses, col.names =
> names(colClasses), se
> 12: scan(file = file, what = what, sep = sep, quote = quote, dec =
dec,
> nmax =
>
>
> I believe the readPileup function was a very early contribution as
> Rsamtools emerged.  Note that its man page references pileup -cv as
the
> generator for input, but that samtools command is removed in favor of
> mpileup.  So at best readPileup is retained for legacy applications.
>
> Now, for pileup analysis with Rsamtools, you should use the
PileupFile()
> interface direct to BAM and indices, and applyPileups() to extract the
base
> frequency or other functionals of pileups.  Do not create the pileups
> separately with Rsamtools, applyPileups() will compute them
internally.
>
> On Thu, Nov 10, 2011 at 3:10 AM, Ron Ophir<ron at volcani.agri.gov.il>
wrote:
>
>> Dear BMMs,
>>
>> The session information is listed below.
>> When I read a pileup file (readPileup) I got as an output of samtools
>> 0.1.18, I get the following error
>>
>> Snp = readPileup(pileupFile,variant="SNP")
>>
>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
>> na.strings,  :
>>   scan() expected 'an integer', got '^<,'
>>
>>
------------------------------------------------------------------------
>> ------
>>
>> The pileup format in the example of readPileup() function is
>>
>> 6       33701   A       G       21      21      12      35
>> ,,,,$,,,,.,,,,.,..,..g.,,gGggggGgggg
>> ??@AA3>=3((>A?<-9<-8?9A>87;%6148+=2
>>
>> The pileup format at samtools web pages is
>>
>> seq1 272 T 24  ,.$.....,,.,.,...,,,.,..^+.<<<+;<<<<<<<<<<<=<;<;7<&
>>
>> and in my pileup is
>>
>> Cs1,6RhaT       332     A       9       ,,,..^;,^E,^Z.^Z.
>> DCCFFCDCC
>>
>> It seems to me that the error is obvious, however I don't know
whether
>> it is a bug or a feature. If it is a feature how to run readPileup
>> correctly to be able reading latest pileup format?

Hi Ron, Vince --

Not exactly sure where this stands. From Ron's post and as Vince 
indicates I think the problem is that Ron's pileup file does not conform

to what readPileups requires (produced with samtools pileup -cv ...), 
and furthermore that this file format is not produced by current
samtools.

Vince mentions applyPileup; another route from mpileup is the -g option 
to generate BCF files, and then indexBcf / scanBcf. For instance in the 
samtools/examples folder

   > library(Rsamtools)
   > indexBcf("ex1.bcf")
   > res = scanBcf("ex1.bcf")

The output format is 'scan'-like so not 100% user friendly; see 
example(scanBcf).

Vince mentions ex1.pileup from samtools as a reproducible example. This 
is generated by running make in samtools/examples, and with that svn 
revision produces a zero-length file (since pileup is no longer 
supported). Instructions to create the file are still in 
samtools/examples/Makefile, but use the options -cf so would not be 
expected to produce a readPileup-compatible file.

There is example(readPileup) which points to 'pileup.txt' and which 
parses without error.

   library(Rsamtools)
   example(readPileup)
   readPileup(fl, variant="SNP")

Martin

>>
>> Thanks,
>> Ron
>>
>> Abbreviation
>>
>> BMM bioconductor mailing-list member
>>
>>
>> sessionInfo()
>> R version 2.14.0 (2011-10-31)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>>   [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
>>   [7] LC_PAPER=C                LC_NAME=C
>>   [9] LC_ADDRESS=C              LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>>
>> other attached packages:
>> [1] Rsamtools_1.6.1     Biostrings_2.22.0   GenomicRanges_1.6.2
>> [4] IRanges_1.12.1
>>
>> loaded via a namespace (and not attached):
>> [1] bitops_1.0-4.1     BSgenome_1.22.0    RCurl_1.7-0
>> rtracklayer_1.14.1
>> [5] tools_2.14.0       XML_3.4-3          zlibbioc_1.0.0
>>
>>
>> This mail was sent via Mail-SeCure System.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

This mail was received via Mail-SeCure System.



-----
No virus found in this message.
Checked by AVG - www.avg.com

11/11/11

This mail was sent via Mail-SeCure System.



More information about the Bioconductor mailing list