[Bioc-sig-seq] readAligned for Illumina's export file

Fri Nov 5 18:06:08 CET 2010

On 11/05/2010 09:54 AM, Kunbin Qu wrote:
> Martin, thanks for the help. You are correct: readAligned can read
> those reads when the filter is not applied. The chromosome filter in
> my command eliminated the reads mapped across the junctions, including
> those on the desired chromosome (in my original, bigger file), since
> their "chromosome" field is always "splice_sites-auto.fa". Does
> ShortRead have a parser to extract the splice junction coordinates
> from the 2nd entry in my previous email, or do I need to write one
> myself? Plain readAligned (i.e., without "filter") does not seem to
> interpret the coordinates correctly. Thanks again.

Hi Kunbin -- if you mean the information embedded in the chromosome()
entry for the splice junction read, you'll have to write your own
parser. Other fields in the SolexaExport file are captured in
pData(alignData(aln)).

Martin
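
A minimal sketch of such a parser, in base R only. It assumes the
splice junction entries follow the layout seen in the one example in
this thread, <reference>.fa<name>_<n1>_<n2>_<chrom>.fa_<coord1>_<coord2>.
The function name parseJunction and the column names are hypothetical,
and what each numeric field means (flank lengths, junction coordinates)
is a guess that should be checked against the Illumina pipeline
documentation.

```r
# Hypothetical helper, not part of ShortRead.
# Assumed layout of a junction entry (inferred from a single example):
#   <reference>.fa<name>_<n1>_<n2>_<chrom>.fa_<coord1>_<coord2>
parseJunction <- function(x) {
    x <- as.character(x)  # chromosome() returns a factor
    pattern <-
        "^(.+\\.fa)(.+)_([0-9]+)_([0-9]+)_(.+\\.fa)_([0-9]+)_([0-9]+)$"
    m <- regmatches(x, regexec(pattern, x))
    # Entries that do not match (ordinary chromosomes) become rows of NA
    parts <- t(vapply(m, function(g) {
        if (length(g) == 8L) g[-1L] else rep(NA_character_, 7L)
    }, character(7L)))
    data.frame(reference = parts[, 1],
               name      = parts[, 2],
               n1        = as.integer(parts[, 3]),
               n2        = as.integer(parts[, 4]),
               chrom     = parts[, 5],
               coord1    = as.integer(parts[, 6]),
               coord2    = as.integer(parts[, 7]),
               stringsAsFactors = FALSE)
}

# The two chromosome() values reported in this thread
junc <- parseJunction(c(
    "chrX.fa",
    "splice_sites-auto.faDHRS7_50_50_chr14.fa_59681484_59685824"))
```

With an AlignedRead object aln, the call would be
parseJunction(chromosome(aln)); rows for reads on ordinary chromosomes
come back as NA, so the result lines up with the reads one-to-one.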

> 
> -Kunbin
> 
> 
> 
> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
> Sent: Friday, November 05, 2010 9:28 AM
> To: Kunbin Qu
> Cc: 'Bioc-sig-sequencing at r-project.org'
> Subject: Re: [Bioc-sig-seq] readAligned for Illumina's export file
> 
> On 11/05/2010 08:51 AM, Kunbin Qu wrote:
>> Dear all,
>> 
>> Can readAligned() or another function read in the reads mapped
>> across the junctions in the "export" file (e.g., s_1_export.txt)
>> from Illumina's pipeline? Below is an example of a regular mapping
>> entry and a read mapped across two exons. I have a test file named
>> s1Test, and when I used the following command, only the first read
>> was read in. Thanks.
> 
> It's tricky to know what your file looks like, but this should be
> parsed by readAligned.
> 
>> x = readAligned("/tmp/kunbin_export.txt", type="SolexaExport")
>> x
> class: AlignedRead
> length: 2 reads; width: 51 cycles
> chromosome: chrX.fa splice_sites-auto.faDHRS7_50_50_chr14.fa_59681484_59685824
> position: 108773654 20
> strand: + -
> alignQuality: NumericQuality
> alignData varLabels: run lane ... filtering contig
>> sread(x)
>   A DNAStringSet instance of length 2
>     width seq
> [1]    51 NTTTTAAAAACAGAATTTCTGCTCTATAATAACACAGCTAAAGGGAAATAA
> [2]    51 NGAACTTTAAGAGTGGTGTGGATGCAGACTCTTCTTATTTTAAAATCTTTA
>> quality(x)
> class: SFastqQuality
> quality:
>   A BStringSet instance of length 2
>     width seq
> [1]    51 BKOJHRQPPO_QQ_____b_b___b_bb_bb__bb__b_b___bbb_b__Q
> [2]    51 BKIKKUUTTU_____[[[[[[[[[[_b_____b______QQQ__b___b__
> 
> Maybe your 'cfil' filters out 'chromosomes' (which should probably
> have been named something else, rseq?)
> 
>> chromosome(x)
> [1] chrX.fa
> [2] splice_sites-auto.faDHRS7_50_50_chr14.fa_59681484_59685824
> 2 Levels: chrX.fa ...
> 
> More hints on what 'it can only read the first read' means might
> help.
> 
> Martin
> 
> 
>> 
>> -Kunbin
>> 
>> SEQUENCER01     10      1       1       5110    943     0       1 NTTTTAAAAACAGAATTTCTGCTCTATAATAACACAGCTAAAGGGAAATAA BKOJHRQPPO_QQ_____b_b___b_bb_bb__bb__b_b___bbb_b__Q     chrX.fa 108773654    F       T50     199 Y
>> 
>> SEQUENCER01     10      1       1       2815    941     0       1 NGAACTTTAAGAGTGGTGTGGATGCAGACTCTTCTTATTTTAAAATCTTTA BKIKKUUTTU_____[[[[[[[[[[_b_____b______QQQ__b___b__ splice_sites-auto.faDHRS7_50_50_chr14.fa_59681484_59685824   20 R A50     200                                 Y
>> 
>> 
>> 
>> 
>>> s1t<-readAligned("./", pattern="s1Test", type="SolexaExport",
>>> filter=cfil)
>>> sessionInfo()
>> R version 2.11.0 (2010-04-22)
>> x86_64-unknown-linux-gnu
>> 
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>> 
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>> 
>> other attached packages:
>> [1] ShortRead_1.6.2     Rsamtools_1.0.1     lattice_0.19-11
>> [4] Biostrings_2.16.7   GenomicRanges_1.0.1 IRanges_1.6.8
>> 
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.8.0 grid_2.11.0   hwriter_1.2   tools_2.11.0
>>> 

-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793