[Bioc-sig-seq] perTile QA element in the FastqQA class

Sirisha Sunkara ssunkara at lbl.gov
Tue Apr 13 20:39:03 CEST 2010


Sorry, the subsetting line in my previous message should read:

 > fqhead[fqhead$V2 == "8" & fqhead$V3 == "120",]
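
A minimal sketch of that subsetting on a small example, assuming the read IDs follow the machine:lane:tile:x:y#index/read layout of the quoted fastq records. The `ids` vector is made-up illustration data, not reads from the run, and `sel` is just a name chosen for the result:

```r
# Hypothetical read IDs in the layout of the quoted fastq records.
ids <- c("@ILLUMINA06:8:1:6:849#0/1",
         "@ILLUMINA06:8:120:15:372#0/1",
         "@ILLUMINA06:8:120:20:1555#0/1",
         "@ILLUMINA06:3:120:7:1835#0/1")

# Split on ':'. read.table's default comment.char = "#" drops the
# trailing "#0/1", so V5 is read as a plain integer column.
fqhead <- read.table(text = ids, sep = ":", stringsAsFactors = FALSE)

# Lane 8, tile 120. V2 and V3 are integer columns, so comparing
# against numbers avoids an implicit conversion to character.
sel <- fqhead[fqhead$V2 == 8 & fqhead$V3 == 120, ]
print(sel)
```

Comparing against "8" and "120" as strings also works because R coerces the integers to character, but numeric comparison is a little less surprising.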

My data frame looks like this, where V2 is the lane number, V3 is the
tile number, and V4 and V5 are the x and y coordinates of the cluster
position:
 > head(fqhead)
            V1 V2 V3 V4   V5
1  @ILLUMINA06  8  1  6  849
2  @ILLUMINA06  8  1  6 1169
3  @ILLUMINA06  8  1  6 1163
4  @ILLUMINA06  8  1  6 1512
5  @ILLUMINA06  8  1  6 1251
6  @ILLUMINA06  8  1  6  372
7  @ILLUMINA06  8  1  6 1555
8  @ILLUMINA06  8  1  6 1644
9  @ILLUMINA06  8  1  6 2011
10 @ILLUMINA06  8  1  7 1835
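
From a data frame like this, the per-tile tables Martin describes (columns 'type', 'tile', 'lane', and 'count' or 'score') could be built roughly as follows. This is only a sketch with base R and made-up data: the `score` vector stands in for per-read quality values that would really come from ShortRead, the "read" value for 'type' is a placeholder I am assuming, and I have not tried passing the result to the ShortRead plotting functions.

```r
# Made-up stand-in for fqhead: lane (V2), tile (V3), x (V4), y (V5).
fqhead <- data.frame(
    V1 = "@ILLUMINA06",
    V2 = c(8L, 8L, 8L, 8L, 8L),
    V3 = c(1L, 1L, 1L, 2L, 2L),
    V4 = c(6L, 6L, 7L, 3L, 9L),
    V5 = c(849L, 1169L, 1835L, 42L, 77L),
    stringsAsFactors = FALSE)

# Hypothetical per-read average quality, one value per row of fqhead.
# With ShortRead this would come from the expression Martin gives:
# alphabetScore(quality(srq)) / width(quality(srq)).
score <- c(30, 32, 28, 20, 26)

groups <- list(tile = fqhead$V3, lane = fqhead$V2)

# Number of reads in each lane/tile combination.
counts <- aggregate(data.frame(count = rep(1L, nrow(fqhead))),
                    by = groups, FUN = sum)
counts$type <- "read"   # placeholder value (assumption)

# Median per-read score in each lane/tile combination.
scores <- aggregate(data.frame(score = score),
                    by = groups, FUN = median)
scores$type <- "read"   # placeholder value (assumption)

# Put the columns in the order Martin lists them.
counts <- counts[, c("type", "tile", "lane", "count")]
scores <- scores[, c("type", "tile", "lane", "score")]
print(counts)
print(scores)
```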

Sirisha


Sirisha Sunkara wrote:
> Hi Martin,
>
> I am using the sequence.txt files generated by the Illumina pipeline 
> (OLB1.6/RTA1.6) as is, which seem to have the tile coordinates.
>
> Just so I can focus on the ReadIDs part for now (and I am sure this is 
> not exactly what you asked for), I parsed out the readIDs from the 
> fastq, and am working with those.
>
> This is what my fastqs look like:
>
> @ILLUMINA06:8:1:6:849#0/1
> GCTCTTTTTGATTCTCAAATCCGGCGTCAACCATA
> +ILLUMINA06:8:1:6:849#0/1
> a`abaa_aaa]_a`_a_[]`]a_`aa_`_aa`aaa
> @ILLUMINA06:8:1:6:1169#0/1
> TAATGCCACTCCTCTCCCGACTGTTAACACTGCTG
> +ILLUMINA06:8:1:6:1169#0/1
> ab`_Z_aXa`bbababbbaabaaaaababaaa`V`
>
> My very basic attempt at this:
>
> > fqhead <- 
> read.table("./Contam_Screening/Run703/sequence_8_1_hdrs.txt", sep=":")
>
> For instance, to extract all entries in lane 8, tile 120:
> > fqhead[fqhead$V2 == "3" & fq$V3 == "120",]
>
> I hope I am somewhat closer to what you asked for...
>
> Thanks a lot!
> Sirisha
>
>
> Martin Morgan wrote:
>> On 04/12/2010 02:49 PM, Sirisha Sunkara wrote:
>>  
>>> Hi Martin,
>>>
>>> The qa function, when reading fastq files, doesn't seem to populate
>>> the perTile QA element with any rows: the row counts are zero for
>>> both the readCounts and medianReadQualityScore list elements of
>>> perTile.
>>>
>>> Is this feature still a work in progress? Essentially, I am trying to
>>> get the TileQC plots for lanes where there was no reference genome to
>>> align against (no export.txt files).
>>>     
>>
>> Hi Sirisha -- fastq files can't be guaranteed to have tile info so
>> ShortRead doesn't try to guess these, even if some software adopts
>> conventions for embedding the information in the read ids.
>>
>> The tile images are generated by
>>
>>   ShortRead:::.plotTileCounts
>>
>> and
>>
>>   ShortRead:::.plotTileQualityScore
>>
>> both take a regular data.frame. For .plotTileCounts, the columns are
>> 'type' (safe to ignore, I think), 'tile' (integer tile index), 'lane'
>> (integer lane index), and 'count' (number of reads in this particular
>> lane & tile). As an untested work-around, you could create a data frame
>> like this by parsing your read IDs using standard R commands; provide an
>> example of what the read IDs look like and I'll help you. For the
>> .plotTileQualityScore, the columns are 'type', 'tile', 'lane', and
>> 'score', where 'score' is the median 'qualityScore'
>> (alphabetScore(quality(srq)) / width(quality(srq)) for some ShortReadQ
>> object srq obtained by readFastq) over all reads in the tile.
>>
>> Martin
>>
>>  
>>> > qafq <- qa("./Contam_Screening/Run703/", "s_8_1_sequence.txt",
>>> +            type="fastq")
>>> > qafq
>>> class: FastqQA(9)
>>> QA elements (access with qa[["elt"]]):
>>>  readCounts: data.frame(1 3)
>>>  baseCalls: data.frame(1 5)
>>>  readQualityScore: data.frame(512 4)
>>>  baseQuality: data.frame(94 3)
>>>  alignQuality: data.frame(1 3)
>>>  frequentSequences: data.frame(50 4)
>>>  sequenceDistribution: data.frame(1663 4)
>>>  perCycle: list(2)
>>>    baseCall: data.frame(150 4)
>>>    quality: data.frame(1081 5)
>>>  perTile: list(2)
>>>    readCounts: data.frame(0 4)
>>>    medianReadQualityScore: data.frame(0 4)
>>>
>>> Thank You,
>>> Sirisha
>>>
>>>    
>>> > sessionInfo()
>>> R version 2.11.0 Under development (unstable) (2010-03-07 r51225)
>>> x86_64-unknown-linux-gnu
>>>
>>> locale:
>>> [1] C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] ShortRead_1.5.21    lattice_0.18-3      Biostrings_2.15.22
>>> [4] GenomicRanges_0.1.0 IRanges_1.5.74      Rmpi_0.5-8
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Biobase_2.7.5 grid_2.11.0   hwriter_1.2   tools_2.11.0
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>     
>>
>>
>>   
>
>
