[BioC] shortread base quality

Martin Morgan mtmorgan at fhcrc.org
Mon Aug 27 15:44:42 CEST 2012


On 08/27/2012 06:17 AM, David martin wrote:
> Thanks Martin, we are almost there.
> Reads are not the same size (range from 14 to 30 nt) . Any other idea ?
> Maybe build an empty matrix and fill in the scores with NA values when
> reads are short ??

I don't think there is anything built-in

q = quality(rfq)
w = width(q)
m = matrix(NA_integer_, length(q), max(width(q)))
for (i in seq(min(w), max(w)))
     m[w==i, 1:i] = as(q[w==i], "matrix")


>
>
> On 08/27/2012 03:07 PM, Martin Morgan wrote:
>> On 08/27/2012 12:43 AM, David martin wrote:
>>> Not really Martin,
>>> It's the score for each nucleotide so the matrix should be something
>>> like this
>>>
>>>      1 2 3 4 5 6 7 .. 30
>>> [1,]    36 36 36 36 36 .. 36 #score for each nucleotide at each position
>>> of the read.
>>> [2,]    36 36 36 36 36 .. 36
>>> [3,]    36 36 28 36 36 .. 36
>>> [4,]    36 36 36 36 28 .. 36
>>> [5,]    36 28 36 36 36 .. 36
>>>
>>> Any idea ?
>>
>> if the reads are all the same width (implied by the matrix above) then
>> as(quality(rfq), "matrix"); sorry about the misleading info. Martin
>>
>>>
>>> On 08/24/2012 06:30 PM, Martin Morgan wrote:
>>>> On 08/24/2012 07:09 AM, David martin wrote:
>>>>> Hi,
>>>>> I'm trying to get the quality scores for each nucleotide for each read
>>>>> from the fastq file.
>>>>> Here is how it starts. I know to get average scores for reads but not
>>>>> for each individual nucleotide of each read.
>>>>>
>>>>>
>>>>>
>>>>> file <- "file.fastq"
>>>>> fqfile <- paste(basename(file),"",sep="")
>>>>> path <- dirname(file)
>>>>> sp <- SolexaPath(path,dataPath=path,analysisPath=path)
>>>>> fq <- readFastq(sp, fqfile)
>>>>>
>>>>> #Get quality scores
>>>>> score <- alphabetScore(fq)
>>>>>
>>>>> #Gives the sum of the base quality scores for each read
>>>>> aveScore <- score / width(fq)
>>>>
>>>> try alphabetFrequency, e.g.,
>>>>
>>>>    ragged = narrow(quality(rfq), 1, width=c(20, 30)) ## recycle width
>>>>    alf = alphabetFrequency(ragged)
>>>>
>>>>  > dim(alf)
>>>> [1] 256  94
>>>>  > alf[1:5, 1:5] # not very exciting at this end...
>>>>         ! " # $
>>>> [1,] 0 0 0 0 0
>>>> [2,] 0 0 0 0 0
>>>> [3,] 0 0 0 0 0
>>>> [4,] 0 0 0 0 0
>>>> [5,] 0 0 0 0 0
>>>>
>>>>
>>>> Martin
>>>>
>>>>>
>>>>> #How can i get the score for each base for each read ????
>>>>>
>>>>> thanks,
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list