[BioC] QuasR special case alignment

Wed May 8 15:41:34 CEST 2013

Dear Ugo,

On 08.05.2013 10:33, Ugo Borello wrote:
> Dear Michael,
> Regarding the auxiliary alignments, if I understand correctly,
> having two sequences and because I want to count the alignments separately
> for each of them, I have to format the  auxiliaries.txt file this way:
> FileName    AuxName
> seq1.fa          egfp
> seq2.fa          cre
> So in this way I will have to copy the two sequences in two different files
> ( seq1.fa  and seq2.fa), correct?
That's one possibility. An alternative would be to keep them in the same
fasta library (e.g. seq.fa), and do the counting by using the query
sequence name to specify what to count.

In that case, the auxiliary file would look like this:
FileName  AuxName
seq.fa    myAuxSeqs

and the counting could e.g. look like this (assuming that "egfp" is one
of the sequence identifier in seq.fa):
qCount(proj, GRanges("egfp", IRanges(...)))

> Then, I have two more brief questions:
> 1) is there a special method to save the qProject object?
It is a plain S4 R object that could be saved as other R objects (e.g.
using saveRDS). However, as mentioned earlier on the list it is not
recommended to save the object and reload them, as you could end up with
unconsistent states. For example, one of the bam or fastq files could
have been modified in between saving and reloading.

It is recommended to always re-create the qProject object with qAlign -
it will check for everything to be in order and will only take a second
if the alignment files are already existing.

> 2) and regarding the matrix obtained using qCount, I wanted to add gene ids.
> So I ran:
>> query<- exons(TxDb.Mmusculus.UCSC.mm10.knownGene, columns='gene_id')
>> geneLevels <- qCount(proj1, query, reportLevel = "gene", clObj=cl)
> No gene ids as expected.
> If I run this line before calling qCount
> names(query)<-mcols(query)$gene_id
> I get this error message:
> Error in as.vector(x, mode = "character") : coercing an Atomic List object
> to an atomic vector is supported only for objects with top-level elements of
> length <= 1 
> And when I change it to
> names(query)<-as.vector(mcols(query)$gene_id)
>> geneLevels <- qCount(proj2, query, reportLevel = "gene", clObj=cl)
> Error in reduce(split(query, names(query))[unique(names(query))]) :
>   error in evaluating the argument 'x' in selecting a method for function
> 'reduce': Error in .nextMethod(x = x, i = i) : subscript contains NAs
>  
> qCount complains.
> 
> What is then the correct method to add gene ids to the qCount matrix?
> And what about adding gene names also?

You are mixing up two possible approaches to get gene levels. Either you
use:
* a GRanges query with gene identifiers as names
query <- exons(TxDb.Mmusculus.UCSC.mm10.knownGene, columns='gene_id')
names(query) <- as.character(query$gene_id)
geneLevels <- qCount(proj1, query)

or you use
* a TranscriptDb query and set reportLevel:
geneLevels <- qCount(proj1, TxDb.Mmusculus.UCSC.mm10.knownGene,
reportLevel="gene")

Either one of these should give you the same gene levels.

Best wishes,
Michael