[Bioc-sig-seq] Biostrings: problem to access indel-details form pairwiseAlignment()

Patrick Aboyoun paboyoun at fhcrc.org
Wed Jul 22 20:38:15 CEST 2009


Wolfgang,
The IRanges infrastructure has evolved greatly from BioC 2.4/R-2.9 to 
BioC 2.5/R-2.10 and I highly recommend you use accessor functions, 
rather than direct slot access to obtain the information you are looking 
for. For example, the IRangesList class definition has changed greatly 
and the @elements slot is no longer present.

Question: how to transform an IRangesList object to a list of IRanges object
Answer:
Short answer, use as.list as in as.list(indel(subject(align))). Long 
answer, this is typically not necessary and not desired given that 
IRangesList has many methods defined for them. See man pages for 
IRangesList, RangesList, and Sequence (if you are on BioC 2.5/R-2.10). 
If you can tell me what operations you would like to perform, I can 
point you in the right direction.

Question: how to the the start, width, and end locations of a pairwise 
alignment.
Answer: use the start, width and end methods.
 > start(subject(align))
[1] 1 1 4
 > width(subject(align))
[1] 24 24 19
 > end(subject(align))
[1] 24 24 22


Patrick


Wolfgang Raffelsberger wrote:
> Patrick,
>
> thank you very much for your quick and helpful answer !
>
> Yes, using  :
> > align <- pairwiseAlignment(samp1,ref1)
> > indel(subject(align))
>
> I'm about to get what I'm looking for.  Now my question is, which 
> commands are (will be) availabel for mining an IRangesList-object.
> Most of all I'm interested in what would correspond to getting :
>
> > indel(subject(align))@elements
> > subject(align)@range at start
> > subject(align)@range at witdth   # in fact, so far I can do without 
> this one
>
> (unless you think the @elements, and  @range won't change in the 
> future ...)
> With these elements I manage now to extract the very nucleotides that 
> were inserted/deleted.
>
> Wolfgang
>
>
> Patrick Aboyoun a écrit :
>> Wolfgang,
>> Below is code that retrieves the indel locations you are looking for. 
>> I like your attempts at using indel, insertion, and deletion for 
>> PairwiseAlignment objects and I'll add the methods for 
>> PairwiseAlignment objects to BioC 2.5 (devel) shortly using the 
>> conventions that I specify below.
>>
>> > suppressMessages(library(Biostrings))
>> > ref1 <- DNAString("GGGATACTTCACCAGCTCCCTGGC") # my pattern
>> > samp1 <- 
>> DNAStringSet(c("GGGATACTACACCAGCTCCCTGGC","GGGATACTTACACCAGCTCCCTGGC","ATACTTCACCAGCTCCCTG")) 
>>
>> > # 1st has a mutation, 2nd has an insertion, the 3rd is simply 
>> shorter ...
>> >
>> > align <- pairwiseAlignment(samp1,ref1)
>> >
>> > nindel(align)
>> An object of class “InDel”
>> Slot "insertion":
>> Length WidthSum
>> [1,] 0 0
>> [2,] 1 1
>> [3,] 0 0
>>
>> Slot "deletion":
>> Length WidthSum
>> [1,] 0 0
>> [2,] 0 0
>> [3,] 0 0
>>
>> > deletions <- indel(pattern(align))
>> > deletions
>> CompressedIRangesList: 3 elements
>> > insertions <- indel(subject(align))
>> > insertions
>> CompressedIRangesList: 3 elements
>> > insertions[[2]]
>> IRanges instance:
>> start end width
>> [1] 10 10 1
>> > sessionInfo()
>> R version 2.10.0 Under development (unstable) (2009-06-28 r48863)
>> i386-apple-darwin9.7.0
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] Biostrings_2.13.26 IRanges_1.3.41
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.5.4
>>
>>
>> Wolfgang Raffelsberger wrote:
>>> Dear list,
>>>
>>> previously I've been extracting indel-information from sequences 
>>> aligned by the Biostrings function pairwiseAlignment(), which is 
>>> probably not the best way since the class 
>>> 'PairwiseAlignedFixedSubject' has evoled & changed and my old code 
>>> won't work any more. Now trying to use the library-provided 
>>> functions to access the information/details about indels (ie their 
>>> localization on the pattern and possibly the indel sequence ). 
>>> However, I can't find a function to extract this information, that 
>>> is (to the best of my knowledge) part of the aligned object.
>>>
>>> ## here an example :
>>> library(Biostrings)
>>> ref1 <- DNAString("GGGATACTTCACCAGCTCCCTGGC") # my pattern
>>> samp1 <- 
>>> DNAStringSet(c("GGGATACTACACCAGCTCCCTGGC","GGGATACTTACACCAGCTCCCTGGC","ATACTTCACCAGCTCCCTG")) 
>>>
>>> # 1st has a mutation, 2nd has an insertion, the 3rd is simply 
>>> shorter ...
>>>
>>> align <- pairwiseAlignment(samp1,ref1)
>>>
>>> nindel(align) # insertion was found properly but I can't see at 
>>> which nt position the indel was found (neither if it's an insertion 
>>> or deletion)
>>> indel(align) # Error in function (classes, fdef, mtable) unable to 
>>> find an inherited method for function...
>>> insertion(align) # Error in function (classes, fdef, mtable) unable 
>>> to find an inherited method for function ...
>>> deletion(align) # neither ...
>>> ?AlignedXStringSet # says under 'Accessor methods' that indel() 
>>> exists ..
>>>
>>> ## ideally I'd be looking for something like
>>> mismatchTable(align) # but addressing indels ...
>>>
>>>
>>> ## for completeness :
>>> > sessionInfo()
>>> R version 2.9.1 (2009-06-26)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252 
>>>
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>> other attached packages:
>>> [1] ShortRead_1.2.1 lattice_0.17-25 BSgenome_1.12.3 
>>> Biostrings_2.12.7 IRanges_1.2.3
>>> loaded via a namespace (and not attached):
>>> [1] Biobase_2.4.1 grid_2.9.1 hwriter_1.1
>>>
>>> Thank's in advance,
>>> Wolfgang Raffelsberger
>>>
>>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
>>> Wolfgang Raffelsberger, PhD
>>> Laboratoire de BioInformatique et Génomique Intégratives
>>> CNRS UMR7104, IGBMC, 1 rue Laurent Fries, 67404 Illkirch Strasbourg, 
>>> France
>>> Tel (+33) 388 65 3300 Fax (+33) 388 65 3276
>>> wolfgang.raffelsberger (at) igbmc.fr
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>>
>>
>
>



More information about the Bioc-sig-sequencing mailing list