[Bioc-sig-seq] Biostrings: problem accessing indel details from pairwiseAlignment()

Wolfgang Raffelsberger wraff at igbmc.fr
Wed Jul 22 19:56:43 CEST 2009


Patrick,

Thank you very much for your quick and helpful answer!

Yes, using:
 > align <- pairwiseAlignment(samp1,ref1)
 > indel(subject(align))

I'm close to getting what I'm looking for. Now my question is: which
commands are (or will be) available for mining an IRangesList object?
Most of all, I'm interested in the accessor equivalents of:

 > indel(subject(align))@elements
 > subject(align)@range@start
 > subject(align)@range@width   # in fact, so far I can do without this one

(unless you think the @elements and @range slots won't change in the future ...)
With these elements I can now extract the very nucleotides that
were inserted/deleted.
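
The slot lookups above can probably be replaced by accessor functions. What follows is only a minimal sketch, not the package authors' recommended way. It assumes that start() and width() methods apply to the IRanges elements and to the aligned subject (as ?AlignedXStringSet suggests), and that the installed Biostrings still exports pairwiseAlignment(); in recent Bioconductor releases the alignment code may live in the pwalign package instead. The variable names (ins, ins2, inserted, ...) are made up for illustration.

```r
# Sketch only: see the assumptions stated in the paragraph above.
suppressMessages(library(Biostrings))

ref1  <- DNAString("GGGATACTTCACCAGCTCCCTGGC")
samp1 <- DNAStringSet(c("GGGATACTACACCAGCTCCCTGGC",
                        "GGGATACTTACACCAGCTCCCTGGC",
                        "ATACTTCACCAGCTCCCTG"))
align <- pairwiseAlignment(samp1, ref1)

# Gaps in the aligned subject correspond to insertions in the pattern.
ins <- indel(subject(align))         # IRangesList, one element per alignment

# Instead of @elements: pick one alignment, then query its IRanges.
ins2      <- ins[[2]]
ins_start <- start(ins2)             # the quoted output below reports 10
ins_width <- width(ins2)             # ... and a width of 1

# Instead of subject(align)@range@start and @range@width.
subj_start <- start(subject(align))
subj_width <- width(subject(align))

# Inserted nucleotide(s) for alignment 2.
# Assumption: the pattern alignment starts at position 1 and contains a
# single gap, so ins_start can be used directly as a pattern coordinate.
inserted <- subseq(samp1[[2]], start = ins_start, width = ins_width)
```

For alignments that start further into the pattern, or that contain several gaps, the reported positions would presumably have to be shifted before calling subseq(); that case is not covered here.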

Wolfgang


Patrick Aboyoun wrote:
> Wolfgang,
> Below is code that retrieves the indel locations you are looking for. 
> I like your attempts at using indel, insertion, and deletion for 
> PairwiseAlignment objects and I'll add the methods for 
> PairwiseAlignment objects to BioC 2.5 (devel) shortly using the 
> conventions that I specify below.
>
> > suppressMessages(library(Biostrings))
> > ref1 <- DNAString("GGGATACTTCACCAGCTCCCTGGC") # my pattern
> > samp1 <- DNAStringSet(c("GGGATACTACACCAGCTCCCTGGC","GGGATACTTACACCAGCTCCCTGGC","ATACTTCACCAGCTCCCTG"))
>
> > # 1st has a mutation, 2nd has an insertion, the 3rd is simply shorter ...
> >
> > align <- pairwiseAlignment(samp1,ref1)
> >
> > nindel(align)
> An object of class “InDel”
> Slot "insertion":
>      Length WidthSum
> [1,]      0        0
> [2,]      1        1
> [3,]      0        0
>
> Slot "deletion":
>      Length WidthSum
> [1,]      0        0
> [2,]      0        0
> [3,]      0        0
>
> > deletions <- indel(pattern(align))
> > deletions
> CompressedIRangesList: 3 elements
> > insertions <- indel(subject(align))
> > insertions
> CompressedIRangesList: 3 elements
> > insertions[[2]]
> IRanges instance:
>     start end width
> [1]    10  10     1
> > sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-06-28 r48863)
> i386-apple-darwin9.7.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Biostrings_2.13.26 IRanges_1.3.41
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.5.4
>
>
> Wolfgang Raffelsberger wrote:
>> Dear list,
>>
>> Previously I've been extracting indel information from sequences
>> aligned by the Biostrings function pairwiseAlignment(), which was
>> probably not the best approach, since the class
>> 'PairwiseAlignedFixedSubject' has evolved and changed and my old code
>> no longer works. I am now trying to use the package-provided functions
>> to access the details about indels (i.e. their location on the
>> pattern and possibly the indel sequence). However, I can't find a
>> function to extract this information, which (to the best of my
>> knowledge) is part of the alignment object.
>>
>> ## here an example :
>> library(Biostrings)
>> ref1 <- DNAString("GGGATACTTCACCAGCTCCCTGGC") # my pattern
>> samp1 <- DNAStringSet(c("GGGATACTACACCAGCTCCCTGGC","GGGATACTTACACCAGCTCCCTGGC","ATACTTCACCAGCTCCCTG"))
>>
>> # 1st has a mutation, 2nd has an insertion, the 3rd is simply shorter ...
>>
>> align <- pairwiseAlignment(samp1,ref1)
>>
>> nindel(align) # insertion was found properly, but I can't see at which
>>               # nt position the indel was found (nor whether it's an
>>               # insertion or a deletion)
>> indel(align) # Error in function (classes, fdef, mtable) unable to
>>              # find an inherited method for function...
>> insertion(align) # Error in function (classes, fdef, mtable) unable
>>                  # to find an inherited method for function ...
>> deletion(align) # neither ...
>> ?AlignedXStringSet # says under 'Accessor methods' that indel()
>>                    # exists ...
>>
>> ## ideally I'd be looking for something like
>> mismatchTable(align) # but addressing indels ...
>>
>>
>> ## for completeness :
>> > sessionInfo()
>> R version 2.9.1 (2009-06-26)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252 
>>
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>> other attached packages:
>> [1] ShortRead_1.2.1 lattice_0.17-25 BSgenome_1.12.3 Biostrings_2.12.7 
>> IRanges_1.2.3
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.4.1 grid_2.9.1 hwriter_1.1
>>
>> Thanks in advance,
>> Wolfgang Raffelsberger
>>
>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
>> Wolfgang Raffelsberger, PhD
>> Laboratoire de BioInformatique et Génomique Intégratives
>> CNRS UMR7104, IGBMC, 1 rue Laurent Fries, 67404 Illkirch Strasbourg, 
>> France
>> Tel (+33) 388 65 3300 Fax (+33) 388 65 3276
>> wolfgang.raffelsberger (at) igbmc.fr


-- 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Wolfgang Raffelsberger, PhD
Laboratoire de BioInformatique et Génomique Intégratives
CNRS UMR7104, IGBMC,  
1 rue Laurent Fries,  67404 Illkirch  Strasbourg,  France
Tel (+33) 388 65 3300         Fax (+33) 388 65 3276
http://www-bio3d-igbmc.u-strasbg.fr/~wraff
wolfgang.raffelsberger at igbmc.fr


