[BioC] Biostrings::matchPattern,extract sequences
Zhu, Lihua (Julie)
Julie.Zhu at umassmed.edu
Wed Sep 18 01:51:29 CEST 2013
Cool. Thanks, Herve!
Is there a way to extract the mismatch positions for all the matches at once?
Right now, I am calling pairwiseAlignment on each matched subsequence.
However, this could become very slow when the number of matched sequences
gets large.
Best regards,
Julie
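
One possible alternative to aligning each match separately, offered only as a sketch: Biostrings provides mismatch() and nmismatch(), which take the pattern and the views returned by matchPattern() and work on all matches in one call. The subject and pattern below are toy sequences made up for illustration, not the mm9 data from this thread, and this approach only covers substitutions (matches found without indels).

```r
library(Biostrings)

# Toy pattern and subject (assumptions for illustration only).
pattern <- DNAString("ACGT")
subject <- DNAString("AAACGTAAACCTAAA")

# Allow up to one mismatch; returns an XStringViews object.
m <- matchPattern(pattern, subject, max.mismatch = 1)

# One list element per match; each holds the mismatching positions,
# counted from the start of the pattern (empty for an exact match).
mm <- mismatch(pattern, m)

# Number of mismatches per match, for the same views.
nmm <- nmismatch(pattern, m)
```

Whether this is fast enough on a full chromosome would need to be checked on the real data.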
On 9/17/13 6:52 PM, "Hervé Pagès" <hpages at fhcrc.org> wrote:
> Hi Julie,
>
> Sorry for the late answer.
>
> On 09/11/2013 02:43 PM, Zhu, Lihua (Julie) wrote:
>> Herve,
>>
>> Is there a more elegant way to get all matched reference sequences
>> than calling subject(matches)[start:end] for each matched record, e.g.,
>> subject(matches)[3010894:3010916]? Thanks!
>>
>> 23-letter "DNAString" instance
>> seq: GCGGAGCCTGAGGCAGAAACCTC
>>
>> matches is the object returned by the matchPattern() call.
>>
>> matches
>> Views on a 197195432-letter DNAString subject
>> subject: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...CCTATTCTAGTAAAAATTTTATTTCATTCTGTAAAGAATTTGGTATTAAACTTAAAACTGGAATTC
>> views:
>> start end width
>> [1] 3010894 3010916 23 [GCGGAGCCTGAGGCAGAAACCTC]
>> [2] 3299593 3299615 23 [GCTGTGGCTGAGATGAATACTGG]
>> [3] 3637189 3637211 23 [CCTGCTTCTGCCTCTGCCACCGG]
>> [4] 3660740 3660762 23 [GCTGTTGCTGCCGCTGTTGGTGG]
>> [5] 3661169 3661191 23 [GCTGCCCCGGCCGCCGCCGCCCG]
>> [6] 3661721 3661743 23 [CCCGCGGCTGCAGCACGAGCCGC]
>> ....
>
> Just turn this into a DNAStringSet object with a coercion:
>
> as(matches, "DNAStringSet")
>
> or by calling the DNAStringSet() constructor on it:
>
> DNAStringSet(matches)
>
> Cheers,
> H.
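
A minimal sketch of the coercion described above, run on a toy subject rather than the mm9 chromosome. The subject and pattern strings are made up for illustration; the only calls used are matchPattern(), as(), and as.character().

```r
library(Biostrings)

# Toy subject and pattern (assumptions for illustration; not the mm9 data).
subject <- DNAString("AAACGTAAACCTAAA")
matches <- matchPattern("ACGT", subject, max.mismatch = 1)

# Coerce the views to a DNAStringSet: one element per matched region,
# so there is no need to index subject(matches) by start:end per match.
seqs <- as(matches, "DNAStringSet")

# Plain character vector, if that is more convenient downstream.
chars <- as.character(seqs)
```

The resulting DNAStringSet can be used like any other set of sequences, for example with width() or as input to writeXStringSet().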
>
>>
>> Best regards,
>>
>> Julie
>>
>> sessionInfo()
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] grid parallel stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.19 BiocInstaller_1.10.3 REDseq_1.6.0
>> [4] ChIPpeakAnno_2.8.0 GenomicFeatures_1.12.3 limma_3.16.7
>> [7] org.Hs.eg.db_2.9.0 GO.db_2.9.0 RSQLite_0.11.4
>> [10] DBI_0.2-7 AnnotationDbi_1.22.6 BSgenome.Ecoli.NCBI.20080805_1.3.17
>> [13] biomaRt_2.16.0 VennDiagram_1.6.5 multtest_2.16.0
>> [16] Biobase_2.20.1 BSgenome.Celegans.UCSC.ce2_1.3.19 BSgenome_1.28.0
>> [19] ShortRead_1.18.0 latticeExtra_0.6-26 RColorBrewer_1.0-5
>> [22] Rsamtools_1.12.4 lattice_0.20-23 Biostrings_2.28.0
>> [25] GenomicRanges_1.12.5 IRanges_1.18.3 BiocGenerics_0.6.0
>>
>> loaded via a namespace (and not attached):
>> [1] bitops_1.0-6 hwriter_1.3 MASS_7.3-29 RCurl_1.95-4.1 rtracklayer_1.20.4 splines_3.0.1 stats4_3.0.1
>> [8] survival_2.37-4 tools_3.0.1 XML_3.95-0.2 zlibbioc_1.6.0