[BioC] Biostrings::matchPattern,extract sequences
Hervé Pagès
hpages at fhcrc.org
Wed Sep 18 00:52:08 CEST 2013
Hi Julie,
Sorry for the late answer.
On 09/11/2013 02:43 PM, Zhu, Lihua (Julie) wrote:
> Herve,
>
> Is there a more elegant way to get all matched reference sequences
> besides using subject(matches)[start:end], e.g, subject(matches)[3010894
> : 3010916] for each matched record? Thanks!
> 23-letter "DNAString" instance
> seq: GCGGAGCCTGAGGCAGAAACCTC
>
> matches is the object returned by matchPattern function call.
>
> matches
> Views on a 197195432-letter DNAString subject
> subject:
> NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...CCTATTCTAGTAAAAATTTTATTTCATTCTGTAAAGAATTTGGTATTAAACTTAAAACTGGAATTC
> views:
> start end width
> [1] 3010894 3010916 23 [GCGGAGCCTGAGGCAGAAACCTC]
> [2] 3299593 3299615 23 [GCTGTGGCTGAGATGAATACTGG]
> [3] 3637189 3637211 23 [CCTGCTTCTGCCTCTGCCACCGG]
> [4] 3660740 3660762 23 [GCTGTTGCTGCCGCTGTTGGTGG]
> [5] 3661169 3661191 23 [GCTGCCCCGGCCGCCGCCGCCCG]
> [6] 3661721 3661743 23 [CCCGCGGCTGCAGCACGAGCCGC]
> ....
Just turn this into a DNAStringSet object with a coercion:
as(matches, "DNAStringSet")
or by calling the DNAStringSet() constructor on it:
DNAStringSet(matches)
Cheers,
H.
>
> Best regards,
>
> Julie
>
> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] grid parallel stats graphics grDevices utils datasets
> methods base
>
> other attached packages:
> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.19 BiocInstaller_1.10.3
> REDseq_1.6.0
> [4] ChIPpeakAnno_2.8.0 GenomicFeatures_1.12.3
> limma_3.16.7
> [7] org.Hs.eg.db_2.9.0 GO.db_2.9.0
> RSQLite_0.11.4
> [10] DBI_0.2-7 AnnotationDbi_1.22.6
> BSgenome.Ecoli.NCBI.20080805_1.3.17
> [13] biomaRt_2.16.0 VennDiagram_1.6.5
> multtest_2.16.0
> [16] Biobase_2.20.1
> BSgenome.Celegans.UCSC.ce2_1.3.19 BSgenome_1.28.0
> [19] ShortRead_1.18.0 latticeExtra_0.6-26
> RColorBrewer_1.0-5
> [22] Rsamtools_1.12.4 lattice_0.20-23
> Biostrings_2.28.0
> [25] GenomicRanges_1.12.5 IRanges_1.18.3
> BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-6 hwriter_1.3 MASS_7.3-29
> RCurl_1.95-4.1 rtracklayer_1.20.4 splines_3.0.1
> stats4_3.0.1
> [8] survival_2.37-4 tools_3.0.1 XML_3.95-0.2
> zlibbioc_1.6.0
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioconductor
mailing list