[Bioc-sig-seq] extract id from ShortRead

Sean Davis seandavi at gmail.com
Mon Nov 30 15:40:52 CET 2009


On Mon, Nov 30, 2009 at 9:27 AM, Ramzi TEMANNI <ramzi.temanni at gmail.com> wrote:
> Hi,
> I have a sequence loaded from bowtie alignment
> aln <- readAligned("./S1", pattern="S1_1.hg19.bowtie.align", type="Bowtie")
> I would like to extract the id to select specific reads
> I run id(aln) and I get:
> id(aln)
>  A BStringSet instance of length 4340867
>          width seq
>      [1]    28 HWI-EA332_8_1_3_659#GGGGNN/1
>      [2]    29 HWI-EA332_8_1_3_1738#CCCCNN/1
>      [3]    29 HWI-EA332_8_1_3_1094#AGGANN/1
>      [4]    28 HWI-EA332_8_1_3_558#TTTCNN/1
>      [5]    29 HWI-EA332_8_1_3_1920#AAAANN/1
>      [6]    28 HWI-EA332_8_1_3_228#GGGGNN/1
>      [7]    29 HWI-EA332_8_1_3_1261#AGGGNN/1
>      [8]    28 HWI-EA332_8_1_3_908#ACTTNN/1
>      [9]    27 HWI-EA332_8_1_3_53#CTGCNN/1
>      ...   ... ...
> [4340859]    33 HWI-EA332_8_120_1596_499#TTGANA/1
> [4340860]    34 HWI-EA332_8_120_1599_1161#CCACNT/1
> [4340861]    33 HWI-EA332_8_120_1601_255#CTCTNA/1
> [4340862]    33 HWI-EA332_8_120_1601_504#CCATNC/1
> [4340863]    33 HWI-EA332_8_120_1624_899#CTCTNT/1
> [4340864]    33 HWI-EA332_8_120_1487_658#ACCCNA/1
> [4340865]    32 HWI-EA332_8_120_1533_28#CACANG/1
> [4340866]    33 HWI-EA332_8_120_1564_807#CCCGNG/1
> [4340867]    34 HWI-EA332_8_120_1474_1350#CCTGNC/1
>
> This BStringSet instance has 'width' and 'seq' columns.
> Running str(id(aln)), I got this:
>
> Formal class 'BStringSet' [package "Biostrings"] with 5 slots
>  ..@ pool           :Formal class 'SharedRaw_Pool' [package "IRanges"] with 2 slots
>  .. .. ..@ xp_list                    :List of 1
>  .. .. .. ..$ :<externalptr>
>  .. .. ..@ .link_to_cached_object_list:List of 1
>  .. .. .. ..$ :<environment: 0x2af6400>
>  ..@ ranges         :Formal class 'GroupedIRanges' [package "IRanges"] with 7 slots
>  .. .. ..@ group          : int [1:4340867] 1 1 1 1 1 1 1 1 1 1 ...
>  .. .. ..@ start          : int [1:4340867] 1 29 58 87 115 144 172 201 229 256 ...
>  .. .. ..@ width          : int [1:4340867] 28 29 29 28 29 28 29 28 27 29 ...
>  .. .. ..@ NAMES          : NULL
>  .. .. ..@ elementMetadata: NULL
>  .. .. ..@ elementType    : chr "integer"
>  .. .. ..@ metadata       : list()
>  ..@ elementMetadata: NULL
>  ..@ elementType    : chr "BString"
>  ..@ metadata       : list()
>
> But I'm wondering how to extract only the 'seq' from all that and store
> the result in a table?

as.character(id(aln))

will return a character vector of the read ids.  You might want to look
at the help for AlignedRead-class and BStringSet-class for some help
in understanding these classes and what can be done with them.  It may
be that you will not need to convert to a character vector to do what
you want with the reads.
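
As a rough sketch of storing the ids in a table and selecting by id: the
snippet below uses a short character vector as a stand-in for
as.character(id(aln)), with three ids copied from the output above. The
variable names (ids, id_table, keep, selected) and the pattern are only
illustrative assumptions, not anything prescribed by ShortRead.

```r
# Stand-in for as.character(id(aln)); values copied from the quoted output.
ids <- c("HWI-EA332_8_1_3_659#GGGGNN/1",
         "HWI-EA332_8_1_3_1738#CCCCNN/1",
         "HWI-EA332_8_120_1596_499#TTGANA/1")

# Store the ids in a table (data frame), one row per read.
id_table <- data.frame(id = ids, stringsAsFactors = FALSE)

# Logical index of the reads whose id starts with a given prefix
# (the prefix is an arbitrary example).
keep <- grepl("^HWI-EA332_8_120_", ids)

# Rows of the table for the selected reads.
selected <- id_table[keep, , drop = FALSE]
print(selected)

# With the real object, the same logical vector should also subset the
# reads themselves, since AlignedRead objects support `[`:
# aln[keep]
```

Subsetting aln directly with the logical vector is one way to avoid
carrying a separate table around when only the selected reads are needed.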

Sean


