[Bioc-sig-seq] BED file parser

Ivan Gregoretti ivangreg at gmail.com
Wed Mar 9 15:41:05 CET 2011


Just to expand a little on Vincent's response.

If you happen to be handling very large BED files, you probably keep
them compressed. The good news is that even in that case, you can load
them:

lit = import("~/lit.bed.gz", "bed")

There is still the long-standing issue of how slow the import()
function is, but I remain hopeful.
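
If import() is too slow for a given file, one possible fallback is to
read only the columns you need with base R. The following is a minimal
sketch, not rtracklayer's method: read_bed3 is a hypothetical helper
name, and it assumes a tab-delimited file with no "track" or "browser"
header lines. It shifts the 0-based BED start to a 1-based start, which
matches the ranges in Vincent's output below. Whether it is actually
faster on your data is something to measure with system.time().

```r
# Hypothetical helper (not part of rtracklayer): read the first three BED
# columns from a plain or gzip-compressed, tab-delimited file with base R.
# Assumes there are no "track" or "browser" header lines.
read_bed3 <- function(path) {
  # gzfile() reads gzip-compressed and uncompressed files alike;
  # read.table() opens and closes the connection itself.
  bed <- read.table(gzfile(path), sep = "\t", header = FALSE,
                    quote = "", comment.char = "",
                    stringsAsFactors = FALSE)
  data.frame(chrom = bed[[1]],
             start = bed[[2]] + 1L,  # BED starts are 0-based; shift to 1-based
             end   = bed[[3]],       # BED ends are exclusive, so unchanged
             stringsAsFactors = FALSE)
}

# Self-contained demonstration on two records taken from the quoted example.
tmp <- tempfile(fileext = ".bed.gz")
con <- gzfile(tmp, "wt")
writeLines(c("chr20\t55658\t64827\tJUNC00000001\t14\t+",
             "chr20\t55662\t64821\tJUNC00000002\t2\t-"), con)
close(con)

lit3 <- read_bed3(tmp)
print(lit3)
print(system.time(read_bed3(tmp)))  # compare with system.time(import(...))
```

The result is a plain data.frame, not a RangedData object, so it is only
a substitute when the three coordinate columns are all you need.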

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1016 and 1-301-496-1592
Fax: 1-301-496-9878



On Tue, Mar 8, 2011 at 9:26 PM, Vincent Carey
<stvjc at channing.harvard.edu> wrote:
> 2011/3/8 Thiago Yukio Kikuchi Oliveira <stratust at gmail.com>:
>> Hi,
>>
>> Is there a BED file parser for R?
>
> I suppose it depends on what you mean by "parser".  import() from the
> rtracklayer package imports BED and constructs and populates a
> RangedData object with the contents.  Here we look at a small bed file
> in text, start R, load rtracklayer, import the data, show the result,
> and show the resources used.
>
> bash-3.2$ head ~/junc716_20.bed
> chr20   55658   64827   JUNC00000001    14      +       55658   64827   255,0,0 2       27,25   0,9144
> chr20   55662   64821   JUNC00000002    2       -       55662   64821   255,0,0 2       34,8    0,9151
> chr20   135774  147029  JUNC00000003    1       -       135774  147029  255,0,0 2       8,29    0,11226
> chr20   167951  172361  JUNC00000004    1       +       167951  172361  255,0,0 2       29,8    0,4402
> chr20   189824  192113  JUNC00000005    3       +       189824  192113  255,0,0 2       33,9    0,2280
> chr20   189829  192113  JUNC00000006    3       +       189829  192113  255,0,0 2       32,9    0,2275
> chr20   193930  199576  JUNC00000007    4       -       193930  199576  255,0,0 2       28,11   0,5635
> chr20   207050  207846  JUNC00000008    2       -       207050  207846  255,0,0 2       20,34   0,762
> chr20   218306  218925  JUNC00000009    1       -       218306  218925  255,0,0 2       11,26   0,593
> chr20   221160  225070  JUNC00000010    25      -       221160  225070  255,0,0 2       29,9    0,3901
> bash-3.2$ head ~/junc716_20.bed > ~/lit.bed
> bash-3.2$ R213 --vanilla --quiet
>> library(rtracklayer)
> Loading required package: RCurl
> Loading required package: bitops
>> lit = import("~/lit.bed")
>> lit
> RangedData with 10 rows and 9 value columns across 1 space
>         space           ranges |         name     score      strand thickStart
>   <character>        <IRanges> |  <character> <numeric> <character>  <integer>
> 1        chr20 [ 55659,  64827] | JUNC00000001        14           +      55658
> 2        chr20 [ 55663,  64821] | JUNC00000002         2           -      55662
> 3        chr20 [135775, 147029] | JUNC00000003         1           -     135774
> 4        chr20 [167952, 172361] | JUNC00000004         1           +     167951
> 5        chr20 [189825, 192113] | JUNC00000005         3           +     189824
> 6        chr20 [189830, 192113] | JUNC00000006         3           +     189829
> 7        chr20 [193931, 199576] | JUNC00000007         4           -     193930
> 8        chr20 [207051, 207846] | JUNC00000008         2           -     207050
> 9        chr20 [218307, 218925] | JUNC00000009         1           -     218306
> 10       chr20 [221161, 225070] | JUNC00000010        25           -     221160
>    thickEnd     itemRgb blockCount  blockSizes blockStarts
>   <integer> <character>  <integer> <character> <character>
> 1      64827     #FF0000          2       27,25      0,9144
> 2      64821     #FF0000          2        34,8      0,9151
> 3     147029     #FF0000          2        8,29     0,11226
> 4     172361     #FF0000          2        29,8      0,4402
> 5     192113     #FF0000          2        33,9      0,2280
> 6     192113     #FF0000          2        32,9      0,2275
> 7     199576     #FF0000          2       28,11      0,5635
> 8     207846     #FF0000          2       20,34       0,762
> 9     218925     #FF0000          2       11,26       0,593
> 10    225070     #FF0000          2        29,9      0,3901
>
>> sessionInfo()
> R version 2.13.0 Under development (unstable) (2011-03-01 r54628)
> Platform: x86_64-apple-darwin10.4.0/x86_64 (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] rtracklayer_1.11.11 RCurl_1.5-0         bitops_1.0-4.1
>
> loaded via a namespace (and not attached):
> [1] BSgenome_1.19.4      Biobase_2.11.9       Biostrings_2.19.15
> [4] GenomicRanges_1.3.23 IRanges_1.9.25       Matrix_0.999375-47
> [7] XML_3.2-0            grid_2.13.0          lattice_0.19-17
>
>
>>
>>
>> Thanks
>>
>>     /    Thiago Yukio Kikuchi Oliveira
>> (=\
>>   \=) Faculdade de Medicina de Ribeirão Preto
>>    /   Laboratório de Genética Molecular e Bioinformática
>>   /=) -----------------------------------------------------------------
>> (=/   Centro de Terapia Celular/CEPID/FAPESP - Hemocentro de Rib. Preto
>>   /    Rua Tenente Catão Roxo, 2501 CEP 14151-140
>> (=\   Ribeirão Preto - São Paulo
>>   \=) Fone: 55 16 2101-9300   Ramal: 9603
>>    /   E-mail: stratus at lgmb.fmrp.usp.br
>>   /=)            stratust at gmail.com
>> (=/
>>   /    Bioinformatic Team - BiT: http://lgmb.fmrp.usp.br
>> (=\   Hemocentro de Ribeirão Preto: http://pegasus.fmrp.usp.br
>>   \=)
>>    /  -----------------------------------------------------------------
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>


