[Bioc-sig-seq] BED file parser
Ivan Gregoretti
ivangreg at gmail.com
Wed Mar 9 15:41:05 CET 2011
Just to expand a little on Vincent's response.
If you happen to be handling very large BED files, you probably keep
them compressed. The good news is that even in that case, you can load
them:
lit = import("~/lit.bed.gz", "bed")
There is the long-standing issue of how slow the import() function is
on large files, but I am still hopeful.
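In the meantime, here is a minimal sketch of a base-R fallback for a
compressed BED file. This is only an illustration, not how import() works
internally: the temporary file, the column names and the choice to read only
the first six BED fields are my own assumptions. It gives you a plain
data.frame, not a RangedData object.

```r
# Write a tiny gzip-compressed BED file so the example is self-contained.
# The two records are the first six fields of the junction lines quoted below.
bed_lines <- c(
  "chr20\t55658\t64827\tJUNC00000001\t14\t+",
  "chr20\t55662\t64821\tJUNC00000002\t2\t-"
)
path <- tempfile(fileext = ".bed.gz")
con <- gzfile(path, "w")
writeLines(bed_lines, con)
close(con)

# Read it back with base R. Declaring colClasses up front avoids type
# guessing, which usually helps on large files.
# The column names here are illustrative, not taken from rtracklayer.
bed <- read.table(
  gzfile(path), sep = "\t", header = FALSE,
  col.names  = c("chrom", "start", "end", "name", "score", "strand"),
  colClasses = c("character", "integer", "integer",
                 "character", "numeric", "character"),
  stringsAsFactors = FALSE
)

# BED starts are 0-based; add 1 to match the 1-based ranges that import()
# reports in Vincent's output (55658 in the file becomes 55659).
bed$start <- bed$start + 1L
```

Whether this is actually faster than import() on your files is something you
would have to time yourself.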
Ivan
Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1016 and 1-301-496-1592
Fax: 1-301-496-9878
On Tue, Mar 8, 2011 at 9:26 PM, Vincent Carey
<stvjc at channing.harvard.edu> wrote:
> 2011/3/8 Thiago Yukio Kikuchi Oliveira <stratust at gmail.com>:
>> Hi,
>>
>> Is there a BED file parser for R?
>
> I suppose it depends on what you mean by "parser". import() from the
> rtracklayer package imports BED and constructs and populates a
> RangedData object with the contents. Here we look at a small bed file
> in text,
> start R, load rtracklayer, import the data, show the result, and show
> the resources used.
>
> bash-3.2$ head ~/junc716_20.bed
> chr20 55658 64827 JUNC00000001 14 + 55658 64827 255,0,0 2 27,25 0,9144
> chr20 55662 64821 JUNC00000002 2 - 55662 64821 255,0,0 2 34,8 0,9151
> chr20 135774 147029 JUNC00000003 1 - 135774 147029 255,0,0 2 8,29 0,11226
> chr20 167951 172361 JUNC00000004 1 + 167951 172361 255,0,0 2 29,8 0,4402
> chr20 189824 192113 JUNC00000005 3 + 189824 192113 255,0,0 2 33,9 0,2280
> chr20 189829 192113 JUNC00000006 3 + 189829 192113 255,0,0 2 32,9 0,2275
> chr20 193930 199576 JUNC00000007 4 - 193930 199576 255,0,0 2 28,11 0,5635
> chr20 207050 207846 JUNC00000008 2 - 207050 207846 255,0,0 2 20,34 0,762
> chr20 218306 218925 JUNC00000009 1 - 218306 218925 255,0,0 2 11,26 0,593
> chr20 221160 225070 JUNC00000010 25 - 221160 225070 255,0,0 2 29,9 0,3901
> bash-3.2$ head ~/junc716_20.bed > ~/lit.bed
> bash-3.2$ R213 --vanilla --quiet
>> library(rtracklayer)
> Loading required package: RCurl
> Loading required package: bitops
>> lit = import("~/lit.bed")
>> lit
> RangedData with 10 rows and 9 value columns across 1 space
> space ranges | name score strand thickStart
> <character> <IRanges> | <character> <numeric> <character> <integer>
> 1 chr20 [ 55659, 64827] | JUNC00000001 14 + 55658
> 2 chr20 [ 55663, 64821] | JUNC00000002 2 - 55662
> 3 chr20 [135775, 147029] | JUNC00000003 1 - 135774
> 4 chr20 [167952, 172361] | JUNC00000004 1 + 167951
> 5 chr20 [189825, 192113] | JUNC00000005 3 + 189824
> 6 chr20 [189830, 192113] | JUNC00000006 3 + 189829
> 7 chr20 [193931, 199576] | JUNC00000007 4 - 193930
> 8 chr20 [207051, 207846] | JUNC00000008 2 - 207050
> 9 chr20 [218307, 218925] | JUNC00000009 1 - 218306
> 10 chr20 [221161, 225070] | JUNC00000010 25 - 221160
> thickEnd itemRgb blockCount blockSizes blockStarts
> <integer> <character> <integer> <character> <character>
> 1 64827 #FF0000 2 27,25 0,9144
> 2 64821 #FF0000 2 34,8 0,9151
> 3 147029 #FF0000 2 8,29 0,11226
> 4 172361 #FF0000 2 29,8 0,4402
> 5 192113 #FF0000 2 33,9 0,2280
> 6 192113 #FF0000 2 32,9 0,2275
> 7 199576 #FF0000 2 28,11 0,5635
> 8 207846 #FF0000 2 20,34 0,762
> 9 218925 #FF0000 2 11,26 0,593
> 10 225070 #FF0000 2 29,9 0,3901
>
>> sessionInfo()
> R version 2.13.0 Under development (unstable) (2011-03-01 r54628)
> Platform: x86_64-apple-darwin10.4.0/x86_64 (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] rtracklayer_1.11.11 RCurl_1.5-0 bitops_1.0-4.1
>
> loaded via a namespace (and not attached):
> [1] BSgenome_1.19.4 Biobase_2.11.9 Biostrings_2.19.15
> [4] GenomicRanges_1.3.23 IRanges_1.9.25 Matrix_0.999375-47
> [7] XML_3.2-0 grid_2.13.0 lattice_0.19-17
>
>
>>
>>
>> Thanks
>>
>> / Thiago Yukio Kikuchi Oliveira
>> (=\
>> \=) Faculdade de Medicina de Ribeirão Preto
>> / Laboratório de Genética Molecular e Bioinformática
>> /=) -----------------------------------------------------------------
>> (=/ Centro de Terapia Celular/CEPID/FAPESP - Hemocentro de Rib. Preto
>> / Rua Tenente Catão Roxo, 2501 CEP 14151-140
>> (=\ Ribeirão Preto - São Paulo
>> \=) Fone: 55 16 2101-9300 Ramal: 9603
>> / E-mail: stratus at lgmb.fmrp.usp.br
>> /=) stratust at gmail.com
>> (=/
>> / Bioinformatic Team - BiT: http://lgmb.fmrp.usp.br
>> (=\ Hemocentro de Ribeirão Preto: http://pegasus.fmrp.usp.br
>> \=)
>> / -----------------------------------------------------------------
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>