[BioC] rtracklayer import.bed pipe inconsistency

Nathan Sheffield nathan.sheffield at duke.edu
Mon Aug 29 17:56:14 CEST 2011


Hi,

I am having trouble importing a BED file after passing it through 
pipe() in R. Maybe it's a bug in rtracklayer's import.bed? Or maybe I'm 
missing a setting. Can anyone help with this?

I have a bed file ("code25.bed") with 4 lines:
chr17 60212869 60218774
chr1 108503808 108508915
chr8 86373506 86380637
chr8 99303546 99307608

I can read it into R with read.table like so:
>read.table("Aug5/codeBed/code25.bed")
      V1        V2        V3
1 chr17  60212869  60218774
2  chr1 108503808 108508915
3  chr8  86373506  86380637
4  chr8  99303546  99307608

I want to use rtracklayer to import the file and get a GenomicRanges 
object, so I try import.bed, which also works:
>import.bed("Aug5/codeBed/code25.bed")
RangedData with 4 rows and 0 value columns across 3 spaces
         space                 ranges |
   <character>              <IRanges> |
1        chr1 [108503809, 108508915] |
2       chr17 [ 60212870,  60218774] |
3        chr8 [ 86373507,  86380637] |
4        chr8 [ 99303547,  99307608] |

Now, I want this to work on BED files with more than 3 columns as well, 
keeping only the first three. I can do this with a command-line pipe 
using cut, like so:

>read.table(pipe(paste("cut -f1,2,3 ", "Aug5/codeBed/code25.bed")))
      V1        V2        V3
1 chr17  60212869  60218774
2  chr1 108503808 108508915
3  chr8  86373506  86380637
4  chr8  99303546  99307608

So this gives the exact same output as the first read.table above. 
However, when I try to pass this pipe to import.bed, something strange 
happens:

>import.bed(pipe(paste("cut -f1,2,3 ", "Aug5/codeBed/code25.bed")))
RangedData with 5 rows and 0 value columns across 3 spaces
         space                 ranges |
   <character>              <IRanges> |
1        chr1 [108503809, 108508915] |
2       chr17 [ 60212870,  60218774] |
3       chr17 [ 60212870,  60218774] |
4        chr8 [ 86373507,  86380637] |
5        chr8 [ 99303547,  99307608] |

I'm not sure why, but one of the regions (chr17, the first line of the 
file) is duplicated, so the result has 5 rows instead of 4. The problem 
seems to lie in import.bed combined with pipe() and has nothing to do 
with cut, because cat gives the same result:

> import.bed(pipe("cat Aug5/codeBed/code25.bed"))
RangedData with 5 rows and 0 value columns across 3 spaces
         space                 ranges |
   <character>              <IRanges> |
1        chr1 [108503809, 108508915] |
2       chr17 [ 60212870,  60218774] |
3       chr17 [ 60212870,  60218774] |
4        chr8 [ 86373507,  86380637] |
5        chr8 [ 99303547,  99307608] |
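
As a sanity check outside R, the cut step can be exercised on its own. 
The sketch below is only an illustration: it builds a hypothetical 
tab-separated 5-column BED file in a temporary location (the name and 
score columns are invented here and are not part of code25.bed) and 
counts the records that cut emits.

```shell
# Hypothetical 5-column BED file built from the four regions above;
# the name (r1..r4) and score (0) columns are invented for illustration.
bed=$(mktemp)
printf 'chr17\t60212869\t60218774\tr1\t0\n'  >  "$bed"
printf 'chr1\t108503808\t108508915\tr2\t0\n' >> "$bed"
printf 'chr8\t86373506\t86380637\tr3\t0\n'   >> "$bed"
printf 'chr8\t99303546\t99307608\tr4\t0\n'   >> "$bed"

# Keep only chrom, start, end (cut splits on tabs by default).
cut -f1,2,3 "$bed"

# Count the records that the pipe would hand over to R.
n_lines=$(cut -f1,2,3 "$bed" | wc -l | tr -d ' ')
echo "records: $n_lines"
```

If the count matches the number of lines in the file, that would point 
to the extra row being introduced after the shell command has finished, 
that is, somewhere between pipe() and import.bed.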


Any ideas?

-Nathan Sheffield
Duke University, Computational Biology Program

sessionInfo follows:

R version 2.12.0 (2010-10-15)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=no_NO.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=no_NO.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rtracklayer_1.10.6  RCurl_1.4-3         bitops_1.0-4.1
[4] GenomicRanges_1.2.1 IRanges_1.8.7

loaded via a namespace (and not attached):
[1] Biobase_2.10.0    Biostrings_2.18.0 BSgenome_1.18.0   XML_3.2-0


