[BioC] rtracklayer import.bed pipe inconsistency
Nathan Sheffield
nathan.sheffield at duke.edu
Mon Aug 29 17:56:14 CEST 2011
Hi,
I am having trouble importing a bed file after running it through
pipe() in R. Maybe it's a bug in rtracklayer's import.bed? Or maybe I'm
missing a setting. Can anyone help with this?
I have a bed file ("code25.bed") with 4 lines:
chr17 60212869 60218774
chr1 108503808 108508915
chr8 86373506 86380637
chr8 99303546 99307608
I can read it into R with read.table like so:
> read.table("Aug5/codeBed/code25.bed")
     V1        V2        V3
1 chr17  60212869  60218774
2  chr1 108503808 108508915
3  chr8  86373506  86380637
4  chr8  99303546  99307608
I want to use rtracklayer to import it and get a GenomicRanges object, so I
try import.bed, which also works:
> import.bed("Aug5/codeBed/code25.bed")
RangedData with 4 rows and 0 value columns across 3 spaces
        space                 ranges |
  <character>              <IRanges> |
1        chr1 [108503809, 108508915] |
2       chr17 [ 60212870,  60218774] |
3        chr8 [ 86373507,  86380637] |
4        chr8 [ 99303547,  99307608] |
Now, I want this to work on bed files with more than 3 columns, just in
case. I can do this with a command-line pipe using cut, like so:
> read.table(pipe(paste("cut -f1,2,3 ", "Aug5/codeBed/code25.bed")))
     V1        V2        V3
1 chr17  60212869  60218774
2  chr1 108503808 108508915
3  chr8  86373506  86380637
4  chr8  99303546  99307608
So this gives the exact same output as the first read.table above.
However, when I try to pass this pipe to import.bed, something strange
happens:
> import.bed(pipe(paste("cut -f1,2,3 ", "Aug5/codeBed/code25.bed")))
RangedData with 5 rows and 0 value columns across 3 spaces
        space                 ranges |
  <character>              <IRanges> |
1        chr1 [108503809, 108508915] |
2       chr17 [ 60212870,  60218774] |
3       chr17 [ 60212870,  60218774] |
4        chr8 [ 86373507,  86380637] |
5        chr8 [ 99303547,  99307608] |
Not sure why, but it has duplicated one of the regions (the chr17 one) and
now has 5 rows instead of 4. This is a problem with import.bed combined
with pipe, and has nothing to do with cut:
> import.bed(pipe("cat Aug5/codeBed/code25.bed"))
RangedData with 5 rows and 0 value columns across 3 spaces
        space                 ranges |
  <character>              <IRanges> |
1        chr1 [108503809, 108508915] |
2       chr17 [ 60212870,  60218774] |
3       chr17 [ 60212870,  60218774] |
4        chr8 [ 86373507,  86380637] |
5        chr8 [ 99303547,  99307608] |
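In the meantime, one possible workaround is to skip import.bed for piped
input: read the pipe with read.table (which gives the right 4 rows above)
and build the ranges by hand. This is only a minimal sketch, not a fix for
the underlying problem. The helper name read_bed3, the temporary file and
its made-up fourth column are purely for illustration, and it assumes a
tab-separated file whose first three columns are chromosome, start and end.
The start is shifted by 1 to match the coordinates import.bed reports above.
The GRanges step only runs if GenomicRanges is installed.

```r
# Hypothetical helper (not part of rtracklayer): read the first three BED
# columns through a shell pipe and return 1-based, closed coordinates.
read_bed3 <- function(path) {
  # read.table opens and closes an unopened connection itself
  con <- pipe(paste("cut -f1,2,3", shQuote(path)))
  bed <- read.table(con, sep = "\t", stringsAsFactors = FALSE,
                    col.names = c("chrom", "start", "end"))
  bed$start <- bed$start + 1L  # BED starts are 0-based
  bed
}

# Illustrative copy of the four regions, with an invented 4th column
# so that cut has something to drop.
tmp <- tempfile(fileext = ".bed")
writeLines(c("chr17\t60212869\t60218774\tregionA",
             "chr1\t108503808\t108508915\tregionB",
             "chr8\t86373506\t86380637\tregionC",
             "chr8\t99303546\t99307608\tregionD"), tmp)

bed <- read_bed3(tmp)
print(bed)

# Optional: build a GRanges object when GenomicRanges is available.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  gr <- GenomicRanges::GRanges(
    seqnames = bed$chrom,
    ranges = IRanges::IRanges(start = bed$start, end = bed$end)
  )
  print(gr)
}
```

Going through read.table keeps the row count under my own control, so any
duplication would show up in nrow(bed) before the ranges are built.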
Any ideas?
-Nathan Sheffield
Duke University, Computational Biology Program
sessionInfo follows:
R version 2.12.0 (2010-10-15)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=no_NO.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=no_NO.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rtracklayer_1.10.6 RCurl_1.4-3 bitops_1.0-4.1
[4] GenomicRanges_1.2.1 IRanges_1.8.7
loaded via a namespace (and not attached):
[1] Biobase_2.10.0 Biostrings_2.18.0 BSgenome_1.18.0 XML_3.2-0