[BioC] rtracklayer import.bed pipe inconsistency
Nathan Sheffield
nathan.sheffield at duke.edu
Tue Aug 30 11:08:40 CEST 2011
Thanks, I figured it might just come down to updating.
And yes, I did mean nonstandard columns -- even though it's not "really"
a BED file, it's still helpful to be able to import just the first 3
columns so that my script can handle any type of BED-like file, standard
or not.
The ability to select columns will be helpful in the future, thanks.
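
For anyone stuck on an older version without column selection, here is a minimal base R sketch of the same idea (keep only the first three columns, whatever else the file contains). It assumes a tab-delimited file with no track or browser header lines and the same number of fields on every row. `read_bed3` and the example file are hypothetical names used only for illustration; they are not part of rtracklayer.

```r
# Hypothetical example file: the four regions from this thread plus an
# extra, nonstandard fourth column (made up for illustration).
bed_path <- tempfile(fileext = ".bed")
writeLines(c(
  "chr17\t60212869\t60218774\tfoo",
  "chr1\t108503808\t108508915\tbar",
  "chr8\t86373506\t86380637\tbaz",
  "chr8\t99303546\t99307608\tqux"
), bed_path)

# read_bed3 is a hypothetical helper, not an rtracklayer function.
# It reads only the first three columns of a BED-like file.
read_bed3 <- function(path) {
  # Count the fields so that every extra column can be skipped explicitly.
  n_fields <- max(count.fields(path, sep = "\t"))
  # A colClasses entry of "NULL" tells read.table to drop that column.
  classes <- c("character", "integer", "integer",
               rep("NULL", n_fields - 3L))
  bed <- read.table(path, sep = "\t", colClasses = classes,
                    stringsAsFactors = FALSE)
  names(bed) <- c("chrom", "start", "end")
  # BED starts are 0-based; shift them to 1-based so they match the
  # ranges that import.bed prints.
  bed$start <- bed$start + 1L
  bed
}

bed3 <- read_bed3(bed_path)
print(bed3)
```

The start coordinates come out shifted by one (60212869 becomes 60212870), matching what import.bed reports, while the ends are unchanged. Rows stay in file order rather than being grouped by chromosome.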
-Nathan
On 08/30/2011 01:05 AM, Michael Lawrence wrote:
> And also, just btw,
>
> What do you mean by a BED file with more than three columns? rtracklayer
> can read those in just fine, unless they are non-standard columns, in
> which case you really don't have a BED file anyway.
>
> With newer versions of rtracklayer, one can specify the colnames
> argument to select only the desired BED columns. Passing character()
> would give you your desired result.
>
> Michael
>
> On Mon, Aug 29, 2011 at 4:02 PM, Michael Lawrence <michafla at gene.com> wrote:
>
> I can't reproduce this:
>
> > import(pipe("cat ~/tmp/pipe-test.bed"), format="bed")
>
> RangedData with 4 rows and 0 value columns across 3 spaces
> space ranges |
> <factor> <IRanges> |
>
> 1 chr1 [108503809, 108508915] |
> 2 chr17 [ 60212870, 60218774] |
> 3 chr8 [ 86373507, 86380637] |
> 4 chr8 [ 99303547, 99307608] |
> > sessionInfo()
> R version 2.14.0 Under development (unstable) (--)
> Platform: i686-pc-linux-gnu (32-bit)
>
> locale:
> [1] C
>
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] rtracklayer_1.13.12 RCurl_1.5-0 bitops_1.0-4.1
>
> loaded via a namespace (and not attached):
> [1] BSgenome_1.21.3 Biostrings_2.21.6 GenomicRanges_1.5.21
> [4] IRanges_1.11.16 XML_3.2-0 zlibbioc_0.1.6
>
>
> I don't remember this being an issue in the past, but who knows. My
> only recommendation is to upgrade your R and rtracklayer.
>
> Michael
>
>
> On Mon, Aug 29, 2011 at 8:56 AM, Nathan Sheffield
> <nathan.sheffield at duke.edu> wrote:
>
> Hi,
>
> I am having trouble importing a BED file after running it
> through pipe() in R. Maybe it's a bug in rtracklayer's
> import.bed? Or maybe I'm missing a setting. Can anyone help
> with this?
>
> I have a bed file ("code25.bed") with 4 lines:
> chr17 60212869 60218774
> chr1 108503808 108508915
> chr8 86373506 86380637
> chr8 99303546 99307608
>
> I can read it into R with read.table like so:
>
> read.table("Aug5/codeBed/code25.bed")
>
> V1 V2 V3
> 1 chr17 60212869 60218774
> 2 chr1 108503808 108508915
> 3 chr8 86373506 86380637
> 4 chr8 99303546 99307608
>
> I want to use rtracklayer to import to get a genomicRanges
> object, so I try with import.bed, which also works:
>
> import.bed("Aug5/codeBed/code25.bed")
>
> RangedData with 4 rows and 0 value columns across 3 spaces
> space ranges |
> <character> <IRanges> |
> 1 chr1 [108503809, 108508915] |
> 2 chr17 [ 60212870, 60218774] |
> 3 chr8 [ 86373507, 86380637] |
> 4 chr8 [ 99303547, 99307608] |
>
> Now, I want this to work on BED files with more than 3 columns,
> just in case. I can do this with a command-line pipe using cut
> like so:
>
> read.table(pipe(paste("cut -f1,2,3 ",
> "Aug5/codeBed/code25.bed")))
>
> V1 V2 V3
> 1 chr17 60212869 60218774
> 2 chr1 108503808 108508915
> 3 chr8 86373506 86380637
> 4 chr8 99303546 99307608
>
> So this gives the exact same output as the first read.table
> above. However, when I try to pass this pipe to import.bed,
> something strange happens:
>
> import.bed(pipe(paste("cut -f1,2,3 ",
> "Aug5/codeBed/code25.bed")))
>
> RangedData with 5 rows and 0 value columns across 3 spaces
> space ranges |
> <character> <IRanges> |
> 1 chr1 [108503809, 108508915] |
> 2 chr17 [ 60212870, 60218774] |
> 3 chr17 [ 60212870, 60218774] |
> 4 chr8 [ 86373507, 86380637] |
> 5 chr8 [ 99303547, 99307608] |
>
> Not sure why, but it has duplicated one of the regions and now
> has 5, instead of 4. This is a problem with import.bed combined
> with pipe, and has nothing to do with cut:
>
> import.bed(pipe("cat Aug5/codeBed/code25.bed"))
>
> RangedData with 5 rows and 0 value columns across 3 spaces
> space ranges |
> <character> <IRanges> |
> 1 chr1 [108503809, 108508915] |
> 2 chr17 [ 60212870, 60218774] |
> 3 chr17 [ 60212870, 60218774] |
> 4 chr8 [ 86373507, 86380637] |
> 5 chr8 [ 99303547, 99307608] |
>
>
> any ideas?
>
> -Nathan Sheffield
> Duke University, Computational Biology Program
>
> sessionInfo follows:
>
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=no_NO.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=no_NO.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] rtracklayer_1.10.6 RCurl_1.4-3 bitops_1.0-4.1
> [4] GenomicRanges_1.2.1 IRanges_1.8.7
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.10.0 Biostrings_2.18.0 BSgenome_1.18.0 XML_3.2-0
>