[BioC] question about forgeBSgenomeDataPkg function
Hervé Pagès
hpages at fhcrc.org
Mon Mar 8 22:50:03 CET 2010
Hi Brian,
I'm putting this on the mailing list since this might actually affect
other users.
Brian Herb wrote:
> Herve-
>
> You previously helped me with building the BSgenome package for the Rat,
> and now I am helping my lab mate create a BSgenome package for the
> rhesus monkey. We are running into an error when he reads in the gap files:
>
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
> na.strings, :
> scan() expected 'an integer', got 'fragment'
>
> We wonder if the issue is that the gap files are in a slightly different
> format than what I am used to with the rat:
>
> Example Rat gap file:
>
> 585 chr1 1360 2576 2 N 1216 fragment yes
> 585 chr1 5378 5428 4 N 50 fragment yes
> 585 chr1 13845 13895 6 N 50 fragment yes
> 585 chr1 23435 23485 8 N 50 fragment yes
> 585 chr1 25955 26005 10 N 50 fragment yes
> 585 chr1 33306 33356 12 N 50 fragment yes
> 585 chr1 35384 40627 14 N 5243 fragment yes
> 585 chr1 45904 46169 16 N 265 fragment yes
>
> Example rhesus monkey gap file:
>
>
> chr1 17248 17350 2 N 102 fragment yes
> chr1 26206 26619 4 N 413 fragment yes
> chr1 27937 28130 6 N 193 fragment yes
> chr1 47170 48593 8 N 1423 fragment yes
> chr1 83907 85189 10 N 1282 fragment yes
> chr1 95455 96505 12 N 1050 fragment yes
> chr1 100303 100323 14 N 20 fragment yes
> chr1 132263 132283 16 N 20 fragment yes
> chr1 151325 152178 18 N 853 fragment yes
Good catch! It seems that we indeed can't assume that UCSC uses a
consistent schema for their 'gap' table. For Rat and every other organism
I've seen so far, the columns are the following:
http://genome.ucsc.edu/cgi-bin/hgTables?db=rn4&hgta_group=map&hgta_track=gap&hgta_table=gap&hgta_doSchema=describe+table+schema
but for Rhesus, the 'bin' column is missing:
http://genome.ucsc.edu/cgi-bin/hgTables?db=rheMac2&hgta_group=map&hgta_track=gap&hgta_table=gap&hgta_doSchema=describe+table+schema
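To illustrate the difference between the two schemas, here is a minimal
sketch of a reader that accepts a 'gap' file with or without the leading
'bin' column. It is not the code used in read.gapMask(). The helper name
read_gap_table() is hypothetical, the column names follow the UCSC schema
pages linked above, and the sketch assumes whitespace- or tab-separated
fields with no header line.

```r
# Column names of the UCSC 'gap' table when the leading 'bin' column is present.
gap_cols <- c("bin", "chrom", "chromStart", "chromEnd",
              "ix", "n", "size", "type", "bridge")

# read_gap_table() is a hypothetical helper, not part of IRanges or BSgenome.
# It decides whether 'bin' is present from the number of columns found.
read_gap_table <- function(file) {
    gaps <- read.table(file, header = FALSE, stringsAsFactors = FALSE)
    if (ncol(gaps) == length(gap_cols)) {
        names(gaps) <- gap_cols          # Rat-style file: 9 columns
    } else if (ncol(gaps) == length(gap_cols) - 1L) {
        names(gaps) <- gap_cols[-1L]     # Rhesus-style file: no 'bin'
    } else {
        stop("unexpected number of columns in gap file: ", ncol(gaps))
    }
    # Return the same 8 columns in both cases, dropping 'bin' if it was there.
    gaps[ , gap_cols[-1L]]
}

# Two lines from each of the example files quoted in this thread.
rat_file <- tempfile()
writeLines(c("585 chr1 1360 2576 2 N 1216 fragment yes",
             "585 chr1 5378 5428 4 N 50 fragment yes"), rat_file)
rhesus_file <- tempfile()
writeLines(c("chr1 17248 17350 2 N 102 fragment yes",
             "chr1 26206 26619 4 N 413 fragment yes"), rhesus_file)

rat_gaps <- read_gap_table(rat_file)
rhesus_gaps <- read_gap_table(rhesus_file)
```

Both calls return a data frame with the same eight columns, so any code
downstream does not need to know which schema the file used.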
I've tried to accommodate this in read.gapMask(). This change will be
available in IRanges >= 1.5.56 (devel) and IRanges >= 1.4.13 (release).
Both packages should become available through biocLite() in the next 24
hours. Please let me know if you encounter any further issues.
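A quick way to check whether an installed IRanges is recent enough is to
compare its version string against the two thresholds above. The sketch
below is only an illustration: has_gap_fix() is a hypothetical helper, and
it assumes that any 1.5.x version belongs to the devel line and anything
lower to the release line.

```r
# has_gap_fix() is a hypothetical helper, not part of IRanges.
# It applies the devel threshold (1.5.56) to versions >= 1.5.0 and the
# release threshold (1.4.13) to everything below that.
has_gap_fix <- function(version) {
    if (compareVersion(version, "1.5.0") >= 0) {
        compareVersion(version, "1.5.56") >= 0
    } else {
        compareVersion(version, "1.4.13") >= 0
    }
}

has_gap_fix("1.4.9")    # version shown in the sessionInfo() of this thread: FALSE
has_gap_fix("1.4.13")   # TRUE

# With IRanges installed, the installed version can be checked with:
# has_gap_fix(packageDescription("IRanges")$Version)
```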
Thanks for the report!
H.
>
> We wonder if the missing column in the monkey gap file is throwing off
> the forgeMasksFiles function, and if there is something that we can
> stipulate in this function to change which column it is looking for.
>
>
> > sessionInfo()
> R version 2.10.0 (2009-10-26)
> x86_64-unknown-linux-gnu
> locale:
> [1] LC_CTYPE=en_US.iso885915 LC_NUMERIC=C
> [3] LC_TIME=en_US.iso885915 LC_COLLATE=en_US.iso885915
> [5] LC_MONETARY=C LC_MESSAGES=en_US.iso885915
> [7] LC_PAPER=en_US.iso885915 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> other attached packages:
> [1] BSgenome_1.14.2 Biostrings_2.14.10 IRanges_1.4.9
> loaded via a namespace (and not attached):
> [1] Biobase_2.6.0
>
> Kind Regards,
> Brian
>
>
> --
> Brian Herb
> Graduate Program in Biochemistry, Cellular and Molecular Biology
> Johns Hopkins School of Medicine
> Dr. Andrew Feinberg Laboratory
> Rangos 580
> 855 N. Wolfe St.
> Baltimore, MD 21205
> Phone:410-614-3479
> Fax: 410-614-9819
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319