[Bioc-devel] request to create BSgenome Bos_taurus_UMD3.1.1

Hervé Pagès hpages at fredhutch.org
Thu Mar 26 08:54:14 CET 2015


Hi Byungkuk,

On 03/25/2015 10:17 PM, Byungkuk Min wrote:
> Dear Dr. Pagès,
>
> I've been trying to build a BSgenome for Bos_taurus_UMD3.1.1 from NCBI
> <http://www.ncbi.nlm.nih.gov/assembly/GCF_000003055.6/>.
> But I'm getting the same error message...
>
>> library(BSgenome)
>> forgeBSgenomeDataPkg("/PATH/TO/genome.fa")
> Error: Line starting '>1 ...' is malformed!
>
> I don't have much knowledge of R, so creating a custom BSgenome seems out
> of my league.
>
> I would like to kindly request the bovine BSgenome package.

Do you really need *that* particular assembly (GCF_000003055.6)?
Otherwise, there are already some bovine BSgenome packages available:

   > library(BSgenome)
   > grep("Btaurus", available.genomes(), value=TRUE)
   [1] "BSgenome.Btaurus.UCSC.bosTau3"
   [2] "BSgenome.Btaurus.UCSC.bosTau3.masked"
   [3] "BSgenome.Btaurus.UCSC.bosTau4"
   [4] "BSgenome.Btaurus.UCSC.bosTau4.masked"
   [5] "BSgenome.Btaurus.UCSC.bosTau6"
   [6] "BSgenome.Btaurus.UCSC.bosTau6.masked"

We don't have bosTau8 yet, which is the latest bovine assembly available
at UCSC (they added it in June 2014), but I could add it. Note that
despite its name (also Bos_taurus_UMD_3.1.1), bosTau8 is not the same
assembly as the one you picked on NCBI. Yours is:

   http://www.ncbi.nlm.nih.gov/assembly/GCF_000003055.6

(the latest, from 2014/11/25), but bosTau8 is:

   http://www.ncbi.nlm.nih.gov/assembly/GCF_000003055.5

(much older, from 2009/12/01)

Anyway, if we ignore chrM and the thousands of scaffolds that are
included in these assemblies, the sequences of the "main" chromosomes
(i.e. chr1 to chr29 + chrX) are exactly the same in the 2 assemblies.
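For illustration only, here is a minimal sketch of restricting work to those main chromosomes. It uses BSgenome.Btaurus.UCSC.bosTau6 from the list above (a bosTau8 package would be used the same way) and assumes that package is installed and uses UCSC-style "chrN" names. `main_chroms` and `main_lengths` are hypothetical variable names chosen for the example.

```r
library(BSgenome.Btaurus.UCSC.bosTau6)

genome <- BSgenome.Btaurus.UCSC.bosTau6

## The "main" chromosomes: chr1 to chr29 plus chrX
## (chrM and the unplaced scaffolds are left out).
main_chroms <- paste0("chr", c(1:29, "X"))

## Check that all of them are present in this assembly.
stopifnot(all(main_chroms %in% seqnames(genome)))

## Lengths of the main chromosomes only (named integer vector).
main_lengths <- seqlengths(genome)[main_chroms]
main_lengths

## Load a single chromosome as a DNAString.
chr29 <- genome[["chr29"]]
length(chr29)
```

Subsetting by name like this is one way to set aside chrM and the scaffolds when comparing or analysing assemblies.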

So maybe the BSgenome package for bosTau8 will do?

Note that the main advantage of making a BSgenome package for a
UCSC assembly (instead of using an NCBI assembly) is that the BSgenome
object then interoperates nicely with the TxDb object that one can
easily make from one of the UCSC tracks available for that assembly
(using the makeTxDbFromUCSC() function from the GenomicFeatures
package).
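For illustration, here is a minimal sketch of that interoperability. It again uses BSgenome.Btaurus.UCSC.bosTau6 from the list above and rests on three assumptions: network access to UCSC, the "refGene" table being available for bosTau6, and a Bioconductor release in which makeTxDbFromUCSC() is still exported by GenomicFeatures (recent releases moved it to the txdbmaker package). Because both objects come from the same UCSC assembly and share its chromosome names, transcript sequences can be extracted without renaming anything.

```r
library(BSgenome.Btaurus.UCSC.bosTau6)
library(GenomicFeatures)

genome <- BSgenome.Btaurus.UCSC.bosTau6

## Build a TxDb from a UCSC track of the same assembly.
## (Assumption: the "refGene" table exists for bosTau6.)
txdb <- makeTxDbFromUCSC(genome = "bosTau6", tablename = "refGene")

## Both objects use UCSC sequence names ("chr1", "chrX", ...),
## so their chromosome names line up directly.
common <- intersect(seqlevels(txdb), seqnames(genome))
head(common)

## Extract the spliced transcript sequences from the genome.
tx_seqs <- extractTranscriptSeqs(genome, txdb)
tx_seqs
```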

H.

>
> Thank you!
>
>
>
>
> *Byungkuk Min*
> Ph.D Candidate
> University of Science & Technology, <http://ust.ac.kr>
> Korea Research Institute of Bioscience and Biotechnology
> <http://www.kribb.re.kr>
> 125 Gwahangno, Yuseong-Gu, Daejeon, Korea (305-806)
> Laboratory : 042-860-4423
> FAX : 042-860-4608
> Cell phone : 010-5209-7377
>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
