[Bioc-devel] custom genome
Eric P
pedereri at gmail.com
Sun May 10 10:20:37 CEST 2015
Hi
Ok sorry for misusing the email list. It is just that I looked through the
forum and there were some questions about this that weren't answered very
well. Basically they just said to look at the documentation for BSforge.
And to be honest I am not an R user, so I don't really understand the
process, so no I did not follow the process as I didn't understand the
documentations account of readying a genome for bioconductor. The main
reason that I want to use bioconductor is for the edgeR and cummerbund
functions as well as some graphical representations since I can do the rest
of the analysis on my linux machine, but I thought that I could also try
the whole pipeline since it seems pretty streamlined. But I can post this
on the forum if Martin's code doesn't work out. Thanks for all the help and
again sorry for the mispost.
Cheers,
Eric
On 9 May 2015 at 19:43, Martin Morgan <mtmorgan at fredhutch.org> wrote:
> On 05/09/2015 08:36 AM, Vincent Carey wrote:
>
>> did you put this request on the forum? did you follow the steps with
>> BSforge and find
>> an instruction that could not be followed or led to an error? if you did,
>> please put an
>> account of that event on the support.bioconductor.org and you will get
>> assistance. read
>> the posting guide and provide sessionInfo() so that your errors can be
>> reproduced.
>>
>
> Vince is right that the question should probably be asked on the support
> forum, but for what it's worth I downloaded the file(?) to my current
> working directory
>
> url = "ftp://ftp.plantgdb.org/pub/Genomes/PpGDB/Ppatens_152.fa.gz"
> fl = basename(url)
> download.file(url, fl, mode="wb")
>
> It's small enough that it can be read in to memory.
>
> library(Biostrings)
> dna = readDNAStringSet(fl)
>
> but it would be convenient to have quick random access to to the
> sequences, so I exported it to "2bit' format
>
> library(BSgenome); library(rtracklayer)
> twobit = TwoBitFile(sub(".gz", ".2bit", fl)))
> export(dna, twobit)
>
> and then I have ready access to the sequences, e.g.,
>
> > seqinfo(twobit)
> Seqinfo object with 2106 sequences from an unspecified genome:
> seqnames seqlengths isCircular genome
> scaffold_1 5387533 <NA> <NA>
> scaffold_2 4973254 <NA> <NA>
> scaffold_3 4236991 <NA> <NA>
> scaffold_4 4046325 <NA> <NA>
> scaffold_5 3779828 <NA> <NA>
> ... ... ... ...
> scaffold_11864 1003 <NA> <NA>
> scaffold_11874 1002 <NA> <NA>
> scaffold_11881 1000 <NA> <NA>
> scaffold_11890 1000 <NA> <NA>
> scaffold_11893 1000 <NA> <NA>
> > query = GRanges("scaffold_109", IRanges(seq(1, 1000, by=100), width=50))
> > getSeq(twobit, query)
> A DNAStringSet instance of length 10
> width seq
> [1] 50 GTTACAGATTGGTAAGATCTTGTAGAGGCTCTTTTACTCTGGGCATATAT
> [2] 50 AAAAAAATTGATAGTATAAGAGAAGGAAATTCTTTCAAGAGACAAATTTG
> [3] 50 TTTTTACAATATTGGAAAAAAGAAAAAAAAAAAAAAAAAATGAGAAAAGA
> [4] 50 CATATTTAAAGGGAGTAATTGAAAAACATGTAGTAAATGCTAAGATAGAA
> [5] 50 AATATTTGATATATAATTCACTATATAATATAGTGAGAGTTATTTTATTA
> [6] 50 GTTTAAAATGGCCATTGTTGAAGTTGATTTTAGGTTGATAATAGTTTTTA
> [7] 50 TTATTTCTTTTATTTGATTTTTAGAACATATATTATATAAATCTTATAAA
> [8] 50 ATATGGATGAATGGAGATGGATGATGAAGTTCTCCATTGCTATTTCCAAA
> [9] 50 TATGATCACTCTTTCAGGGCATTTAAACTTAAGAGACTAAGATCAAGAAG
> [10] 50 CTTTATAAAAAAAAATATAGGCATGCTATTGATGCAAAGATATATCAGGA
> > query$seq = getSeq(twobit, query)
>
> Hope that's helpful; I might have made different decisions with a larger
> genome or on Windows, so if you re-post your question on the support forum
> (which I hope you do) it would be helpful to include that information as
> well as your intended use.
>
> Martin
>
>
>
>> On Fri, May 8, 2015 at 7:34 PM, Eric P <pedereri at gmail.com> wrote:
>>
>> Hi
>>>
>>> I have 2-3 genomes that are not listed that I would like to use with
>>> bioconductor and I do not really understand how to prepare one properly.
>>> I
>>> know you have to use BSforge, but after that I need some help. One of the
>>> genomes is *Physcomitrella patens*, which is actually being updated at
>>> the
>>> moment, so I am waiting for version 3.1 to come out, but if someone has
>>> time they could add version 1.6 or 3.0.
>>> However, the other two genomes I have to do myself as they will not be
>>> published for a while so I could just as well try a third. Any help would
>>> be appreciated, thanks. Also, if this was already posted I apologize but
>>> I
>>> could not find enough information on this subject when I searched the
>>> forums.
>>>
>>> --
>>> Cheers,
>>> Eric
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
[[alternative HTML version deleted]]
More information about the Bioc-devel
mailing list