[BioC] ChIPpeakAnno:write2FASTA, getAllPeakSequence Functions
Zhu, Julie
Julie.Zhu at umassmed.edu
Thu Aug 5 15:58:43 CEST 2010
Dear Noah,
Thanks so much for sharing your example and excellent solution!
Yes, if you create your RangedData with names field, then the sequences will
have names associated with, that are used by write2FASTA as sequence
headers. In the future release, headers could be automatically generated if
no head found.
Here is the example to create a RangedData with names field filled in.
peaks = RangedData(IRanges(start=c(100, 500), end=c(300, 600),
names=c("peak1", "peak2")), space=c("NC_008253", "NC_010468"))
library(BSgenome.Ecoli.NCBI.20080805)
seq = getAllPeakSequence(peaks, upstream = 20,
downstream = 20, genome = Ecoli)
write2FASTA(seq)
Thanks so much for the positive feedback!
Kind regards,
Julie
*******************************************
Lihua Julie Zhu, Ph.D
Research Associate Professor
Program in Gene Function and Expression
Program in Molecular Medicine
University of Massachusetts Medical School
364 Plantation Street, Room 613
Worcester, MA 01605
508-856-5256
http://www.umassmed.edu/pgfe/faculty/zhu.cfm
On 8/4/10 8:42 PM, "Noah Dowell" <noahd at ucla.edu> wrote:
> Dear Julie,
>
> I came across a little problem today when using the write2FASTA function. The
> function seemed to perform fine in that no error was thrown, but the FASTA
> file that was "written" was empty. So I started to look through the function
> to see if I could find anything and found that the data object created with
> getAllPeakSequence had no rownames. When I manually added rownames (see
> below) the write2FASTA function worked fine. It could be something I messed
> up on my end but, I thought you might want to be aware of this. Feel free to
> post this on the Bioconductor message board if you think it will be helpful,
> but I didn't want to confuse people.
>
>
> Thank you for creating this excellent package.
>
> Kind regards,
>
> Noah
>
>
>
>
>> tRNArdSMALL <- as(temp, "RangedData")
>
>> seqtRNAsmall <- getAllPeakSequence(tRNArdSMALL, upstream = 100, downstream =
>> 100, genome = Scerevisiae)
>
>
>
>> rownames(seqtRNAsmall)
> NULL
>
> # so I did this:
>
>> rownames(seqtRNAsmall) <- seq(1:86)
>
>> write2FASTA(seqtRNAsmall, file ="peakSeqNhp6AtRNAsmall", width = 80)
>
> ##Now the function works since the file contains 86 sequences in the FASTA
> format.
>
> ## The rangedData object I created in the beginning also has no rownames.
>
> ## Here is the head(seqtRNAsmall) before I forced the rownames to be 1:86:
>
>
>
>
>
>
>> head(seqtRNAsmall)
> RangedData with 6 rows and 4 value columns across 16 spaces
> space ranges | upstream downstream
> <character> <IRanges> | <numeric> <numeric>
> 1 chr1 [166355, 166375] | 100 100
> 2 chr10 [354302, 354314] | 100 100
> 3 chr10 [378397, 378437] | 100 100
> 4 chr10 [422758, 422974] | 100 100
> 5 chr10 [540627, 540651] | 100 100
> 6 chr10 [617933, 618064] | 100 100
>
> sequence
>
> <character>
> 1
> TTGTAATGAGTTGGGCACATGGCGCAGTTGGTAGCGCGCTTCCCTTGCAAGGAAGAGGTCATCGGTTCGATTCCGGTT
> GCGTCCAAATTTTTTTGTTAATCCAACACAATTGAATTCGTGAATAGCTGACTGTCATCAGTAATGTTCGTGGAAAGT
> ACCTATCCATACTGTTGTATCACGACTAAGTAGTTGTCGACTACTACCTCCTCAACCCCAGTTAT
> 2
> AATTGTTGGAATTCCATTTTTGATAAAGATGAAAATTGGTGAATTTTGAGATAATTGTTGGGATTCCATTGTTGATAA
> AGGCTATAATATTAGGTATACAGAATATACTAGAAGTTCTCCTCGAGGATCTAGGAATCCTCATAATGGAACCTATAT
> TTCTATATACTAATATTCCGTTTTATTCCTTATTCCGTTTTATATGTTTCATTATCC
> 3
> GCTCACAGGGTCCTGCCGGGTATTCGCCCTCCAGCTTGAAAGGGGTGGCAATGCCCAAGAGCGATGTATAGTTTAGTA
> TCAAGGAATTGAGAGAGGTATTACACCAAGGATTTCCGTGAACGAAGTGACGACGCAAGATTAACAGATTAAATTCTG
> GTAGTAGGCTAAGCCGTTATTAAACTTTATTTACGGTATACCACAATACTGCTCTTTTTGTTGAGGATATGACAGACA
> AAGGCAA
> 4
> CAGTCTCATAATCTGAAGGTCGAGAGTTCGAACCTCCCCTGGAGCACATTTTCTTTTTTTGTGAATTAAACCTCGAAG
> CTCAACATTCATGTTTTGAGATAAAAAAAGGTACTTATTACTTGTGTAAAAGAAAGTTTATAATGATGATCTTATTTA
> AGTTTATAGTAGGACTTTCTATGCATGTCTTGTAGAACTATTTATGCTGGACTCGCCATTAATTAGCTCCAGTGAAAG
> TACATATCTTCTTTCTTGGACATTATTATCTCTAGTAATAGTGATATAAGGATCGGCATTCAACAGGTGATCTGGTAT
> TTGCGGCAATCCAGATTCTGGTCCTGCGTTTTTTTTCAGTTCCATAAACCTTTCAACGCCAATCGGCAGCAAATCAGC
> CGGTAAATGACGTTGTAGATCGCACAT
> 5
> GTGTTAGACGATGACATAAGATACGAGGAACTGTCATCGAAGTTAGAGGAAGCTGAAATGCAAGGATTGATAATGTAA
> TAGGATAATGAAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATATAGAGATAAAGATTCCATTTTGAGG
> ATTCCTATATCCTCGAGGAGAACTTCTAGTATATTCTGTATACCTGATATTATAGCCTTTACCAACAAT
> 6
> TAGCAAAAGTCTTTTCACAATAAGAGGATATTTTGCAGGCAAATTTCTGAAGACCACATAATATCGTCTTCTTGGCGC
> AACTAAGACGCAGAGAAATGCTTGTTTTCAAAAGGGGTATTCACGTAGTACCTAAACTACCAAATTCTAAGGCCCTAC
> TGCAAAATGGAGTGCCTAATATACTCAGTTCTTCGGGATTCAAGACGGTGTGGTTCGACTACCAGCGATATCTATGTG
> ATAAGCTAACGCTGGCGACAGCGGGACAGTCTTTGGAATCATATTATCCATTTCACATATTGCTGAAAACTGCAGGAA
> ATCCATTACAGTCAAACATC
> strand
> <character>
> 1 1
> 2 1
> 3 1
> 4 1
> 5 1
> 6 1
>
>
>> sessionInfo()
> R version 2.11.0 (2010-04-22)
> i386-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] BSgenome.Scerevisiae.UCSC.sacCer1_1.3.16 org.Sc.sgd.db_2.4.1
> rtracklayer_1.8.1 RCurl_1.3-1
> [5] bitops_1.0-4.1 ChIPpeakAnno_1.4.0
> limma_3.4.0 org.Hs.eg.db_2.4.1
> [9] GO.db_2.4.1 RSQLite_0.8-4
> DBI_0.2-5 AnnotationDbi_1.10.0
> [13] BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.16.0
> GenomicRanges_1.0.1 Biostrings_2.16.0
> [17] IRanges_1.6.0 multtest_2.4.0
> Biobase_2.8.0 biomaRt_2.4.0
>
> loaded via a namespace (and not attached):
> [1] MASS_7.3-5 splines_2.11.0 survival_2.35-8 tools_2.11.0 XML_2.8-1
More information about the Bioconductor
mailing list