[BioC] BSgenomeForge seed file - seqnames field
Kelly V [guest]
guest at bioconductor.org
Wed May 1 00:52:03 CEST 2013
I'm preparing a custom reference genome for use with the MEDIPS package. I see that one field of the seed file, which is apparently not optional, is the 'seqnames' field. The example given in the documentation is this:
paste("chr", c(1:20, "X", "M", "Un", paste(c(1:20, "X", "Un"), "_random",
sep="")), sep="")
I have two simple questions about this.
1. Does R match this information with the source sequence file? For example, if I have a single fasta file with fasta headers chr_01...chr_20, must the seqnames entries exactly match those headers?
2. Revealing the reason for my first question:In my genome fasta file, I have 1427 extrachromosomal scaffolds, but they are not all sequentially numbered, so that I have scaffold_1..scaffold_3681. Do I need to use a regular expression in my seqnames field to tell R to look for scaffold_ followed by 1-4 digits?
Thanks for any help,
--Kelly V.
-- output of sessionInfo():
R version 3.0.0 (2013-04-03)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
--
Sent via the guest posting facility at bioconductor.org.
More information about the Bioconductor
mailing list