[BioC] obtain DNA sequence
Biddie, Simon (NIH/NCI) [F]
biddies at mail.nih.gov
Tue Sep 1 20:31:17 CEST 2009
Hi Patrick,
Thanks for your response. I will look into IRanges and XString.
I also tried your code, but it gives me the following error:
> mymat
    Chr     Start      Stop
1  chr9  79466420  79466570
2  chr6  50495860  50496010
3  chr8  19687900  19688050
4  chrX  90313740  90313890
5  chr4 117732780 117732930
6 chr11   4090400   4090550
> uniqueChr <- unique(mymat[,"Chr"])
> extractedDNA <- character(nrow(mymat))
> for (chr in uniqueChr) {
+     selected <- which(mymat[,"Chr"] == chr)
+     extractedDNA[selected] <- as.character(Views(Mmusculus[[chr]],
+         mymat[selected,"Start"], mymat[selected,"End"]))
+ }
Error in newViews(subject, start = start, end = end, names = names, Class = "XStringViews") :
'start' and 'end' must be numeric vectors
In addition: Warning message:
In Views(Mmusculus[[chr]], mymat[selected, "Start"], mymat[selected, :
masks were dropped
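
A possible reading of this error, offered as an assumption because the thread does not show how mymat was built: the check that fails is on the type of 'start', which suggests the Start column holds character strings (or factor levels) rather than numbers. Separately, the table printed above has a column named Stop, not End. The base R sketch below rebuilds three rows of the table as a character matrix (a hypothetical reconstruction), shows both problems, and converts the coordinates. The final loop runs only if the mm9 genome package is installed.

```r
# Hypothetical reconstruction: a matrix has a single type, so mixing
# chromosome names with coordinates turns every cell into a character string.
mymat <- cbind(Chr   = c("chr9", "chr6", "chr8"),
               Start = c(79466420, 50495860, 19687900),
               Stop  = c(79466570, 50496010, 19688050))

is.numeric(mymat[, "Start"])    # FALSE: coordinates are stored as strings
"End" %in% colnames(mymat)      # FALSE: the third column is called "Stop"

# Convert to a data frame with numeric coordinates. Going through
# as.character() first also handles the case where a column is a factor.
mydf <- data.frame(Chr   = as.character(mymat[, "Chr"]),
                   Start = as.numeric(as.character(mymat[, "Start"])),
                   Stop  = as.numeric(as.character(mymat[, "Stop"])),
                   stringsAsFactors = FALSE)

# Same loop as in the thread, but with numeric columns and the "Stop" name.
# Guarded so the sketch still runs when the genome package is not installed.
if (requireNamespace("BSgenome.Mmusculus.UCSC.mm9", quietly = TRUE)) {
  library(BSgenome.Mmusculus.UCSC.mm9)
  extractedDNA <- character(nrow(mydf))
  for (chr in unique(mydf$Chr)) {
    selected <- which(mydf$Chr == chr)
    extractedDNA[selected] <- as.character(
      Views(Mmusculus[[chr]],
            start = mydf$Start[selected], end = mydf$Stop[selected]))
  }
}
```

The "masks were dropped" message is only a warning and appears unrelated to the failure.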
Simon
-----Original Message-----
From: Patrick Aboyoun [mailto:paboyoun at fhcrc.org]
Sent: Tuesday, September 01, 2009 2:21 PM
To: Biddie, Simon (NIH/NCI) [F]
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] obtain DNA sequence
Simon,
Below is code that addresses your explicit question:
mymat <- <<the matrix you have below>>
uniqueChr <- unique(mymat[,"Chr"])
extractedDNA <- character(nrow(mymat))
for (chr in uniqueChr) {
    selected <- which(mymat[,"Chr"] == chr)
    extractedDNA[selected] <- as.character(Views(Mmusculus[[chr]],
        mymat[selected,"Start"], mymat[selected,"End"]))
}
My question for you is: have you tried using the IRanges framework to
represent your ranges? It would make this type of processing easier to
perform. There are also write functions, such as write.XStringSet and
write.XStringViews, that provide export functionality without requiring
you to coerce the DNA sequences into character vectors.
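
As a rough illustration of exporting without going through character vectors, here is a minimal sketch on a made-up 20-base sequence standing in for one chromosome. The toy data frame, the region names, and the temporary file are all hypothetical. It assumes a current Biostrings release, where the export function is called writeXStringSet rather than write.XStringSet; the Biostrings part is skipped if the package is not installed.

```r
# Hypothetical toy input: small coordinates on a made-up sequence,
# standing in for one chromosome such as Mmusculus[["chr9"]].
regions <- data.frame(Chr = "toy",
                      Start = c(1, 5, 11),
                      Stop  = c(4, 10, 20),
                      stringsAsFactors = FALSE)

# Label each region as chr:start-stop so the exported records are traceable.
region_names <- paste0(regions$Chr, ":", regions$Start, "-", regions$Stop)

if (requireNamespace("Biostrings", quietly = TRUE)) {
  library(Biostrings)
  subject <- DNAString("ACGTACGTACGTACGTACGT")

  # Views on the subject, then a DNAStringSet; no character coercion needed.
  v <- Views(subject, start = regions$Start, end = regions$Stop)
  seqs <- DNAStringSet(v)
  names(seqs) <- region_names

  # Write all regions to one FASTA file (the default format).
  out_file <- tempfile(fileext = ".fa")   # hypothetical output location
  writeXStringSet(seqs, filepath = out_file)
}
```

For the real table, the same steps would be applied per chromosome and the resulting sets combined before writing.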
Patrick
Biddie, Simon (NIH/NCI) [F] wrote:
> Dear All,
>
> I am trying to obtain DNA sequences (mouse) from chromosome coordinates. I am relatively new to R and Bioconductor and would appreciate any help.
>
> I have the following style matrix:
>
>     Chr     Start      Stop
> 1  chr9  79466420  79466570
> 2  chr6  50495860  50496010
> 3  chr8  19687900  19688050
> 4  chrX  90313740  90313890
> 5  chr4 117732780 117732930
> 6 chr11   4090400   4090550
>
> I can use the following code to obtain a single sequence by typing in the chromosome number, start and stop manually:
>
>> library(BSgenome.Mmusculus.UCSC.mm9)
>> seq1 = subseq(Mmusculus$chr9,79466420,79466570)
>> as(seq1, "character")
>
> How would I do this for all the rows in a matrix to be output as a single txt or csv file? ... without having to type each row (I have up to 15,000!) one at a time. Please find below the sessionInfo.
>
> Thank you for any help,
>
> Simon
>
>
>> sessionInfo()
> R version 2.8.1 (2008-12-22)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices datasets utils methods base
>
> other attached packages:
> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.11 BSgenome_1.10.5
> [3] Biostrings_2.10.22 IRanges_1.0.16
> [5] R.utils_1.1.3 R.oo_1.4.6
> [7] R.methodsS3_1.0.3
>
> loaded via a namespace (and not attached):
> [1] grid_2.8.1 lattice_0.17-25 Matrix_0.999375-23