[BioC] trouble reading DNA stringset from keggGet function
James W. MacDonald
jmacdon at uw.edu
Tue Sep 10 20:08:18 CEST 2013
Hi Elliot,
> library("KEGGREST")
> genes<-keggLink("ath00906")
> sequences<-keggGet(genes[1:10,2],"ntseq")
> writeXStringSet(sequences, "./tmp.fasta")
> scan("tmp.fasta", "c", nlines=2, sep = "\t") ## check it
Read 2 items
[1] ">ath:AT1G06820 CRTISO; carotenoid isomerase; K09835 prolycopene isomerase [EC:5.2.1.13] (N)"
[2] "ATGGATTTGTGTTTTCAAAATCCCGTAAAGTGTGGTGATCGTTTGTTCTCCGCATTGAATACCTCTACGTATTACAAGCT"
Best,
Jim
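
A note on why the original call fails: the error message says 'filepath' must be a character vector, so readDNAStringSet() expects the path of a file on disk, while keggGet(..., "ntseq") already returns a DNAStringSet in memory. To get a FASTA file you write the object out with writeXStringSet(), and readDNAStringSet() is only needed to read that file back in. Below is a minimal sketch of the round trip that runs without network access. It assumes Biostrings is installed; the two sequences and the names geneA and geneB are made up for illustration and stand in for the keggGet() result.

```r
library(Biostrings)

# Stand-in for the object returned by keggGet(..., "ntseq").
# The sequences and names are hypothetical, for illustration only.
sequences <- DNAStringSet(c(geneA = "ATGGATTTGTGTTTTCAAAATCCCGTAA",
                            geneB = "ATGGCAGTAGCTACACAGGTCAGGTAG"))

# writeXStringSet() takes the in-memory object plus a file path
# and writes FASTA by default.
fasta_file <- tempfile(fileext = ".fasta")
writeXStringSet(sequences, fasta_file)

# readDNAStringSet() takes the file path (a character vector),
# not the DNAStringSet itself.
roundtrip <- readDNAStringSet(fasta_file, format = "fasta")
```

Here roundtrip should hold the same sequences and names as the object that was written, which is a quick way to check the file before passing it to other tools.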
On Tuesday, September 10, 2013 1:55:20 PM, Elliot [guest] wrote:
>
> I am having some difficulty making fasta files out of files returned by the keggGet function in the KEGGREST package. The file returned is apparently a DNA string set, but readDNAStringSet will not process it. I've tried it with other data and with different kinds of sequences (amino acid) and received the same error message -- I'm sure I must be missing something. My R output is below. Thanks so much for any help!
>
>
>
> -- output of sessionInfo():
>
>> genes<-keggLink("ath00906")
>
>> head(genes)
>      [,1]            [,2]            [,3]
> [1,] "path:ath00906" "ath:AT1G06820" "reverse"
> [2,] "path:ath00906" "ath:AT1G08550" "reverse"
> [3,] "path:ath00906" "ath:AT1G10830" "reverse"
> [4,] "path:ath00906" "ath:AT1G30100" "reverse"
> [5,] "path:ath00906" "ath:AT1G31800" "reverse"
> [6,] "path:ath00906" "ath:AT1G52340" "reverse"
>
>> sequences<-keggGet(genes[1:10,2],"ntseq")
>
>> head(sequences)
>   A DNAStringSet instance of length 6
>     width seq                               names
> [1]  1788 ATGGATTTGTGTTTTC...AGGACACTCGCATAG ath:AT1G06820 CRT...
> [2]  1389 ATGGCAGTAGCTACAC...AGGAAGGTCAGGTAG ath:AT1G08550 NPQ...
> [3]   858 ATGGCGGTTTATCATC...ATTGGATTTTTATGA ath:AT1G10830 Z-I...
> [4]  1770 ATGGCTTGTTCTTACA...TTAAACCAGGCTTAA ath:AT1G30100 NCE...
> [5]  1788 ATGGCTATGGCCTTTC...TCTGCTCTTTCTTAA ath:AT1G31800 CYP...
> [6]   858 ATGTCAACGAACACTG...AAAGTCTTCAGATGA ath:AT1G52340 ABA...
>
>> readDNAStringSet(sequences,"fasta")
> Error in .normargInputFilepath(filepath) :
> 'filepath' must be a character vector with no NAs
>
>> class(sequences) #confirm that the input is a DNA string set
> [1] "DNAStringSet"
> attr(,"package")
> [1] "Biostrings"
>
> --
> Sent via the guest posting facility at bioconductor.org.
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099